Exporting and Importing Data

For importing, there are currently two different tool-sets available in CATMAID. A front-end in Django’s admin interface is only available for importing project and stack information. If you want to import tracing data, you have to resort to the command line.

Exporting and importing neuron tracing data

Two management commands for Django’s manage.py tool are available in CATMAID that allow exporting and importing neuron tracing data. They are called catmaid_export_data and catmaid_import_data. To use them, you have to be in the virtualenv and it is probably easiest to work from the django/projects/ directory.
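For example, assuming the virtualenv was created in django/env below the CATMAID installation directory (your layout may differ), a session could start like this:

cd <CATMAID-PATH>/django/projects
source ../env/bin/activate
manage.py catmaid_export_data --help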

Exporting data

At the moment, the export command is able to create a JSON representation of neurons, connectors, tags and annotations. To constrain the exported neurons, annotations can be used. To export data, you have to use the catmaid_export_data command:

manage.py catmaid_export_data

Adding the --help option will show an overview of all available options. When called without any options, the command will ask the user for the project to export from and will start exporting the whole project right away. Use the additional options to be more precise about what should be exported.

Without any parameter, everything is exported. The type of data to be exported can be adjusted by the --notreenodes, --noconnectors, --noannotations and --notags parameters. To constrain the exported neurons, the --required-annotation option can be used. For instance, to export all neurons from the project with ID 1 that are annotated with “Kenyon cells”, one would have to call:

manage.py catmaid_export_data --source 1 --required-annotation "Kenyon cells"

This will create a file called export_pid_<pid>.json, which would be export_pid_1.json in our case. A different file name can be specified using the --file option, and if the passed-in string contains “{}”, the braces will be replaced by the source project ID.
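For example, to write the Kenyon cell export from above to a custom file name that embeds the project ID (the file name itself is only an illustration), one could call:

manage.py catmaid_export_data --source 1 --required-annotation "Kenyon cells" --file "kenyon_cells_pid_{}.json"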

Users are represented by their usernames, so it is not required to export user model objects as well. The importer can either map to existing users or create new ones. If desired, complete user models can be exported (and imported) as well by providing the --users option. Be aware, though, that this includes the hashed user passwords.
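For instance, to include complete user models in the Kenyon cell export from above, the call could be extended like this:

manage.py catmaid_export_data --source 1 --required-annotation "Kenyon cells" --users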

Importing data

The JSON file generated in the previous section can be used to import data into a CATMAID project. This project can be non-empty or a new one, and it can be part of the source CATMAID instance or a completely different one. To do so, use the catmaid_import_data management command:

manage.py catmaid_import_data

You can use the --help switch to get an overview of the available options. Like the exporter, the importer will ask a user if it needs more information. If no project is specified, users can select an existing one or create a new empty target project interactively.

Assuming a file called export_pid_1.json is available and a new CATMAID project with ID 1 has been created, the following command will start the import:

manage.py catmaid_import_data --source export_pid_1.json --target 1

By default, the importer tries to map users referenced in the input data to existing users. If this is not wanted, the option --map-users false has to be used.

The importer looks at the user_id, editor_id and reviewer_id fields of imported objects, if available. CATMAID needs to know what to do with this information. Besides mapping users, CATMAID can also override all user information in the imported data with a single user. Setting --map-users false and selecting a user, either interactively or with the help of the --user <id> option, will accomplish this.
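For example, to attribute all imported objects to a single existing user, here the user with ID 1 (chosen purely for illustration), the import could be started like this:

manage.py catmaid_import_data --source export_pid_1.json --target 1 --map-users false --user 1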

If users are mapped and a username does not exist, the import is aborted, unless the --create-unknown-users option is provided. This will create new inactive user accounts with the provided usernames that are referenced from the newly imported objects. This way, full user profiles don't need to be part of the exported data.
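A call that maps users and creates inactive accounts for usernames unknown to the target instance could look like this:

manage.py catmaid_import_data --source export_pid_1.json --target 1 --create-unknown-users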

By default, the importer won’t use the IDs of spatial data provided by the import source and will instead create new database entries. This default is generally useful and doesn’t risk replacing existing data. All relations between objects are kept. If needed, however, the use of the source-provided IDs can be enforced with the --preserve-ids option.
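A call using this option could look like the following (keeping source IDs is mainly useful for empty or otherwise carefully checked target projects):

manage.py catmaid_import_data --source export_pid_1.json --target 1 --preserve-ids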

Semantic data like annotations and tags, however, is re-used if it is already available in the target project. The only exception to this are neuron and skeleton objects, which are technically semantic objects, but are expected not to be shared or reused.

Bulk loading large data sets

Importing data using the catmaid_import_data management command works well for thousands of neurons and connectors. However, it becomes very slow and memory intensive when loading millions or billions of neurons. At this scale, loading data directly into the database is the best strategy. This requires extra care, because most safeguards that the API and management commands provide will be bypassed.

Details on the import process are collected on the Bulk loading page.

Importing project and stack information

Image data in CATMAID is referenced by stack mirrors, which belong to a particular stack. Stacks in turn are organized in projects. The data used by a stack mirror can come from one of various types of data sources. A simple and often used source is a folder structure of tiled image data for each stack. To be accessible, a stack mirror’s image base has to give access to such a folder from the web. Of course, stacks, stack mirrors and projects can be created by hand, but there is also an importing tool available in Django’s admin interface. It can be found under Custom Views and is named Import projects and image stacks.

The admin import tool can import project and stack information from various sources:

  • Other CATMAID instances

  • A general remote API endpoint

  • A remote or local JSON file

  • A textual JSON definition

  • A local file and folder structure

All JSON representations that can be imported are in the format returned by CATMAID’s /projects/export API endpoint. This very same format can be used with a management command, catmaid_import_projects, on the command line.
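To illustrate where such a definition can come from, the export endpoint of an existing CATMAID instance can be queried directly, for instance with cURL (URL and token are placeholders):

curl https://<CATMAID-URL>/projects/export \
    --header "X-Authorization: Token <api-token>" > projects.json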

The command line option and the file based import are explained in more detail below. The latter expects a certain data folder layout to work on and also relies on so-called project files (a file in a simple YAML format) to identify potential projects. Its data layout is explained below as well.

How to use the importing tool will be shown in the last section.

Note

Regardless of whether you import image stacks with the help of the importer or add projects, stacks, stack mirrors, project-stack links and permissions by hand: you need to make the image data accessible through your webserver. For instance, say you are running CATMAID on the URL example.com/catmaid, you are using the importer, and you have IMPORTER_DEFAULT_IMAGE_BASE set to https://example.com/catmaid/data and CATMAID_IMPORT_PATH to /home/catmaid/data/. The webserver would then need to map /catmaid/data to /home/catmaid/data/. For Nginx this would look like this:

location /catmaid/data/ {
  alias /home/catmaid/data/;
}

Importing projects and stacks on the command line

The management command catmaid_import_projects can be used to import projects from the command line. It understands the basic /projects/export API endpoint format. A simple example might look like this:

[{
  "project": {
    "title": "L1 CNS",
    "stacks": [{
      "title": "L1 CNS",
      "dimension": "(28128, 31840, 4841)",
      "mirrors": [{
        "fileextension": "jpg",
        "position": 3,
        "tile_source_type": 4,
        "tile_height": 512,
        "tile_width": 512,
        "title": "Example tiles",
        "url": "https://example.com/ssd-tiles/"
      }],
      "resolution": "(3.8,3.8,50)",
      "translation": "(0,0,6050)"
    }]
  }
}]

This can be provided as a file, with its path passed as the --input argument to catmaid_import_projects, or through the stdin stream that is piped into it. Additionally, it is possible to provide permissions for the imported projects. This is done through one or more --permission parameters. Each of them takes an argument in the form type:name:permission, where type can be either user or group, name is a username or group name, and permission can be can_browse, can_annotate and so forth.
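As an example, the following call imports project definitions from a file and grants browse access to a user "anna" and annotate access to a group "reviewers" (both names are made up for illustration):

manage.py catmaid_import_projects --input projects.json \
    --permission user:anna:can_browse \
    --permission group:reviewers:can_annotate

Alternatively, the JSON could be piped into the command through stdin instead of using --input.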

The import can be configured with additional options which are explained when using the --help option.

Project Files

If the importing tool encounters a folder with a file called project.yaml in it, it will treat it as a potential project. If this file is not available, the folder is ignored. However, if the file is there, it gets parsed, and if it contains all the information the tool is looking for, the project can be imported. So let’s assume we have a project with two stacks, each having one copy of its image data, in a folder with the following layout:

project1/
  project.yaml
  stack1/
  stack2/

A project file contains the basic properties of a project and its associated stacks. It is a simple YAML file and could look like this for the example above:

project:
    title: "Wing Disc 1"
    stacks:
      - title: "Channel 1"
        description: "PMT Offset: 10, Laser Power: 0.5, PMT Voltage: 550"
        dimension: "(3886,3893,55)"
        resolution: "(138.0,138.0,1.0)"
        mirrors:
          - title: "Channel 2 overlay"
            folder: "stack1"
            fileextension: "jpg"
      - title: "Channel 2"
        description: "PMT Offset: 10, Laser Power: 0.7, PMT Voltage: 500"
        dimension: "(3886,3893,55)"
        resolution: "(138.0,138.0,1.0)"
        downsample_factors: ["(1,1,1)", "(2,2,1)", "(4,4,1)"]
        mirrors:
          - title: Channel 2 image data
            folder: "stack2"
            fileextension: "jpg"
        stackgroups:
          - title: "Example group"
            relation: "has_channel"
      - title: "Remote stack"
        dimension: "(3886,3893,55)"
        resolution: "(138.0,138.0,1.0)"
        zoomlevels: 3
        downsample_factors: ["(1,1,1)", "(2,2,1)", "(4,4,1)", "(8,8,1)", "(16,16,1)"]
        translation: "(10.0, 20.0, 30.0)"
        mirrors:
          - tile_width: 512
            tile_height: 512
            tile_source_type: 2
            fileextension: "png"
            url: "https://my.other.server.net/examplestack/"
        stackgroups:
          - title: "Example group"
            relation: "has_channel"

As can be seen, a project has only two properties: a title and a set of stacks. A stack, however, needs more information. In general, there are two ways to specify the data source for a stack mirror: 1. an optional path and a folder, which together are expected to be relative to the IMPORTER_DEFAULT_IMAGE_BASE setting, or 2. a url, which is used as the stack mirror’s image base.

The first stack in the example above is based on a folder in the same directory as the project file. The folder property names this image data folder for the stack, relative to the project file. The name of the stack is stored in the title field, and metadata (which is shown when a stack is displayed) can be added with the metadata property. A stack also needs dimension and resolution information. The dimensions are the stack’s X, Y and Z extent in pixels. The resolution should be in nanometers per pixel, in X, Y and Z.

In addition to the folder information, the second and third stack above use the downsample_factors field to declare the number of available “zoom levels”. This is done by providing a list of 3D scale factors, one for each zoom level available, with 1,1,1 being the unscaled voxel size of the data set. In a classical anisotropic TEM data set, it is common to zoom out in steps of factors of two in X and Y and keep the visual section thickness the same. Assuming there are four additional zoom levels besides the original version, this could be expressed as the downsample factors ["(1,1,1)", "(2,2,1)", "(4,4,1)", "(8,8,1)", "(16,16,1)"], scaling X and Y by a factor of two and Z not at all for each entry of the array.

Of course, any other configuration could be chosen as well. Likewise, for isotropic data sets like FIB-SEM, the Z component of each tuple would be scaled as well. If no downsample_factors are given, CATMAID’s default behavior is to use as many zoom levels of the above anisotropic form as are needed for the original image width or height to fit onto a single tile. Since different stack mirrors can have different tile sizes, the minimum of those is used. Without any mirrors, no zoom levels will be assumed.

The example above also specifies the file extension of the image files with the fileextension key. Both the folder and the fileextension fields are required for folder-based mirrors.

The last stack in the example above doesn’t use a local stack folder, but declares the stack mirror’s image base explicitly by using the url setting. Besides the url, such a stack mirror needs the tile_width, tile_height and tile_source_type fields. The corresponding stack defines the resolution and dimension fields.

CATMAID can link stacks to so-called stack groups. These are general data structures that relate stacks to each other, for instance to denote that they represent channels of the same data, orthogonal views or simple overlays. There is no limit on how many stack groups a stack can be part of. Each stack in a project file can reference stack groups by title, together with the type of relation this stack has to the stack group. At the moment, valid relations are channel and view. All stacks referencing a stack group with the same name will be linked to the same new stack group in the new project. In the example above, a single stack group named “Example group” will be created, with stacks 2 and 3 as members, each representing a layer/channel. Stack groups are used by the front-end to open multiple stacks at once in a more intelligent fashion (e.g. open multi-channel stack groups as layers in the same viewer).

All specified stacks within a project are linked into a single space. By default each stack origin is mapped to the project space origin (0,0,0). An optional translation can be applied to this mapping: If a stack has a translation field, the stack is mapped with this offset into project space. Note that this translation is in project space coordinates (physical space, nanometers). The example above will link the last stack (“Remote stack”) to the project “Wing Disc 1” with an offset of (10.0, 20.0, 30.0) nanometers. Both other stacks will be mapped to the project space origin.

The tool is not confused by additional YAML data in the project file; it only uses what is depicted in the sample above. Please keep in mind, though, not to use tab characters for the whitespace indentation (use plain spaces), as tabs aren’t allowed in YAML.

Ontology and classification import

The project files explained in the last section can also be used to import ontologies and classifications. While CATMAID supports arbitrary graphs to represent ontologies and classifications, only tree structures can be imported at the moment.

The project object supports an optional ontology field, which defines an ontology hierarchy with lists of lists. An optional classification field can be used to define a list of ontology paths that get instantiated based on the provided ontology. Classification fields require that an ontology is defined and can be used at the project, stack and stackgroup level. Consider this example:

project:
   title: "test"
   ontology:
     - class: 'Metazoa'
       children:
         - relation: 'has_a'
           class: 'Deuterostomia'
         - relation: 'has_a'
           class: 'Protostomia'
           children:
             - relation: 'has_a'
               class: 'Lophotrochozoa'
               children:
                 - relation: 'has_a'
                   class: 'Nematostella'
                   children:
                     - relation: 'has_a'
                       class: 'Lineus longissimus'
   stackgroups:
     - title: 'Test group'
       classification:
          - ['Metazoa', 'Protostomia', 'Lophotrochozoa', 'Nematostella', 'Lineus longissimus']
   stacks:
     - title: "Channel 1"
       description: "PMT Offset: 10, Laser Power: 0.5, PMT Voltage: 550"
       dimension: "(1024,1024,800)"
       resolution: "(2.0,2.0,1.0)"
       zoomlevels: 1
       translation: "(10.0, 20.0, 30.0)"
       classification:
          - ['Metazoa', 'Deuterostomia']
       mirrors:
          - title:  Channel 1
            url: "https://example.org/data/imagestack/"
            fileextension: "jpg"
     - title: "Channel 1"
       description: "PMT Offset: 10, Laser Power: 0.5, PMT Voltage: 550"
       dimension: "(1024,1024,800)"
       resolution: "(2.0,2.0,1.0)"
       zoomlevels: 1
       translation: "(10.0, 20.0, 30.0)"
       mirrors:
         - title: Channel 1
           url: "https://example.org/data/imagestack-sample-108/"
           fileextension: "jpg"
       stackgroups:
        - title: "Test group"
          relation: "has_channel"
     - title: "Channel 2"
       description: "PMT Offset: 10, Laser Power: 0.5, PMT Voltage: 550"
       dimension: "(1024,1024,800)"
       resolution: "(2.0,2.0,1.0)"
       zoomlevels: 1
       mirrors:
        - title: Channel 2
          folder: "Sample108_FIB_catmaid copy"
          fileextension: "jpg"
       stackgroups:
        - title: "Test group"
          relation: "has_channel"

The project level ontology definition represents an ontology with the root node “Metazoa”, which has two children: “Deuterostomia” and “Protostomia”, connected through a “has_a” relation. While the first child is a leaf node and has no children, the second child has a child node as well (and so on). It is possible to have multiple roots (i.e. separate ontology graphs) and multiple children; both are lists.

Individual stacks and stackgroups are then allowed to instantiate a certain path of the ontology and be linked to the leaf node of that path. They do this through their classification field. The example creates two classification paths and links one leaf node to the stack group and one to an individual stack.

Currently, the importer expects that two classes are related only a single time on the ontology level. This allows for an easier file syntax with a simple list. An import will fail if the ontology defined in the project doesn’t contain a class used in a classification.

File and Folder Layout

The importing tool expects a certain file and folder layout to work with. It assumes that there is one data folder per CATMAID instance that is accessible from the outside world and is somehow referred to within a stack mirror’s image base (if referring to folders in the project file). As an example, let’s say a link named data has been placed in CATMAID’s top-level directory. This link points to your actual data storage and has a layout like the following:

data/
  project1/
  project2/
  project3/
  tests/
    project4/

Each project folder has contents similar to the example in the previous section. Assuming your webserver has been configured to make this folder accessible, it should be possible to have the browser access it under:

https://<CATMAID-URL>/data

A typical URL to a tile of a stack could then look like this (if you use jpeg as the file extension):

https://<CATMAID-URL>/data/project1/stack1/0/0_0_0.jpeg

The importer uses this data directory, or a folder below it, as its working directory. In this folder, it treats every sub-directory as a potential project directory and tests whether it contains a project file named project.yaml. If this file is found, the folder remains a potential project; if the project file is not available, the folder is ignored.

Also note that such a data folder needs to be made accessible by the webserver. Otherwise, the CATMAID front-end won’t be able to retrieve and show image data. The “Note” box at the beginning of this section has more details on this.

Importing skeletons through the API

The CATMAID API supports raw skeleton data import using SWC files. As can be seen under /apis, the {project_id}/skeletons/import URL can be used to import skeletons that are represented as SWC. The script scripts/remote/upload_swc.py can be of help here. It is also possible to just use cURL for this:

curl --basic -u fly -X POST --form file=@<file-name> \
    <catmaid_url>/<project_id>/skeletons/import \
    --header "X-Authorization: Token <api-token>"

Using the importer admin tool

The importer offers to import from local project files, remote CATMAID instances or remote project files/exports.

To use the importer with project files, you have to adjust your CATMAID settings file to make your data path known to CATMAID. This can be done with the CATMAID_IMPORT_PATH setting. Sticking to the examples from before, this setting might be:

CATMAID_IMPORT_PATH = <CATMAID-PATH>/data

For imported stack mirrors that don’t provide an image URL by themselves, CATMAID can construct an image base from the IMPORTER_DEFAULT_IMAGE_BASE setting plus the imported project and stack names. For the example above, this variable could be set to:

IMPORTER_DEFAULT_IMAGE_BASE = https://<CATMAID-URL>/data

With this in place, the importer can be used through Django’s admin interface. It is listed as Image data importer under Custom Views. The first step is to give the importer more detail about which folders to look in for potential projects:

[Image: importer path setup dialog (path_setup.png)]

With these settings, you can narrow down the set of folders looked at. The relative path setting can be used to specify a sub-directory below the import path. When doing so, the working directory will be changed to CATMAID_IMPORT_PATH plus the relative path. If left empty, just the CATMAID_IMPORT_PATH setting will be used. Additionally, you can filter folders in the working directory by specifying a filter term, which supports Unix shell-style wildcards. The next setting lets you decide how to deal with already existing (known) projects and what is considered known in the first place. A project can be declared known if the name of an imported project matches the name of an already existing one. Or, it can be considered known if there is a project that is linked to the very same stacks as the project to be imported. A stack in turn is known if there is already a stack with the same mirror image base. The last setting on this dialog is the Base URL. By default it is set to the value of IMPORTER_DEFAULT_IMAGE_BASE (if available). This setting plus the relative path stay the same for every project to be imported in this run. It is used if imported stacks don’t provide a URL explicitly. To continue, click on the next step button.

The importer will tell you if it doesn’t find any projects based on the settings of the first step. However, if it does find potential projects, it allows you to unselect projects that shouldn’t get imported and to add more details:

[Image: importer project setup dialog (project_setup.png)]

Besides deciding which projects to actually import, you can also add tags which will be attached to the new projects. If the tile size differs from the standard, it can be adjusted here. If you want your projects to be accessible publicly, you can mark the corresponding check-box.

When the Check classification links option is selected, the importer tries to suggest existing classification graphs to be linked to the new project(s). These suggestions are optional and based on the tags you entered before. If existing projects have the same tags or a super set of it, their linked classification graphs will be suggested.

The last adjustment to make is permissions. With the help of a list box you can select one or more group/permission combinations that the new projects will be assigned. If all is how you want it, you can proceed to the next dialog.

The third and last step is a confirmation where all the information the importer found about the projects and stacks to be imported is shown. To change things in this import, simply go back to a previous step using the buttons at the bottom of the page. If all the project and stack properties as well as the tags and permissions are correct, the actual import can start.

In the end, the importer will tell you which projects have been imported and, if there were problems, which ones were not.