Running the pipeline

Specifying inputs

The pipeline accepts a parameter called shared_work_dir, which points to a directory where the pipeline reads inputs, stores intermediate results, and writes final outputs. This directory must be shared (i.e. accessible at the same path) between the pipeline and the cluster. The shared work directory is organized like this:

shared_work_dir/
    inputs/
        ...
    outputs/
        ...
    spark/
        ...

You should create the top-level directory before running the pipeline.
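For example (the path below is a placeholder for your own shared filesystem location):

    mkdir -p /path/to/shared/work/dir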

The pipeline accepts CZI/MVL files as input, so the first step is to point it to your input data. There are two ways to do this:

Option 1 - Use a data manifest

The demo scripts under ./examples use the data_manifest parameter to download an input data set. In this case, the shared_work_dir/inputs directory will be automatically created and populated with the data. If the data has been downloaded in the past, the download will be skipped and the files will be verified using MD5 checksums.

You can use this method for your own custom data as well, especially if you have small data sets that you wish to run repeatedly for benchmarking or reproducibility. The data manifest is a simple text file that lists the files in the data set, their MD5 sums, and HTTP links to download them. Create your manifest file and point the pipeline to it with the data_manifest parameter. You can use verify_md5=false to skip the MD5 checksum verification for faster iteration.
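As an illustration only, a manifest might look like the following; the field layout shown here is an assumption, so check the example manifests used by the demo scripts under ./examples for the authoritative format:

    # Hypothetical manifest: filename, MD5 checksum, download URL
    LHA3_R3_small.czi  <md5sum>  https://example.org/data/LHA3_R3_small.czi
    LHA3_R3_small.mvl  <md5sum>  https://example.org/data/LHA3_R3_small.mvl

You would then point the pipeline at it, e.g. with --data_manifest /path/to/my_manifest.txt (the path is a placeholder).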

Option 2 - Place data manually

For larger custom data sets, it is recommended to place the files in the inputs directory manually. Your CZI and MVL files should be paired by name, e.g.:

shared_work_dir/
    inputs/
        LHA3_R3_small.czi
        LHA3_R3_small.mvl
        LHA3_R5_small.czi
        LHA3_R5_small.mvl

In this case, you don’t need to provide a data_manifest parameter. The default for this parameter is “segmentation”, which downloads just the segmentation model and places it in the inputs directory, next to your data.
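For example, to stage the two acquisitions above by hand (all paths are placeholders):

    # Copy paired CZI/MVL files into the shared inputs directory
    mkdir -p /path/to/shared/work/dir/inputs
    cp /data/LHA3_R3_small.czi /data/LHA3_R3_small.mvl /path/to/shared/work/dir/inputs/
    cp /data/LHA3_R5_small.czi /data/LHA3_R5_small.mvl /path/to/shared/work/dir/inputs/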

Setting parameters

See the parameter documentation to customize the pipeline for your data.

Starting the pipeline

Using Nextflow Tower

See the Nextflow Tower documentation for step-by-step instructions on how to run the pipeline from the Nextflow Tower web GUI.

Using the Command Line Interface (CLI)

Assuming you have two acquisitions as above (i.e. named LHA3_R3_small and LHA3_R5_small), you can run the pipeline with the following command:

./main.nf \
    --shared_work_dir /path/to/shared/work/dir \
    --runtime_opts "-B /path/to/shared/work/dir" \
    --acq_names LHA3_R3_small,LHA3_R5_small \
    --ref_acq LHA3_R3_small \
    --channels c0,c1 \
    --dapi_channel c1 \
    [other parameters]

This will run the pipeline and execute all of the jobs on the local workstation where you invoke the command. To use a cluster or cloud for executing the jobs, see the Platforms documentation.

The parameters used above are:

  • --runtime_opts - required to mount the shared work directory inside the Singularity containers used to execute the pipeline jobs
  • --acq_names - required; the names of the acquisitions to process
  • --ref_acq - required; the name of the reference acquisition (the fixed round that the other rounds are registered to)
  • --channels - the names of the channels to process in each acquisition
  • --dapi_channel - the name of the channel containing the DAPI stain

Pipeline outputs

The output directory (shared_work_dir/outputs) contains one folder for each acquisition in --acq_names. Under each acquisition directory, you’ll find a folder for each pipeline step that was applied to that round, e.g.:

LHA3_R3_small
  assignments
  intensities
  segmentation
  spots
  stitching

LHA3_R5_small
  assignments
  intensities
  registration
  spots
  stitching

In this case, the pipeline was run on two acquisitions, LHA3_R3_small and LHA3_R5_small. The stitching step was run on both acquisitions. Then LHA3_R3_small (the reference) was segmented, and LHA3_R5_small was registered to it. Finally, spot extraction, intensity measurement, and cell assignment were run on both acquisitions.
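For example, to see at a glance which steps produced output for each acquisition (the path is a placeholder):

    # List the per-step output folders, one level under each acquisition
    find /path/to/shared/work/dir/outputs -mindepth 2 -maxdepth 2 -type d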

Stitching Output

The output of the stitching step includes many intermediate files that can be used for debugging and verification of the results.

  • tiles.json - multi-view metadata about the acquisition converted from the MVL file
  • tiles.n5 - imagery converted from CZI to n5 format tiled according to --stitching_block_size
  • c<channel>-n5.json - metadata about each channel in tiles.n5
  • c<channel>-flatfield - files for flatfield-correction including the calculated brightfield and offset
  • c<channel>-n5-retiled.json - metadata after retiling
  • retiled-images - retiled images
  • optimizer-final.txt - stitching log
  • c<channel>-n5-retiled-final.json - metadata output of stitching
  • export.n5 - final stitched result, tiled according to --retile_z_size

See the stitching documentation for full details about the stitching pipeline.
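Since n5 containers are plain directory trees with JSON metadata, a quick way to sanity-check the stitched export is to read its attributes (the path is a placeholder):

    # n5 stores metadata as attributes.json files inside the container directory
    cat /path/to/shared/work/dir/outputs/LHA3_R3_small/stitching/export.n5/attributes.json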

Segmentation

The segmentation directory contains a single TIFF file with the cell segmentation result.

Registration

The registration directory contains a <moving>-to-<fixed> directory, e.g. LHA3_R5_small-to-LHA3_R3_small. Inside that folder:

  • tiles - tile-specific intermediate files
  • aff - result of RANSAC affine alignment
  • transform - registration transform (n5 format)
  • invtransform - inverse of registration transform (n5 format)
  • warped - final registered imagery (n5 format)

Spots

If you use AirLocalize, you’ll get this output:

  • tiles - n5 formatted stack retiled for spot extraction
  • spots_CH.txt - Per-channel CSV file containing the coordinates, in microns, of the spots found in that channel. This file is used for downstream analysis (e.g. cell assignment).
  • spots_airlocalize_CH.csv - Per-channel CSV file containing the voxel coordinates of the spots found in that channel. This file is compatible with the RS-FISH Fiji Plugin.

If you use RS-FISH, you’ll get this output:

  • spots_CH.txt - Per-channel CSV file containing the coordinates, in microns, of the spots found in that channel. This file is used for downstream analysis (e.g. cell assignment).
  • spots_rsfish_CH.csv - Per-channel CSV file containing the voxel coordinates of the spots found in that channel. This file is compatible with the RS-FISH Fiji Plugin.
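In either case, the micron-space spot list is plain text and can be inspected directly (the path and channel name are placeholders):

    head /path/to/shared/work/dir/outputs/LHA3_R3_small/spots/spots_c0.txt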

Intensities

Per-channel CSV file containing the intensities of each segmented cell (“ROI”).

Assignments

A single CSV file in which the first column is an index into the cell segmentation and the remaining columns give the number of spots found in that cell in each channel.
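As a sketch of how this file can be consumed — assuming comma-separated values, a header row, and a hypothetical file name assignments.csv — the per-channel spot totals can be computed with a one-liner:

    # Sum spot counts per channel over all cells
    # (assumes a header row and comma-separated values; adjust to the actual file)
    awk -F, 'NR > 1 { for (i = 2; i <= NF; i++) sum[i] += $i }
             END { for (i in sum) print "column " i ": " sum[i] }' assignments.csv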