Parameters

The pipeline supports many parameters for adapting it to your compute environment and data. All of them can be specified on the command line using the standard syntax --argument="value" or --argument "value". You can also use any option supported by Nextflow itself. Note that arguments interpreted by Nextflow itself use a single dash instead of two.
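As a rough sketch of how the two kinds of arguments combine, the snippet below assembles a command without running it. The entry script `main.nf` and all parameter values are placeholders chosen for illustration, not taken from the pipeline. `-profile` and `-resume` are standard Nextflow options, and the `lsf` profile is the one referred to under Other Options.

```shell
# Dry run: assemble the command and print it instead of executing it.
# main.nf and every value below are illustrative placeholders.
set -- nextflow run main.nf \
    -profile lsf \
    -resume \
    --acq_names "LHA3_R3_small,LHA3_R5_small" \
    --ref_acq="LHA3_R3_small" \
    --channels "c0,c1,c2,c3" \
    --dapi_channel "c2"

# Single-dash options go to Nextflow, double-dash options to the pipeline.
cmd="$*"
echo "$cmd"
```

Both `--ref_acq="LHA3_R3_small"` and `--acq_names "LHA3_R3_small,LHA3_R5_small"` are accepted forms, so the choice between them is a matter of taste.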

Environment Variables

You can export variables into your environment before calling the pipeline, or set them on the same line like this:

SINGULARITY_TMPDIR=/opt/tmp ./examples/demo_small.sh /opt/demo_small

Variable	Default	Description
`TMPDIR`	/tmp	Directory used for temporary files by certain processes.
`SINGULARITY_TMPDIR`	/tmp	Directory where Docker images are downloaded and converted to Singularity Image Format. Needs to be large enough to accommodate several GB, so moving it out of /tmp is sometimes necessary.
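The alternative mentioned above, exporting a variable before calling the pipeline, might look like the following sketch. The /opt/tmp path is the same example value used earlier, and the `sh -c` line only stands in for a child process such as the pipeline launcher. `TMPDIR` can be exported the same way.

```shell
# Export once; every command started later from this shell inherits the value.
export SINGULARITY_TMPDIR=/opt/tmp

# Stand-in for the pipeline launcher: a child process sees the exported value.
seen=$(sh -c 'echo "$SINGULARITY_TMPDIR"')
echo "child process sees SINGULARITY_TMPDIR=$seen"

# The pipeline would then be started without the inline assignment, e.g.:
#   ./examples/demo_small.sh /opt/demo_small
```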

Data

Describe your input data and where pipeline results should be saved

Parameter	Description	Help Text	Type
`data_manifest`	Name or path to the file manifest for downloading input data. Default: segmentation	If specified, the data in the manifest is downloaded into `--data_dir` before the pipeline begins. Valid values are any base filename found in the data-sets directory (e.g. “demo_small”, “demo_medium”) or any absolute path which points to a manifest file. By default this just downloads the segmentation model.	`string`
`verify_md5`	Verify MD5 sum for all downloads. Default: true	This can be disabled to save time, but it’s not recommended.	`string`
`shared_work_dir`	Shared working directory accessible by all nodes. Typically something like /fsx/username/pipeline	Setting this parameter will automatically configure `data_dir`, `output_dir`, `segmentation_model_dir`, `spark_work_dir`, and `singularity_cache_dir`. You can override any of them in the hidden settings. When running on a system like AWS Batch, you should set this to an FSx for Lustre filesystem, and the final_output_dir to a Fuse-mounted S3 bucket. This will cause all processing to happen on high-performance disk, and the outputs will only be copied to slower S3 at the very last step.	`string`
`data_dir`	Path to the directory containing the input CZI/MVL acquisition files. If shared_work_dir is defined, this defaults to $shared_work_dir/inputs.	If `shared_work_dir` is defined, this is automatically set to `$shared_work_dir/inputs`.	`string`
`segmentation_model_dir`	Path to the directory containing the machine learning model for segmentation.	If `shared_work_dir` is defined, this is automatically set to `$shared_work_dir/inputs/model/starfinity`. It is assumed that either the model is already there, or it will be downloaded and unzipped according to the `data_manifest`. Otherwise it defaults to ${projectDir}/external-modules/segmentation/model/starfinity, which is normally configured by setup.sh.	`string`
`output_dir`	Path to the directory containing pipeline outputs. If shared_work_dir is defined, this defaults to $shared_work_dir/outputs.	If `shared_work_dir` is defined, this is automatically set to `$shared_work_dir/outputs`.	`string`
`publish_dir`	Optional publishing directory where results should be copied when the pipeline is successfully completed. Typically a Fusion mount path like /fusion/s3/bucket-name.	This is useful for getting data off of FSx and onto something externally accessible like S3.	`string`
`acq_names`	Names of acquisition rounds to process. These should match the names of the CZI/MVL files found in the data_dir.	e.g. LHA3_R3_small,LHA3_R5_small if you have files called LHA3_R3_small.czi and LHA3_R5_small.czi	`string`
`ref_acq`	Name of the acquisition round to use as the fixed reference.	e.g. LHA3_R3_small	`string`
`channels`	List of channel names to process.	Channel names are specified in the format “c[channel_number]”, where the channel_number is 0-indexed.	`string`
`dapi_channel`	Name of the DAPI channel.	The DAPI channel is used as a reference channel for registration, segmentation, and spot extraction.	`string`
`bleed_channel`	Channel (other than DAPI) that needs bleedthrough correction.		`string`
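To make the effect of `shared_work_dir` concrete, the sketch below reproduces the documented default paths by hand. It only mirrors the defaults stated in this reference (including the `spark_work_dir` and `singularity_cache_dir` defaults listed in later sections) and is not the pipeline's own configuration code.

```shell
# Example value from the table above.
shared_work_dir="/fsx/username/pipeline"

# Defaults derived from shared_work_dir, as documented in this reference.
data_dir="$shared_work_dir/inputs"
output_dir="$shared_work_dir/outputs"
segmentation_model_dir="$shared_work_dir/inputs/model/starfinity"
spark_work_dir="$shared_work_dir/spark"
singularity_cache_dir="$shared_work_dir/singularity_cache"

printf '%s\n' "$data_dir" "$output_dir" "$segmentation_model_dir" \
    "$spark_work_dir" "$singularity_cache_dir"
```

Any of these can still be set explicitly, in which case the explicit value replaces the derived one.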

Stitching

Stitching options

Parameter	Description	Help Text	Type
`stitching_output`	Output directory for stitching results. Default: stitching	This directory path is relative to `output_dir`	`string`
`spark_work_dir`	Path to directory containing Spark working files and logs during stitching. Default: $shared_work_dir/spark or $workDir/spark	The Spark configuration is written here by the pipeline before launching the Spark cluster. The Spark workers write their logs back here, and it is also used to communicate the master IP address to all workers. Therefore, this must be a shared directory accessible to both the head node and all worker nodes. On AWS, Fuse-mounted S3 will not work here due to write buffering. It’s best to use FSx, but EBS will also work, as long as it’s mounted on all the EC2 nodes.	`string`
`spark_local_dir`	Path to the directory that Spark uses for local temporary files. Default: /tmp	This path does not need to be shared among workers, and does not need to be accessible to the head node. Usually, /tmp will do.	`string`
`stitching_czi_pattern`	A suffix pattern that is applied to acq_names when creating CZI names e.g. “_V%02d”		`string`
`stitching_ref`	Index of the channel used for stitching, e.g. ‘c1’ or ‘1’. You can also specify ‘all’ to use all of the channels. Default: the dapi_channel	If this is not defined it defaults to `dapi_channel`	`string`
`resolution`	Voxel resolution in all 3 dimensions. Default: 0.23,0.23,0.42	This is a comma-delimited tuple as x,y,z.	`string`
`axis`	Axis mapping for the objective->pixel coordinates conversion. Default: -x,y,z	Comma-separated axis specification with optional flips.	`string`
`stitching_block_size`	Block size to use when converting CZI to n5 before stitching. Default: 128,128,64		`string`
`flatfield_correction`	Apply flatfield correction before stitching? Default: true		`boolean`
`retile_z_size`	Block size (in Z dimension) when retiling after stitching. Default: 64	This must be smaller than the number of Z slices in the data.	`integer`
`with_fillBackground`	Use fillBackground option when running fuse step. Default: true	Turning this off may help process certain types of data that error otherwise.	`boolean`
`stitching_mode`	Rematching mode (‘full’ or ‘incremental’). Default: incremental		`string`
`stitching_padding`	Padding for the overlap regions. Default: 0,0,0		`string`
`stitching_blur_sigma`	Sigma value of the Gaussian blur applied to the images before stitching. Default: 2		`integer`
`workers`	Number of Spark workers to use for stitching one acquisition. Default: 4		`integer`
`worker_cores`	Number of cores allocated to each Spark worker. Default: 4		`integer`
`gb_per_core`	Size of memory (in GB) that is allocated for each core of a Spark worker. Default: 4	The total memory usage for stitching one acquisition will be `workers * worker_cores * gb_per_core`.	`integer`
`driver_memory`	Amount of memory to allocate for the Spark driver. Default: 15g		`string`
`wait_for_spark_timeout_seconds`	Number of seconds to wait for Spark cluster to start. Default: 3600		`integer`
`sleep_between_timeout_checks_seconds`	Number of seconds to sleep between timeout checks. Default: 2		`integer`
`stitching_app`	Path to the JAR file containing the stitching application. Default: /app/app.jar		`string`
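As a quick sizing aid, the sketch below works out the Spark footprint for stitching one acquisition with the default values. It assumes the help text for `gb_per_core` describes the product of `workers`, `worker_cores`, and `gb_per_core`.

```shell
# Default values from the table above.
workers=4
worker_cores=4
gb_per_core=4

# Total cores and memory requested by the Spark workers for one acquisition.
total_cores=$(( workers * worker_cores ))
total_gb=$(( total_cores * gb_per_core ))

echo "Stitching one acquisition: $total_cores cores, $total_gb GB across $workers workers"
```

With the defaults this comes to 16 cores and 64 GB, which is worth checking against the capacity of your cluster before raising any of the three settings.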

Registration

Options for the registration algorithm (Bigstream)

Parameter	Description	Help Text	Type
`registration_output`	Output directory for registration results. Default: registration	This path is relative to `output_dir`.	`string`
`aff_scale`	The scale level for affine alignments. Default: s3		`string`
`def_scale`	The scale level for deformable alignments. Default: s2		`string`
`spots_cc_radius`	Default: 8		`integer`
`spots_spot_number`	Default: 2000		`integer`
`ransac_cc_cutoff`	Default: 0.9		`number`
`ransac_dist_threshold`	Default: 2.5		`number`
`deform_iterations`	Default: 500x200x25x1		`string`
`deform_auto_mask`	Default: 0		`string`
`registration_xy_stride`	The number of voxels along x/y for registration tiling. Default: 256	Must be a power of 2.	`integer`
`registration_xy_overlap`	Tile overlap on the x/y axes	Defaults to registration_xy_stride/8 when not specified.	`integer`
`registration_z_stride`	The number of voxels along z for registration tiling. Default: 256	Must be a power of 2.	`integer`
`registration_z_overlap`	Tile overlap on the z axis	Defaults to registration_z_stride/8 when not specified.	`integer`
`ransac_cpus`	Number of CPU cores for RANSAC. Default: 1		`integer`
`ransac_memory`	Amount of memory for RANSAC. Default: 1 G		`string`
`spots_cpus`	Number of CPU cores for Spots step of registration. Default: 1		`string`
`spots_memory`	Amount of memory for Spots step of registration. Default: 2 G		`string`
`interpolate_cpus`	Number of CPU cores for Interpolate step of registration. Default: 1		`integer`
`interpolate_memory`	Amount of memory for Interpolate step of registration. Default: 1 G		`string`
`coarse_spots_cpus`	Number of CPU cores for Coarse Spots step of registration. Default: 1		`integer`
`coarse_spots_memory`	Amount of memory for Coarse Spots step of registration. Default: 2 G		`string`
`aff_scale_transform_cpus`	Number of CPU cores for Affine Scale Transform step of registration. Default: 1		`integer`
`aff_scale_transform_memory`	Amount of memory for Affine Scale Transform step of registration. Default: 15 G		`string`
`def_scale_transform_cpus`	Number of CPU cores for deformable scale registration. Default: 8		`integer`
`def_scale_transform_memory`	Amount of memory for Deformable Scale Transform step of registration. Default: 80 G		`string`
`deform_cpus`	Number of CPU cores for Deform step of registration. Default: 1		`integer`
`deform_memory`	Amount of memory for Deform step of registration. Default: 10 G		`string`
`registration_stitch_cpus`	Number of CPU cores for Stitch step of registration. Default: 2		`integer`
`registration_stitch_memory`	Amount of memory for Stitch step of registration. Default: 20 G		`string`
`registration_transform_cpus`	Number of CPU cores for final Transform step of registration. Default: 12		`integer`
`registration_transform_memory`	Amount of memory for final Transform step of registration. Default: 80 G		`string`
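The stride and overlap rules above can be checked before launching a run. The following is a minimal sketch of the documented rules (stride must be a power of 2, overlap defaults to stride/8). The helper `is_power_of_two` is a hypothetical name introduced here, not part of the pipeline.

```shell
registration_xy_stride=256
registration_xy_overlap=""   # left empty so the documented default applies

# Hypothetical helper: succeeds only for positive powers of two.
is_power_of_two() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

if is_power_of_two "$registration_xy_stride"; then
    stride_ok=yes
else
    stride_ok=no
fi

# Fall back to stride/8 when no overlap was given.
xy_overlap=${registration_xy_overlap:-$(( registration_xy_stride / 8 ))}

echo "stride=$registration_xy_stride valid=$stride_ok overlap=$xy_overlap"
```

The same check applies to `registration_z_stride` and `registration_z_overlap`.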

Cell Segmentation

Options for the cell segmentation algorithm (Starfinity)

Parameter	Description	Help Text	Type
`segmentation_output`	Output directory for segmentation results. Default: segmentation	This path is relative to `output_dir`.	`string`
`segmentation_scale`	Imagery scale to use for segmentation. Default: s2		`string`
`segmentation_cpus`	Number of CPU cores for segmentation. Default: 3		`integer`
`segmentation_memory`	Amount of memory for segmentation. Default: 45 G		`string`

Spot Extraction

Options for spot extraction

Parameter	Description	Help Text	Type
`spot_extraction_output`	Output directory for spot extraction results. Default: spots	This path is relative to `output_dir`.	`string`
`spot_extraction_scale`	Scale of imagery to use for spot extraction. Default: s0		`string`

Spot Extraction: AirLocalize

Options for the AirLocalize spot extraction algorithm

Parameter	Description	Help Text	Type
`airlocalize_xy_stride`	The number of voxels along x/y for AirLocalize tiling. Default: 1024	Must be a power of 2. Increasing this requires increasing `airlocalize_memory`.	`integer`
`airlocalize_xy_overlap`	Tile overlap on the x/y axes	Defaults to 5% of airlocalize_xy_stride	`integer`
`airlocalize_z_stride`	The number of voxels along z for AirLocalize tiling. Default: 512	Must be a power of 2. Increasing this requires increasing `airlocalize_memory`.	`integer`
`airlocalize_z_overlap`	Tile overlap on the z axis	Defaults to 5% of airlocalize_z_stride	`integer`
`default_airlocalize_params`	Path to the default AirLocalize parameter file. Default: /app/airlocalize/params/air_localize_default_params.txt	By default, this points to default parameters inside the container	`string`
`per_channel_air_localize_params`	Comma-delimited paths to alternative AirLocalize parameter files, one per channel.	If you have 4 channels, and you are extracting spots from c0, c1, and c3, this parameter should look like this: `/path/to/params_c0.txt,/path/to/params_c1.txt,,/path/to/params_c3.txt`. Note the double comma to denote the empty file for c2, which should not be processed.	`string`
`airlocalize_cpus`	Number of CPU cores to allocate for each AirLocalize job. Default: 1		`integer`
`airlocalize_memory`	Amount of RAM to allocate to each AirLocalize job. Needs to be increased when increasing strides. Default: 2 G		`integer`
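The positional format of `per_channel_air_localize_params` is easy to get wrong, so the sketch below walks through the example value from the table and prints which channel each entry lands on. It only illustrates the comma-position convention described above and is not how the pipeline itself parses the value.

```shell
per_channel_air_localize_params="/path/to/params_c0.txt,/path/to/params_c1.txt,,/path/to/params_c3.txt"

# Append a comma so that every entry, including the last, ends with one.
rest="$per_channel_air_localize_params,"
index=0
without_file=""
while [ -n "$rest" ]; do
    entry=${rest%%,*}    # text before the first comma
    rest=${rest#*,}      # drop that entry together with its comma
    if [ -n "$entry" ]; then
        echo "c$index -> $entry"
    else
        echo "c$index -> (empty entry)"
        without_file="$without_file c$index"
    fi
    index=$(( index + 1 ))
done
```

For the example value, the only empty entry is the third one, which corresponds to c2.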

Spot Extraction: RS-FISH

Options for the RS-FISH spot extraction algorithm

Parameter	Description	Help Text	Type
`use_rsfish`	Use RS-FISH instead of AirLocalize for Spot Extraction. Default: false		`boolean`
`rsfish_min`	Minimal intensity of the image. Default: 0		`integer`
`rsfish_max`	Maximal intensity of the image. Default: 4096		`integer`
`rsfish_anisotropy`	The anisotropy factor. Default: 0.7	Scaling of z relative to xy. Can be determined using the RS-FISH anisotropy plugin in Fiji.	`number`
`rsfish_sigma`	Sigma value for Difference-of-Gaussian (DoG) calculation. Default: 1.5		`number`
`rsfish_threshold`	Threshold value for Difference-of-Gaussian (DoG) calculation. Default: 0.007		`number`
`rsfish_background`	Background subtraction method, 0 == None, 1 == Mean, 2==Median, 3==RANSAC on Mean, 4==RANSAC on Median. Default: 0 (None)		`integer`
`rsfish_intensity`	Intensity calculation method, 0 == Linear Interpolation, 1 == Gaussian fit (on inlier pixels), 2 == Integrate spot intensities (on candidate pixels). Default: 0 (Linear Interpolation)		`integer`
`rsfish_params`	Any other parameters to pass to the RS-FISH algorithm.	Complete parameter documentation is available in the RS-FISH project documentation.	`string`
`rsfish_workers`	Number of Spark workers to use for RS-FISH spot detection. Default: 4		`integer`
`rsfish_worker_cores`	Number of cores allocated to each RS-FISH Spark worker. Default: 4		`integer`
`rsfish_gb_per_core`	Size of memory (in GB) that is allocated for each core of a RS-FISH Spark worker. Default: 4	The total memory usage for one acquisition will be `rsfish_workers * rsfish_worker_cores * rsfish_gb_per_core`.	`integer`
`rsfish_driver_cores`	Number of cores allocated for the RS-FISH Spark driver. Default: 1		`string`
`rsfish_driver_memory`	Amount of memory to allocate for the RS-FISH Spark driver. Default: 15g		`string`

Per channel RS-FISH Parameters

The following parameters can be set per channel: rsfish_min, rsfish_max, rsfish_anisotropy, rsfish_sigma, rsfish_threshold, rsfish_background, rsfish_intensity. Simply prefix the corresponding parameter with `per_channel.` and set the values using a comma-delimited list. Values are matched to channels by position: the first value applies to the first channel, the second to the second channel, and so on. If a value is missing or empty, the channel uses the default from the parameter with the same name (presented above).

For example, if the command line is `--channels c0,c1,c2,c3 --sigma 1.7 --per_channel.sigma "1.2,,1.4"`, then:

channel c0 will use sigma 1.2

channel c1 will use the default sigma 1.7 (because of the empty value)

channel c2 will use sigma 1.4

channel c3 will use the default sigma 1.7 (because of the missing value: the list of sigma values is shorter than the list of channels)
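The positional fallback rule can be reproduced with a few lines of shell. This is only a sketch that mimics the behaviour described above for the sigma example. The variable names are illustrative, and the pipeline's own implementation may differ.

```shell
channels="c0,c1,c2,c3"
default_sigma="1.7"
per_channel_sigma="1.2,,1.4"

# Trailing commas let both lists be consumed one entry at a time.
ch_rest="$channels,"
val_rest="$per_channel_sigma,"
resolved=""
while [ -n "$ch_rest" ]; do
    ch=${ch_rest%%,*}
    ch_rest=${ch_rest#*,}
    val=${val_rest%%,*}        # empty once the value list is exhausted
    val_rest=${val_rest#*,}
    sigma=${val:-$default_sigma}   # empty or missing value: use the default
    echo "channel $ch uses sigma $sigma"
    resolved="$resolved$ch=$sigma "
done
```

Running it prints the same four assignments as listed above: 1.2, 1.7, 1.4, and 1.7 for c0 through c3.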

Spot Warping

Options for warping detected spots to registration

Parameter	Description	Help Text	Type
`warp_spots_cpus`	Number of CPU cores to use for warp spots. Default: 2		`integer`
`warp_spots_memory`	Amount of memory for warp spots. Default: 30 G		`string`

Intensity Measurement

Options for extracting quantified measurements of spot intensities

Parameter	Description	Help Text	Type
`measure_intensities_output`	Output directory for intensities. Default: intensities	This path is relative to `output_dir`.	`string`
`measure_intensities_cpus`	Number of CPU cores to use for intensity measurement. Default: 1		`integer`
`measure_intensities_memory`	Amount of memory for intensity measurement. Default: 8 G		`string`

Spot Assignment

Options for mapping spot counts to segmented cells

Parameter	Description	Help Text	Type
`assign_spots_output`	Output directory for spot assignments. Default: assignments	This path is relative to `output_dir`.	`string`
`assign_spots_cpus`	Number of CPU cores to use for spot assignment. Default: 1		`integer`
`assign_spots_memory`	Amount of memory for spot assignment. Default: 5 G		`string`

Container Options

Customize the Docker containers used for each pipeline step

Parameter	Description	Help Text	Type
`mfrepo`	Docker registry/repository to use for containers. Default: janeliascicomp	By default, the pipeline uses containers built as part of this project and deployed to DockerHub. You can rebuild the containers and deploy them to your own Registry and specify it here.	`string`
`spark_container_repo`	Docker container repo for stitching. Default: `<mfrepo>`		`string`
`spark_container_name`	Docker container name for stitching. Default: stitching		`string`
`spark_container_version`	Docker container version for stitching. Default: 1.0.0		`string`
`registration_container`	Docker container for running registration and warp_spots. Default: `<mfrepo>`/registration:1.2.0		`string`
`segmentation_container`	Docker container for running segmentation. Default: `<mfrepo>`/segmentation:1.0.0		`string`
`airlocalize_container`	Docker container for running spot extraction. Default: `<mfrepo>`/airlocalize:1.0.2		`string`
`spots_assignment_container`	Docker container for running intensity measurement and spot assignment. Default: `<mfrepo>`/spot_assignment:1.2.0		`string`

Other Options

Other global options affecting all pipeline stages

Parameter	Description	Help Text	Type
`skip`	Comma-delimited list of steps to skip, e.g. stitching,registration.	Valid values: stitching,spot_extraction,segmentation,registration,warp_spots,measure_intensities,assign_spots	`string`
`singularity_cache_dir`	Shared directory where Singularity containers are cached. Default: $shared_work_dir/singularity_cache or $HOME/.singularity_cache		`string`
`singularity_user`	User to use for running Singularity containers. Default: $USER	This is automatically set to `ec2-user` when using the ‘tower’ profile	`string`
`runtime_opts`	Runtime options for the container engine being used (e.g. Singularity or Docker).	Runtime options for Singularity must include mounts for any directory paths you are using. You can also pass the `--nv` flag here to make use of NVIDIA GPU resources. For example, `--nv -B /your/data/dir -B /your/output/dir`	`string`
`lsf_opts`	Options for LSF cluster at Janelia, when using the lsf profile.		`string`
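Because every directory used with Singularity needs its own mount, it can help to build the `runtime_opts` value from a list of directories. The sketch below assembles the example string from the table above, using the same placeholder paths.

```shell
# Start with GPU support, then add one bind mount (-B) per directory.
runtime_opts="--nv"
for dir in /your/data/dir /your/output/dir; do
    runtime_opts="$runtime_opts -B $dir"
done

# The quoted value is what would be passed to the pipeline.
echo "--runtime_opts \"$runtime_opts\""
```

Quoting the whole value keeps it together as a single argument, so the flags inside it are not mistaken for separate options.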