Parameters
The pipeline supports many types of parameters for customizing it to your compute environment and data. These can all be specified on the command line using the standard syntax --argument="value" or --argument "value". You can also use any option supported by Nextflow itself. Note that certain arguments (i.e. those interpreted by Nextflow itself, such as -profile) use a single dash instead of two.
Environment Variables
You can export variables into your environment before calling the pipeline, or set them on the same line like this:
SINGULARITY_TMPDIR=/opt/tmp ./examples/demo_small.sh /opt/demo_small
Variable | Default | Description |
---|---|---|
TMPDIR | /tmp | Directory used for temporary files by certain processes. |
SINGULARITY_TMPDIR | /tmp | Directory where Docker images are downloaded and converted to Singularity Image Format. Needs to be large enough to accommodate several GB, so moving it out of /tmp is sometimes necessary. |
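
Since SINGULARITY_TMPDIR has to hold several GB while images are converted, it can be worth checking free space before a long run. The sketch below is not part of the pipeline: free_gb and has_free_space are hypothetical helpers built on df, and the 20 GB threshold is an arbitrary example value, not a documented requirement.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of the pipeline).

# Print the space available on the filesystem holding DIR, in whole GB.
free_gb() {
  local dir="$1"
  # df -P: POSIX single-line output; -k: sizes in 1024-byte blocks.
  # Field 4 of the data row (line 2) is the available space in KiB.
  df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed if DIR has at least MIN_GB available; fail otherwise
# (including when DIR does not exist, since df then prints no data row).
has_free_space() {
  local dir="$1" min_gb="$2" avail
  avail=$(free_gb "$dir")
  [ -n "$avail" ] && [ "$avail" -ge "$min_gb" ]
}

# Example: warn before a run if the image conversion area looks too small.
# The 20 GB threshold is an arbitrary example value.
if ! has_free_space "${SINGULARITY_TMPDIR:-/tmp}" 20; then
  echo "warning: ${SINGULARITY_TMPDIR:-/tmp} has less than 20 GB free" >&2
fi
```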
Data
Describe your input data and where pipeline results should be saved
Parameter | Description | Help Text | Type |
---|---|---|---|
data_manifest | Name or path to the file manifest for downloading input data. Default: segmentation | If specified, the data in the manifest is downloaded into --data_dir before the pipeline begins. Valid values are any base filename found in the data-sets directory (e.g. “demo_small”, “demo_medium”) or any absolute path which points to a manifest file. By default this just downloads the segmentation model. | string |
verify_md5 | Verify MD5 sum for all downloads. Default: true | This can be disabled to save time, but it’s not recommended. | string |
shared_work_dir | Shared working directory accessible by all nodes. Typically something like /fsx/username/pipeline | Setting this parameter automatically configures data_dir, output_dir, segmentation_model_dir, spark_work_dir, and singularity_cache_dir. You can override any of them in the hidden settings. When running on a system like AWS Batch, you should set this to an FSx for Lustre filesystem, and the final_output_dir to a Fuse-mounted S3 bucket. All processing then happens on high-performance disk, and the outputs are copied to slower S3 only at the very last step. | string |
data_dir | Path to the directory containing the input CZI/MVL acquisition files. If shared_work_dir is defined, this defaults to $shared_work_dir/inputs. | If shared_work_dir is defined, this is automatically set to $shared_work_dir/inputs. | string |
segmentation_model_dir | Path to the directory containing the machine learning model for segmentation. | If shared_work_dir is defined, this is automatically set to $shared_work_dir/inputs/model/starfinity. It is assumed that either the model is already there, or it will be downloaded and unzipped according to the data_manifest. Otherwise it defaults to ${projectDir}/external-modules/segmentation/model/starfinity, which is normally configured by setup.sh. | string |
output_dir | Path to the directory containing pipeline outputs. If shared_work_dir is defined, this defaults to $shared_work_dir/outputs. | If shared_work_dir is defined, this is automatically set to $shared_work_dir/outputs. | string |
publish_dir | Optional publishing directory where results should be copied when the pipeline is successfully completed. Typically a Fusion mount path like /fusion/s3/bucket-name. | This is useful for getting data off of FSx and onto something externally accessible like S3. | string |
acq_names | Names of acquisition rounds to process. These should match the names of the CZI/MVL files found in the data_dir. | e.g. LHA3_R3_small,LHA3_R5_small if you have files called LHA3_R3_small.czi and LHA3_R5_small.czi | string |
ref_acq | Name of the acquisition round to use as the fixed reference. | e.g. LHA3_R3_small | string |
channels | List of channel names to process. | Channel names are specified in the format “c[channel_number]”, where the channel_number is 0-indexed. | string |
dapi_channel | Name of the DAPI channel. | The DAPI channel is used as a reference channel for registration, segmentation, and spot extraction. | string |
bleed_channel | Channel (other than DAPI) that needs bleedthrough correction. | | string |
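
The table implies a fixed directory layout under shared_work_dir. The sketch below only prints the documented defaults (data_dir, output_dir, and segmentation_model_dir from this table, plus the spark_work_dir and singularity_cache_dir defaults listed in later sections) and the input file each acq_names entry is expected to match. It is an illustration, not the pipeline's own logic: derived_dirs and acq_files are hypothetical helpers, and acq_files assumes .czi inputs with no stitching_czi_pattern suffix (MVL files are not covered).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the documented defaults derived from shared_work_dir.
derived_dirs() {
  local shared_work_dir="${1%/}"   # drop one trailing slash so paths join cleanly
  echo "data_dir=${shared_work_dir}/inputs"
  echo "output_dir=${shared_work_dir}/outputs"
  echo "segmentation_model_dir=${shared_work_dir}/inputs/model/starfinity"
  echo "spark_work_dir=${shared_work_dir}/spark"
  echo "singularity_cache_dir=${shared_work_dir}/singularity_cache"
}

# Hypothetical helper: expected input file for each comma-separated acq_names
# entry, assuming <data_dir>/<acq_name>.czi with no stitching_czi_pattern.
acq_files() {
  local data_dir="${1%/}" acq_names="$2" name
  local -a names
  IFS=',' read -r -a names <<< "$acq_names"
  for name in "${names[@]}"; do
    echo "${data_dir}/${name}.czi"
  done
}

derived_dirs /fsx/username/pipeline
acq_files /fsx/username/pipeline/inputs LHA3_R3_small,LHA3_R5_small
```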
Stitching
Stitching options
Parameter | Description | Help Text | Type |
---|---|---|---|
stitching_output | Output directory for stitching results. Default: stitching | This directory path is relative to output_dir | string |
spark_work_dir | Path to the directory containing Spark working files and logs during stitching. Default: $shared_work_dir/spark or $workDir/spark | The pipeline writes the Spark configuration here before launching the Spark cluster. The Spark workers write their logs back here, and it is also used to communicate the master IP address to all workers. Therefore, this must be a shared directory accessible to both the head node and all worker nodes. On AWS, Fuse-mounted S3 will not work here due to write buffering. It’s best to use FSx, but EBS will also work, as long as it’s mounted on all the EC2 nodes. | string |
spark_local_dir | Path to the directory that Spark uses for local temporary files. Default: /tmp | This path does not need to be shared among workers, and does not need to be accessible to the head node. Usually, /tmp will do. | string |
stitching_czi_pattern | A suffix pattern that is applied to acq_names when creating CZI names, e.g. “_V%02d”. | | string |
stitching_ref | Index of the channel used for stitching, e.g. ‘c1’ or ‘1’. You can also specify ‘all’ to use all of the channels. Default: the dapi_channel | If this is not defined it defaults to dapi_channel | string |
resolution | Voxel resolution in all 3 dimensions. Default: 0.23,0.23,0.42 | This is a comma-delimited tuple as x,y,z. | string |
axis | Axis mapping for the objective->pixel coordinates conversion. Default: -x,y,z | Comma-separated axis specification with optional flips. | string |
stitching_block_size | Block size to use when converting CZI to N5 before stitching. Default: 128,128,64 | | string |
flatfield_correction | Apply flatfield correction before stitching? Default: true | | boolean |
retile_z_size | Block size (in Z dimension) when retiling after stitching. Default: 64 | This must be smaller than the number of Z slices in the data. | integer |
with_fillBackground | Use fillBackground option when running fuse step. Default: true | Turning this off may help process certain types of data that error otherwise. | boolean |
stitching_mode | Rematching mode (‘full’ or ‘incremental’). Default: incremental | | string |
stitching_padding | Padding for the overlap regions. Default: 0,0,0 | | string |
stitching_blur_sigma | Sigma value of the Gaussian blur pre-applied to the images before stitching. Default: 2 | | integer |
workers | Number of Spark workers to use for stitching one acquisition. Default: 4 | | integer |
worker_cores | Number of cores allocated to each Spark worker. Default: 4 | | integer |
gb_per_core | Size of memory (in GB) that is allocated for each core of a Spark worker. Default: 4 | The total memory usage for stitching one acquisition will be workers × worker_cores × gb_per_core. | integer |
driver_memory | Amount of memory to allocate for the Spark driver. Default: 15g | | string |
wait_for_spark_timeout_seconds | Number of seconds to wait for the Spark cluster to start. Default: 3600 | | integer |
sleep_between_timeout_checks_seconds | Number of seconds to sleep between timeout checks. Default: 2 | | integer |
stitching_app | Path to the JAR file containing the stitching application. Default: /app/app.jar | | string |
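
The gb_per_core help text gives the sizing rule for the Spark workers. A quick way to see what a given combination requests before submitting is the arithmetic below; the function names are hypothetical, and driver_memory is a separate setting that this product does not include.

```bash
#!/usr/bin/env bash
# Total memory (GB) requested by the Spark workers for one acquisition:
# workers × worker_cores × gb_per_core. Omitted arguments fall back to the
# documented defaults (4, 4, 4).
spark_worker_memory_gb() {
  local workers="${1:-4}" worker_cores="${2:-4}" gb_per_core="${3:-4}"
  echo $(( workers * worker_cores * gb_per_core ))
}

# Total cores across all workers (workers × worker_cores).
spark_worker_cores_total() {
  local workers="${1:-4}" worker_cores="${2:-4}"
  echo $(( workers * worker_cores ))
}

spark_worker_memory_gb          # defaults: 64
spark_worker_memory_gb 8 4 4    # doubling the workers: 128
```

The same arithmetic applies to RS-FISH spot detection, using rsfish_workers, rsfish_worker_cores, and rsfish_gb_per_core.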
Registration
Options for the registration algorithm (Bigstream)
Parameter | Description | Help Text | Type |
---|---|---|---|
registration_output | Output directory for registration results. Default: registration | This path is relative to output_dir . | string |
aff_scale | The scale level for affine alignments. Default: s3 | | string |
def_scale | The scale level for deformable alignments. Default: s2 | | string |
spots_cc_radius | Default: 8 | | integer |
spots_spot_number | Default: 2000 | | integer |
ransac_cc_cutoff | Default: 0.9 | | number |
ransac_dist_threshold | Default: 2.5 | | number |
deform_iterations | Default: 500x200x25x1 | | string |
deform_auto_mask | Default: 0 | | string |
registration_xy_stride | The number of voxels along x/y for registration tiling. Default: 256 | Must be a power of 2. | integer |
registration_xy_overlap | Tile overlap on the x/y axes. | Defaults to registration_xy_stride/8 when not specified. | integer |
registration_z_stride | The number of voxels along z for registration tiling. Default: 256 | Must be a power of 2. | integer |
registration_z_overlap | Tile overlap on the z axis. | Defaults to registration_z_stride/8 when not specified. | integer |
ransac_cpus | Number of CPU cores for RANSAC. Default: 1 | | integer |
ransac_memory | Amount of memory for RANSAC. Default: 1 G | | string |
spots_cpus | Number of CPU cores for Spots step of registration. Default: 1 | | string |
spots_memory | Amount of memory for Spots step of registration. Default: 2 G | | string |
interpolate_cpus | Number of CPU cores for Interpolate step of registration. Default: 1 | | integer |
interpolate_memory | Amount of memory for Interpolate step of registration. Default: 1 G | | string |
coarse_spots_cpus | Number of CPU cores for Coarse Spots step of registration. Default: 1 | | integer |
coarse_spots_memory | Amount of memory for Coarse Spots step of registration. Default: 2 G | | string |
aff_scale_transform_cpus | Number of CPU cores for Affine Scale Transform step of registration. Default: 1 | | integer |
aff_scale_transform_memory | Amount of memory for Affine Scale Transform step of registration. Default: 15 G | | string |
def_scale_transform_cpus | Number of CPU cores for deformable scale registration. Default: 8 | | integer |
def_scale_transform_memory | Amount of memory for Deformable Scale Transform step of registration. Default: 80 G | | string |
deform_cpus | Number of CPU cores for Deform step of registration. Default: 1 | | integer |
deform_memory | Amount of memory for Deform step of registration. Default: 10 G | | string |
registration_stitch_cpus | Number of CPU cores for Stitch step of registration. Default: 2 | | integer |
registration_stitch_memory | Amount of memory for Stitch step of registration. Default: 20 G | | string |
registration_transform_cpus | Number of CPU cores for final Transform step of registration. Default: 12 | | integer |
registration_transform_memory | Amount of memory for final Transform step of registration. Default: 80 G | | string |
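
The tiling rules above (strides must be powers of 2; a missing overlap defaults to stride/8) are easy to check before launching. The sketch below uses hypothetical helper names and assumes integer division for the overlap, which is exact for any power-of-2 stride of 8 or more.

```bash
#!/usr/bin/env bash
# Hypothetical validation helpers for the registration tiling parameters.

# Succeed if N is a positive power of 2, i.e. exactly one bit is set.
is_power_of_two() {
  local n="$1"
  (( n > 0 && (n & (n - 1)) == 0 ))
}

# Default overlap when none is given: stride / 8 (integer division).
default_overlap() {
  local stride="$1"
  echo $(( stride / 8 ))
}

xy_stride=256
if is_power_of_two "$xy_stride"; then
  echo "registration_xy_overlap defaults to $(default_overlap "$xy_stride")"  # 32
else
  echo "registration_xy_stride must be a power of 2" >&2
fi
```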
Cell Segmentation
Options for the cell segmentation algorithm (Starfinity)
Parameter | Description | Help Text | Type |
---|---|---|---|
segmentation_output | Output directory for segmentation results. Default: segmentation | This path is relative to output_dir . | string |
segmentation_scale | Imagery scale to use for segmentation. Default: s2 | | string |
segmentation_cpus | Number of CPU cores for segmentation. Default: 3 | | integer |
segmentation_memory | Amount of memory for segmentation. Default: 45 G | | string |
Spot Extraction
Options for spot extraction
Parameter | Description | Help Text | Type |
---|---|---|---|
spot_extraction_output | Output directory for spot extraction results. Default: spots | This path is relative to output_dir . | string |
spot_extraction_scale | Scale of imagery to use for spot extraction. Default: s0 | | string |
Spot Extraction: Airlocalize
Options for the AirLocalize spot extraction algorithm
Parameter | Description | Help Text | Type |
---|---|---|---|
airlocalize_xy_stride | The number of voxels along x/y for AirLocalize tiling. Default: 1024 | Must be a power of 2. Increasing this requires increasing airlocalize_memory. | integer |
airlocalize_xy_overlap | Tile overlap on the x/y axes. | Defaults to 5% of airlocalize_xy_stride. | integer |
airlocalize_z_stride | The number of voxels along z for AirLocalize tiling. Default: 512 | Must be a power of 2. Increasing this requires increasing airlocalize_memory. | integer |
airlocalize_z_overlap | Tile overlap on the z axis. | Defaults to 5% of airlocalize_z_stride. | integer |
default_airlocalize_params | Path to the default AirLocalize parameter file. Default: /app/airlocalize/params/air_localize_default_params.txt | By default, this points to default parameters inside the container | string |
per_channel_air_localize_params | Comma-delimited paths to alternative AirLocalize parameter files, one per channel. | If you have 4 channels and you are extracting spots from c0, c1, and c3, this parameter should look like this: /path/to/params_c0.txt,/path/to/params_c1.txt,,/path/to/params_c3.txt. Note the double comma, which leaves an empty entry for c2 because that channel should not be processed. | string |
airlocalize_cpus | Number of CPU cores to allocate for each AirLocalize job. Default: 1 | | integer |
airlocalize_memory | Amount of RAM to allocate to each AirLocalize job. Needs to be increased when increasing strides. Default: 2 G | | integer |
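
Building the per_channel_air_localize_params value by hand is error-prone once channels are skipped. The sketch below, with hypothetical helper names, fills one slot per channel and leaves skipped channels empty, matching the double-comma convention in the table. It also computes the 5% default overlap; truncating integer division is an assumption, and the pipeline may round differently.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the AirLocalize parameters.

# Build a per_channel_air_localize_params value.
# Usage: build_per_channel_params N_CHANNELS cI=/path ...
# Channels without a cI=... pair get an empty slot, i.e. they are skipped.
build_per_channel_params() {
  local n_channels="$1"; shift
  local -a slots=()
  local i pair idx
  for (( i = 0; i < n_channels; i++ )); do
    slots[i]=""                  # start with every channel empty
  done
  for pair in "$@"; do
    idx="${pair%%=*}"            # "c3=/p/x.txt" -> "c3"
    idx="${idx#c}"               # "c3" -> "3" (channel numbers are 0-indexed)
    slots[idx]="${pair#*=}"      # "c3=/p/x.txt" -> "/p/x.txt"
  done
  local IFS=','
  echo "${slots[*]}"             # join the slots with commas
}

# Default tile overlap: 5% of the stride (truncating division is an assumption).
default_airlocalize_overlap() {
  local stride="$1"
  echo $(( stride * 5 / 100 ))
}
```

For example, build_per_channel_params 4 c0=/path/to/params_c0.txt c1=/path/to/params_c1.txt c3=/path/to/params_c3.txt prints /path/to/params_c0.txt,/path/to/params_c1.txt,,/path/to/params_c3.txt.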
Spot Extraction: RS-FISH
Options for the RS-FISH spot extraction algorithm
Parameter | Description | Help Text | Type |
---|---|---|---|
use_rsfish | Use RS-FISH instead of AirLocalize for Spot Extraction. Default: false | | boolean |
rsfish_min | Minimal intensity of the image. Default: 0 | | integer |
rsfish_max | Maximal intensity of the image. Default: 4096 | | integer |
rsfish_anisotropy | The anisotropy factor. Default: 0.7 | Scaling of z relative to xy. Can be determined using the RS-FISH anisotropy plugin in Fiji. | number |
rsfish_sigma | Sigma value for Difference-of-Gaussian (DoG) calculation. Default: 1.5 | | number |
rsfish_threshold | Threshold value for Difference-of-Gaussian (DoG) calculation. Default: 0.007 | | number |
rsfish_background | Background subtraction method: 0 == None, 1 == Mean, 2 == Median, 3 == RANSAC on Mean, 4 == RANSAC on Median. Default: 0 (None) | | integer |
rsfish_intensity | Intensity calculation method: 0 == Linear Interpolation, 1 == Gaussian fit (on inlier pixels), 2 == Integrate spot intensities (on candidate pixels). Default: 0 (Linear Interpolation) | | integer |
rsfish_params | Any other parameters to pass to the RS-FISH algorithm. | Complete parameter documentation is available in the RS-FISH project documentation. | string |
rsfish_workers | Number of Spark workers to use for RS-FISH spot detection. Default: 4 | | integer |
rsfish_worker_cores | Number of cores allocated to each RS-FISH Spark worker. Default: 4 | | integer |
rsfish_gb_per_core | Size of memory (in GB) that is allocated for each core of an RS-FISH Spark worker. Default: 4 | The total memory usage for one acquisition will be rsfish_workers × rsfish_worker_cores × rsfish_gb_per_core. | integer |
rsfish_driver_cores | Number of cores allocated for the RS-FISH Spark driver. Default: 1 | | string |
rsfish_driver_memory | Amount of memory to allocate for the RS-FISH Spark driver. Default: 15g | | string |
Per channel RS-FISH Parameters
The following parameters can be set per channel: rsfish_min, rsfish_max, rsfish_anisotropy, rsfish_sigma, rsfish_threshold, rsfish_background, rsfish_intensity. To do so, prefix the parameter name with per_channel. (including the dot) and set the values using a comma-delimited list. Values are associated with channels by position, i.e. the first value is associated with the first channel, the second with the second channel, and so on. If a value is missing or empty, that channel uses the default from the parameter with the same name (described above).
For example, if the command line is --channels c0,c1,c2,c3 --sigma 1.7 --per_channel.sigma "1.2,,1.4", then:
channel c0 will use sigma 1.2
channel c1 will use the default sigma 1.7 (because of the empty value)
channel c2 will use sigma 1.4
channel c3 will use the default sigma 1.7 (because of the missing value: the sigma values list is shorter than the channels list)
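
The positional fallback rule can be reproduced in a few lines, which is handy for checking which value each channel will end up with. This is a hypothetical sketch of the documented behavior, not the pipeline's own implementation.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the per-channel rule: values are matched to channels
# by position; an empty or missing value falls back to the global default.
resolve_per_channel() {
  local channels="$1" default="$2" per_channel="$3"
  local -a chans vals
  IFS=',' read -r -a chans <<< "$channels"
  IFS=',' read -r -a vals <<< "$per_channel"   # "1.2,,1.4" -> (1.2 "" 1.4)
  local i
  for i in "${!chans[@]}"; do
    # ${vals[i]:-$default} covers both the empty and the missing case.
    echo "${chans[i]}=${vals[i]:-$default}"
  done
}

resolve_per_channel c0,c1,c2,c3 1.7 "1.2,,1.4"
# c0=1.2
# c1=1.7
# c2=1.4
# c3=1.7
```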
Spot Warping
Options for warping detected spots to registration
Parameter | Description | Help Text | Type |
---|---|---|---|
warp_spots_cpus | Number of CPU cores to use for warp spots. Default: 2 | | integer |
warp_spots_memory | Amount of memory for warp spots. Default: 30 G | | string |
Intensity Measurement
Options for extracting quantified measurements of spot intensities
Parameter | Description | Help Text | Type |
---|---|---|---|
measure_intensities_output | Output directory for intensities. Default: intensities | This path is relative to output_dir . | string |
measure_intensities_cpus | Number of CPU cores to use for intensity measurement. Default: 1 | | integer |
measure_intensities_memory | Amount of memory for intensity measurement. Default: 8 G | | string |
Spot Assignment
Options for mapping spot counts to segmented cells
Parameter | Description | Help Text | Type |
---|---|---|---|
assign_spots_output | Output directory for spot assignments. Default: assignments | This path is relative to output_dir . | string |
assign_spots_cpus | Number of CPU cores to use for spot assignment. Default: 1 | | integer |
assign_spots_memory | Amount of memory for spot assignment. Default: 5 G | | string |
Container Options
Customize the Docker containers used for each pipeline step
Parameter | Description | Help Text | Type |
---|---|---|---|
mfrepo | Docker registry/repository to use for containers. Default: janeliascicomp | By default, the pipeline uses containers built as part of this project and deployed to Docker Hub. You can rebuild the containers, deploy them to your own registry, and specify it here. | string |
spark_container_repo | Docker container repo for stitching. Default: <mfrepo> | | string |
spark_container_name | Docker container name for stitching. Default: stitching | | string |
spark_container_version | Docker container version for stitching. Default: 1.0.0 | | string |
registration_container | Docker container for running registration and warp_spots. Default: <mfrepo>/registration:1.2.0 | | string |
segmentation_container | Docker container for running segmentation. Default: <mfrepo>/segmentation:1.0.0 | | string |
airlocalize_container | Docker container for running spot extraction. Default: <mfrepo>/airlocalize:1.0.2 | | string |
spots_assignment_container | Docker container for running intensity measurement and spot assignment. Default: <mfrepo>/spot_assignment:1.2.0 | | string |
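
The default images are all composed from mfrepo plus a name and a tag. The sketch below prints the references you would get for a given mfrepo, which helps when mirroring the containers to your own registry. The helper names and the registry.example.org address are hypothetical, and joining the stitching image as spark_container_repo/spark_container_name:spark_container_version is an assumption; the table lists those parts separately but does not spell out how they are combined.

```bash
#!/usr/bin/env bash
# Compose an image reference as REPO/NAME:VERSION (one trailing slash on REPO ignored).
container_ref() {
  local repo="${1%/}" name="$2" version="$3"
  echo "${repo}/${name}:${version}"
}

# Print the default image for each step, given an mfrepo value.
default_containers() {
  local mfrepo="${1:-janeliascicomp}"
  # Assumption: the stitching image is spark_container_repo/name:version.
  echo "stitching: $(container_ref "$mfrepo" stitching 1.0.0)"
  echo "registration: $(container_ref "$mfrepo" registration 1.2.0)"
  echo "segmentation: $(container_ref "$mfrepo" segmentation 1.0.0)"
  echo "airlocalize: $(container_ref "$mfrepo" airlocalize 1.0.2)"
  echo "spot_assignment: $(container_ref "$mfrepo" spot_assignment 1.2.0)"
}

default_containers                              # Docker Hub defaults
default_containers registry.example.org/myteam  # hypothetical private registry
```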
Other Options
Other global options affecting all pipeline stages
Parameter | Description | Help Text | Type |
---|---|---|---|
skip | Comma-delimited list of steps to skip, e.g. stitching,registration. | Valid values: stitching,spot_extraction,segmentation,registration,warp_spots,measure_intensities,assign_spots | string |
singularity_cache_dir | Shared directory where Singularity containers are cached. Default: $shared_work_dir/singularity_cache or $HOME/.singularity_cache | | string |
singularity_user | User to use for running Singularity containers. Default: $USER | This is automatically set to ec2-user when using the ‘tower’ profile | string |
runtime_opts | Runtime options for the container engine being used (e.g. Singularity or Docker). | Runtime options for Singularity must include mounts for any directory paths you are using. You can also pass the --nv flag here to make use of NVIDIA GPU resources. For example: --nv -B /your/data/dir -B /your/output/dir | string |
lsf_opts | Options for the LSF cluster at Janelia, when using the lsf profile. | | string |
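
Two of these options are simple enough to assemble or check in a wrapper script. The sketch below, with hypothetical helper names, rejects a skip value that names an unknown step (using the valid values listed above) and builds a Singularity runtime_opts string with one -B bind mount per directory. It does not handle directory paths containing spaces; prepend --nv yourself if you need GPU access.

```bash
#!/usr/bin/env bash
# Step names accepted by skip, as documented above.
VALID_SKIP_STEPS="stitching spot_extraction segmentation registration warp_spots measure_intensities assign_spots"

# Fail (and report the offending entry) if a comma-delimited skip value
# contains a step name that is not in VALID_SKIP_STEPS.
validate_skip() {
  local skip="$1" step valid ok
  local -a steps
  IFS=',' read -r -a steps <<< "$skip"
  for step in "${steps[@]}"; do
    ok=1
    for valid in $VALID_SKIP_STEPS; do
      [ "$step" = "$valid" ] && { ok=0; break; }
    done
    if [ "$ok" -ne 0 ]; then
      echo "invalid step in skip: '$step'" >&2
      return 1
    fi
  done
}

# Build a runtime_opts string with one -B bind mount per directory argument.
build_runtime_opts() {
  local opts="" dir
  for dir in "$@"; do
    opts+="${opts:+ }-B ${dir}"    # add a separating space after the first entry
  done
  printf '%s\n' "$opts"
}

validate_skip stitching,registration && echo "skip value OK"
echo "--nv $(build_runtime_opts /your/data/dir /your/output/dir)"
```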