Link Search Menu Expand Document

Step 24 - workflow instead of process

We already have quite some helper functionality that should be provided by a DiFlow module:

  • Generating an output file name based on input
  • Parsing different types of input file references
  • Selecting the proper key from the (large) ConfigMap stored in the global params Map.
  • Generating the CLI from nextflow.config
  • Providing a test case for the module

There is some hidden functionality as well:

  • Making sure input and output file references are updated in the ConfigMap
  • Dealing with per-sample configuration (upcoming feature)

As it turns out, providing all this functionality in a process is not the proper way to go, and is even expected not to work. Luckily we can define workflows in NextFlow’s DSL2 syntax. Such a workflow can be used just like a process in the above example as long as we take care that the input/output signatures are aligned with what is expected.

The added benefit of using a workflow rather than a process is that the underlying process can have a different signature. In practice, this is what a DiFlow process looks like:

process cellranger_process {
  ...
  container "${params.dockerPrefix}${container}"
  publishDir "${params.output}/processed_data/${id}/", mode: 'copy', overwrite: true
  input:
    tuple val(id), path(input), val(output), val(container), val(cli)
  output:
    tuple val("${id}"), path("${output}")
  script:
    """
    export PATH="${moduleDir}:\$PATH"
    $cli
    """
}

As can be seen, the ConfigMap is not passed to the process but instead the information in params is used to generate an output filename, extract the container image and generate the CLI instruction. Please note that input here points to ALL possible input file references as per Step 23.

The workflow that points to this process is then defined as follows:

workflow cellranger {
  take:
    id_input_params_
  main:
    ...
    result_ =  ...
  emit:
    result_
}