Link Search Menu Expand Document

Step 23 - More than one input

It may appear in the above example that input can only be provided as one variable or file. NextFlow allows you to specify not only wildcards (* for instance) but also arrays. This comes in handy when multiple input file reference need to be provided on the CLI. For instance, when doing a mapping step we usually need to provide a reference file. We could add this reference file as a parameter (and take it up in nextflow.config but then it would just be seen as a string value. NextFlow can not know then that it has to check for existence of the file prior to starting to run the process[^3].

How does a DiFlow module know then what file reference is related to what option on the CLI? We would obviously not want to run CellRanger count on a reference file as input and use a fastq file as reference (as if that would work?!). That means we have to be sure to somehow let the DiFlow module know what file references correspond to what options on the CLI.

There are three possibilities:

  1. There is only one input file: in this case we just have to make sure it is passed as a Path object to the DIFlow module.

  2. There are multiple input files but they correspond to the same option on the CLI. For instance, cat‘ing multiple files where order is not relevant. In this case, we can simply pass a List[Path] to the DiFlow module.

  3. There are multiple input files corresponding to the different options on the CLI, for instance CellRanger with input and a reference file. In this case, we pass something like

[ "input": "<fastq-dir>", "reference": "<reference-dir>" ]

While there may be still other possibilities that we may encounter in the future, these three are covered by the current implementation of DiFlow.

A DiFlow module contains the necessary logic to parse three types of datastructures as input file references: Path, List[Path] and Map[String, List[Path]].

Remark: that the easiest way to create a Path object with the proper pointers in a NextFlow context is to use the built-in file() function. It simply takes a String argument pointing to the file (either relative or absolute).

There’s some more magic going on in the background. For starters, if you specify just either Path or List[Path], the DiFlow module will retrieve the appropriate command-line option to associate it with automagically. This way, as a pipeline developer you usually should not care about what exact command line option is necessary for your input data to be processed.