Step 13 - Files as input/output
Let us now tackle a different angle and start dealing with files as input and output. To do this, we mimic the functionality from earlier, but modify it so that the input is read from a file and the output is written to a file as well.
The following combination of `process` and `workflow` definitions does exactly the same as before, but now reads from one or more files, each containing just a single integer:
```groovy
// Step - 13
process process_step13 {
    input:
        tuple val(id), file(input), val(config)
    output:
        tuple val("${id}"), file("output.txt"), val("${config}")
    script:
        """
        a=`cat $input`
        let result="\$a + ${config.term}"
        echo "\$result" > output.txt
        """
}

workflow step13 {
    Channel.fromPath( params.input ) \
        | map{ el ->
            [
                el.baseName.toString(),
                el,
                [ "operator" : "-", "term" : 10 ]
            ]} \
        | process_step13 \
        | view{ [ it[0], it[1] ] }
}
```
While doing this, we also introduced a way to specify parameters via a configuration file (`nextflow.config`) or from the CLI. In this case `params.input` points to an argument we should provide on the CLI, for instance:
```
> nextflow -q run . -entry step13 --input data/input1.txt
WARN: DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
[input1, <...>/work/17/bde087db08bba02e1a2dcce41c8892/output.txt]
```
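Instead of passing `--input` on every invocation, the same parameter could be given a default in `nextflow.config`; a minimal sketch (the default value here is just an assumption for illustration):

```groovy
// Hypothetical nextflow.config entry: --input on the CLI overrides this default
params.input = "data/input1.txt"
```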
Let’s dissect what is going on here…
- We provide the input file `data/input1.txt` as input, which gets automatically added to the `params` map as `params.input`.
- The content of `input1.txt` is used in the simple sum just as before.
- The output `Channel` contains the known triplet, but this time the second entry is not a value but rather a filename.
Please note that the file `output.txt` is automatically stored in the (unique) `work` directory. We can take a look inside to verify that the calculation succeeded:
```
> cat $(nextflow log | cut -f 3 | tail -1 | xargs nextflow log)/output.txt
11
```
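To see why this prints 11: Nextflow resolves `${config.term}` to 10 before the shell runs, so with `input1.txt` containing the single integer 1 (an assumption consistent with the result above), the executed script boils down to:

```shell
# Sketch of what the generated script effectively runs in the work directory
# (note: the "operator" entry in the config map is never used by the script)
a=1                   # stands in for a=`cat $input`, assuming input1.txt holds "1"
let result="$a + 10"  # ${config.term} was already substituted by Nextflow
echo "$result"        # → 11
```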
It seems the calculation went well, although one might be surprised by two things:
- The output of the calculation is stored in some randomly generated `work` directory, whereas we might want it somewhere more findable.
- The `process` itself defines the value of the output filename, which may seem odd… and it is.
Taking our example a bit further and exploiting the fact that parallelism is natively supported by Nextflow, as we have seen before, we can pass multiple input files to the same workflow defined above.
```
> nextflow -q run . -entry step13 --input "data/input?.txt"
WARN: DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
[input3, <...>/work/7b/1d1d756ff5c66dbc56daa6793ec324/output.txt]
[input1, <...>/work/6c/c09325a112f2a749ec10cacd222e47/output.txt]
[input2, <...>/work/0e/7b5a91a27313b0218ac5fcf8f2b09a/output.txt]
```
Please note that we
- provide the path to the input files,
- use a wildcard (`?`) to select multiple files,
- enclose the path (with wildcard) in double quotes to avoid shell globbing.
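The quoting in the last point is easy to verify directly in the shell; a small sketch using throwaway files (the directory name is made up for the demo):

```shell
# Unquoted, the shell expands the pattern itself; quoted, the literal
# pattern survives and Nextflow performs its own file matching.
mkdir -p globdemo && touch globdemo/input1.txt globdemo/input2.txt
echo globdemo/input?.txt    # → globdemo/input1.txt globdemo/input2.txt
echo "globdemo/input?.txt"  # → globdemo/input?.txt
rm -r globdemo
```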
As a result, we end up with 3 output files, each named `output.txt` and each in its own (unique) `work` directory.
```
> nextflow log | cut -f 3 | tail -1 | xargs nextflow log | xargs ls
<...>/work/0e/7b5a91a27313b0218ac5fcf8f2b09a:
input2.txt
output.txt
<...>/work/6c/c09325a112f2a749ec10cacd222e47:
input1.txt
output.txt
<...>/work/7b/1d1d756ff5c66dbc56daa6793ec324:
input3.txt
output.txt
```