Why You Should Consider Running Bioinformatics Tools with Viash: Tool Management Simplified
Part 1
TL;DR
Viash transforms complex bioinformatics tools into portable, reusable components with automated parameter handling, containerization and workflow integration. You’ll write significantly less boilerplate code while enhancing reliability and reusability.
The Challenge of Installing Bioinformatics Tools
As bioinformaticians, we’ve all experienced the frustration of tool management: spending hours configuring environments, debugging container arguments, or writing yet another wrapper script. While containers have helped with deployment consistency, they come with their own complexity.
Let’s examine this problem through a common, well-supported bioinformatics tool: SAMtools.
The Installation Dilemma
You typically have two choices when installing SAMtools or any other bioinformatics tool.
Option 1: Install from source
# Requires dependency management and compilation
cd samtools-x.x
./configure --prefix=/where/to/install
make
make install
export PATH=/where/to/install/bin:$PATH
Option 2: Use a container
# Requires Docker knowledge and correct mount points
docker run --rm \
  -v "$(pwd)":"$(pwd)" \
  -w "$(pwd)" \
  quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1 \
  samtools stats test.paired_end.sorted.bam
Both approaches have significant drawbacks:
- Source installation requires managing dependencies and compilation steps
- Container usage demands Docker expertise and careful volume mounting
- Neither approach scales easily for batch processing - analyzing multiple samples requires additional scripting
- Neither approach integrates seamlessly with workflow systems
The Band-Aid Solution: Creating Wrapper Scripts
Many bioinformaticians resort to writing wrapper scripts to simplify usage and enable workflows:
#!/bin/bash
# Arguments: $1 = input BAM file, $2 = output file
docker run --rm \
  -v "$(pwd)":"$(pwd)" \
  -w "$(pwd)" \
  quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1 \
  samtools stats "$1" > "$2"
This simple script makes batch processing possible, but it quickly becomes inadequate once you need:
- Parameter validation
- Error handling
- Help documentation
- Flexible arguments
- Logging and progress tracking for long runs
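To make the batch-processing point concrete, here is a minimal sequential loop around the two-argument wrapper. It assumes the wrapper is saved as an executable script; the function name `run_batch` and the `.stats.txt` naming scheme are hypothetical choices for this sketch.

```shell
# Sequential batch sketch around the two-argument wrapper script.
# Assumption: the wrapper is an executable file taking <input> <output>;
# run_batch and the ".stats.txt" suffix are hypothetical.
run_batch() {
  local wrapper="$1" bam
  shift
  for bam in "$@"; do
    # sample1.bam -> sample1.stats.txt
    "$wrapper" "$bam" "${bam%.bam}.stats.txt" || return 1
  done
}
```

A call such as `run_batch ./wrapper.sh *.bam` processes the samples one after another; validation, logging and parallelism all still have to be written by hand.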
When we try to add these features, our simple scripts quickly grow into unwieldy programs.
#!/bin/bash
usage() {
  echo "Usage: $0 -i <input file> -o <output file>" 1>&2
  exit 1
}
while getopts ":i:o:" arg; do
  case "${arg}" in
    i)
      i=${OPTARG}
      ;;
    o)
      o=${OPTARG}
      ;;
    *)
      usage
      ;;
  esac
done
shift $((OPTIND - 1))
if [ -z "${i}" ] || [ -z "${o}" ]; then
  usage
fi
docker run --rm \
  -v "$(pwd)":"$(pwd)" \
  -w "$(pwd)" \
  quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1 \
  samtools stats "$i" > "$o"
Even with this expanded script, we’re still missing:
- Support for adding additional parameters
- Robust file and parameter validation
- Efficient batch processing with parallelization
- Cross-platform compatibility
- Seamless integration with workflow systems
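To see how fast "robust file and parameter validation" grows, here is a sketch of the checks the getopts wrapper would still need. The function name `validate_stats_args` and the `.bam`-extension rule are hypothetical policies for illustration only.

```shell
# Hand-written checks the getopts wrapper still lacks (a minimal sketch).
# validate_stats_args and the .bam-only policy are hypothetical.
validate_stats_args() {
  local input="$1" output="$2" outdir
  # The input file must exist and be readable
  if [ ! -r "$input" ]; then
    echo "Error: cannot read input file '$input'" >&2
    return 1
  fi
  # Only accept files with a .bam extension
  case "$input" in
    *.bam) ;;
    *) echo "Error: expected a .bam file, got '$input'" >&2; return 1 ;;
  esac
  # The directory that will hold the output must already exist
  outdir=$(dirname -- "$output")
  if [ ! -d "$outdir" ]; then
    echo "Error: output directory '$outdir' does not exist" >&2
    return 1
  fi
}
```

Every check like this is code you maintain per tool. In a Viash config, the first check is expressed declaratively as `must_exist: true`.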
Enter Viash: Bioinformatics Tool Management Made Simple
Viash takes a completely different approach by separating:
- What the tool does (functionality)
- How it’s used (configuration)
- How it’s run (engines and runners)
Here’s how to create a SAMtools component with Viash.
Under the Hood: Define Your Script and Configuration
Configuration (src/config.vsh.yaml)
name: samtools_stats
arguments:
  - name: --input
    type: file
    required: true
    must_exist: true
  - name: --output
    type: file
    required: true
    direction: output
resources:
  - type: bash_script
    path: script.sh
engines:
  - type: docker
    image: quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1
runners:
  - type: executable
  - type: nextflow
Script (script.sh)
#!/bin/bash
set -e
samtools stats "$par_input" > "$par_output"
exit 0
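The script never parses arguments itself: Viash passes `--input` and `--output` in as `$par_input` and `$par_output`. The function below is a simplified, hand-written stand-in (not the code Viash actually generates) that shows the idea, including checks mirroring `required: true` and `must_exist: true`. The name `parse_stats_args` is hypothetical.

```shell
# Simplified stand-in for Viash's generated argument parsing (NOT the real
# generated code): --input -> $par_input, --output -> $par_output.
parse_stats_args() {
  par_input=""
  par_output=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --input)
        # Guard against a missing value so the loop cannot stall
        [ "$#" -ge 2 ] || { echo "Error: --input requires a value" >&2; return 1; }
        par_input="$2"; shift 2 ;;
      --output)
        [ "$#" -ge 2 ] || { echo "Error: --output requires a value" >&2; return 1; }
        par_output="$2"; shift 2 ;;
      *)
        echo "Error: unknown argument '$1'" >&2; return 1 ;;
    esac
  done
  # Mirrors "required: true" in the config
  if [ -z "$par_input" ] || [ -z "$par_output" ]; then
    echo "Error: --input and --output are required" >&2
    return 1
  fi
  # Mirrors "must_exist: true" on --input
  if [ ! -e "$par_input" ]; then
    echo "Error: input file '$par_input' does not exist" >&2
    return 1
  fi
}
```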
Note that Viash is polyglot by design: besides Bash, your script can be written in Python, R or JavaScript, among other languages.
Viash Auto-Generates Boilerplate Code
Viash simplifies the process of creating reusable modules from your components. After installing Viash, a single command does all the heavy lifting:
viash ns build -q samtools_stats --setup cachedbuild
Notice how Viash automatically generates both an executable and a stand-alone Nextflow workflow:
- A standalone executable that can run directly on your host system or within a container (supporting Docker, Podman, or Singularity), complete with argument validation and help documentation
- A Nextflow module that can be run as a stand-alone workflow or integrated into larger Nextflow pipelines, without writing any Groovy or Nextflow code yourself
tree target
target
├── executable
│   └── samtools_stats
│       └── samtools_stats
└── nextflow
    └── samtools_stats
        ├── main.nf
        └── nextflow.config
This gives you a lot of flexibility. You can use the same component for quick tests on the command line, as part of a batch process, or inside larger workflows, without writing additional code or needing expertise in container technologies or workflow languages.
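If you script your builds, a quick sanity check that both targets were produced can catch a failed build early. This sketch assumes the `target/` layout from the `tree` output; the function name `check_viash_build` is hypothetical and not part of Viash.

```shell
# Checks that a build produced both targets for one component,
# assuming the target/ layout shown by `tree target`.
check_viash_build() {
  local target_dir="$1" name="$2"
  local exe="$target_dir/executable/$name/$name"
  local nf_dir="$target_dir/nextflow/$name"
  if [ ! -x "$exe" ]; then
    echo "Missing or non-executable: $exe" >&2
    return 1
  fi
  if [ ! -f "$nf_dir/main.nf" ] || [ ! -f "$nf_dir/nextflow.config" ]; then
    echo "Incomplete Nextflow module in: $nf_dir" >&2
    return 1
  fi
  echo "OK: $name"
}
```

For example, `check_viash_build target samtools_stats` should print `OK: samtools_stats` after the build command above succeeds.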
Running Your Viash Component
Download Sample Data
To see your component in action, let’s first download some sample data.
wget https://github.com/nf-core/test-datasets/raw/modules/data/genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam
Show Help Documentation
The built-in help documentation gives a quick overview of how to use the component:
target/executable/samtools_stats/samtools_stats --help
# OR
nextflow run target/nextflow/samtools_stats/main.nf -- --help
Then run the component on the test data. We can either run the executable itself:
target/executable/samtools_stats/samtools_stats \
--input test.paired_end.sorted.bam \
--output samtools_output.txt
Or we can run the auto-generated Nextflow workflow:
nextflow run target/nextflow/samtools_stats/main.nf \
--input test.paired_end.sorted.bam \
--output samtools_output.txt \
--publish_dir nxf_output
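Because the generated executable takes named arguments, it also drops easily into an ad-hoc batch job. The sketch below uses plain `xargs -P` (available in GNU and BSD xargs, though not required by POSIX) to run several samples at once; it is generic shell, not a Viash feature, and the function name `run_stats_parallel` is hypothetical.

```shell
# Parallel batch sketch: one executable call per BAM, up to $jobs at once.
# Generic shell (xargs -P), not a Viash feature; names are hypothetical.
run_stats_parallel() {
  local exe="$1" jobs="$2"
  shift 2
  # Nothing to do without input files
  [ "$#" -gt 0 ] || return 0
  # NUL-separated names keep paths with spaces intact;
  # inside sh -c, $1 is the executable and $2 is one BAM path
  printf '%s\0' "$@" | xargs -0 -n 1 -P "$jobs" \
    sh -c '"$1" --input "$2" --output "${2%.bam}.stats.txt"' _ "$exe"
}
```

For example: `run_stats_parallel target/executable/samtools_stats/samtools_stats 4 *.bam`. For anything beyond a handful of samples, the generated Nextflow module is the better fit.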
That’s it! This streamlined approach eliminates boilerplate code and lets you focus on your component’s core functionality. The generated executable and Nextflow module automatically handle:
- Robust validation of files and parameters
- Support for additional parameters
- Automated help documentation
- Container management (with Docker, Singularity or Podman)
- Seamless integration with workflow systems
The Benefits: Why This Matters
- Less Code: No need for extensive glue code - boilerplate is auto-generated by Viash
- Automatic Validation: Viash includes built-in parameter checks
- Built-in Documentation: Run your tool with --help for auto-generated docs
- Container Handling: No need to remember Docker mount syntax
- Parallelization: See our next blog post for how Viash components can process multiple samples in parallel
- Workflow Integration: Native support for Nextflow, with more workflow systems to come
- Multiple Environments: The same component works on your local machine, or on HPC or cloud environments.
The Viash Catalogue: 150+ Ready-to-Use Bioinformatics Tools
While creating your own Viash components is straightforward, you probably won’t need to start from scratch. The Viash community has already built a comprehensive catalogue of the most widely used bioinformatics tools, ready for immediate use.
The Viash catalogue contains over 150 commonly used bioinformatics tools, including:
- Alignment tools (STAR, Cellranger, …)
- Quality control (FastQC, MultiQC, …)
- RNA-seq analysis (Salmon, DESeq2, …)
- Single-cell analysis (scanpy, scvi-tools, …)
- And many more
Why Use the Viash Catalogue?
- Time Savings: Skip the component development phase entirely
- Industry-Proven Tools: Trusted and validated by industry leaders and the Viash community
- Consistent Interfaces: All tools follow the same parameter conventions
- Robust Versioning: The catalogue is versioned consistently, ensuring cross-project compatibility
For bioinformaticians who want to focus on analysis rather than tool configuration, the catalogue offers a valuable shortcut.
What’s Next?
Ready to dive deeper? In the next post, we’ll explore more advanced Viash features to handle complex bioinformatics tools, such as parallel processing, automated testing, container management and workflow integration with Viash.