Why You Should Consider Running Bioinformatics Tools with Viash: Tool Management Simplified
TL;DR
Viash transforms complex bioinformatics tools into portable, reusable components with automated parameter handling, containerization and workflow integration. You’ll write significantly less boilerplate code while enhancing reliability and reusability.
The Challenge of Installing Bioinformatics Tools
As bioinformaticians, we’ve all experienced the frustration of tool management: spending hours configuring environments, debugging container arguments, or writing yet another wrapper script. While containers have helped with deployment consistency, they come with their own complexity.
Let’s examine this problem through a common, well-supported bioinformatics tool: SAMtools.
The Installation Dilemma
You typically have two choices when installing SAMtools or any other bioinformatics tool.
Option 1: Install from source
# Requires dependency management and compilation
cd samtools-x.x
./configure --prefix=/where/to/install
make
make install
export PATH=/where/to/install/bin:$PATH
Option 2: Use a container
# Requires Docker knowledge and correct mount points
docker run -it \
  -v "$(pwd)":"$(pwd)" \
  -w "$(pwd)" \
  quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1 \
  samtools stats test.paired_end.sorted.bam
Drawbacks:
- Source installation requires managing dependencies and compilation steps
- Container usage demands Docker expertise and careful volume mounting
- Neither approach scales easily for batch processing
- Neither approach integrates seamlessly with workflow systems
The Band-Aid Solution: Creating Wrapper Scripts
Many bioinformaticians resort to writing wrapper scripts to simplify usage and enable workflows:
#!/bin/bash
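# Usage: <script> <input.bam> <output.txt>
# No -t here: a pseudo-TTY would alter the redirected output
docker run -i \
  -v "$(pwd)":"$(pwd)" \
  -w "$(pwd)" \
  quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1 \
  samtools stats "$1" > "$2"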
This script helps with batch processing but lacks:
- Parameter validation
- Error handling
- Help documentation
- Flexible arguments
- Logging and progress tracking
A More Complete Script
#!/bin/bash
usage() {
echo "Usage: $0 -i <input file> -o <output file>" 1>&2
exit 1
}
while getopts ":i:o:" arg; do
case "${arg}" in
i)
i=${OPTARG}
;;
o)
o=${OPTARG}
;;
*)
usage
;;
esac
done
shift $((OPTIND - 1))
if [ -z "${i}" ] || [ -z "${o}" ]; then
usage
fi
# No -t here: a pseudo-TTY would alter the redirected output
docker run -i \
  -v "$(pwd)":"$(pwd)" \
  -w "$(pwd)" \
  quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1 \
  samtools stats "$i" > "$o"
Even now we’re still missing:
- Additional parameters
- Robust file and parameter validation
- Parallel batch processing
- Cross-platform compatibility
- Workflow integration
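To give a sense of what just one of these gaps costs, here is a minimal sketch of file validation for the wrapper above. The function names check_input_file and check_output_path are hypothetical and chosen for this example only; a real wrapper would likely need more checks than these.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of any library.

# Print an error to stderr and return non-zero if the input file is
# missing or unreadable.
check_input_file() {
  local path="$1"
  if [ ! -f "$path" ]; then
    echo "Error: input file '$path' does not exist" >&2
    return 1
  fi
  if [ ! -r "$path" ]; then
    echo "Error: input file '$path' is not readable" >&2
    return 1
  fi
}

# Return non-zero if the directory that should hold the output file
# does not exist or is not writable.
check_output_path() {
  local dir
  dir="$(dirname "$1")"
  if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
    echo "Error: cannot write to output directory '$dir'" >&2
    return 1
  fi
}
```

In the wrapper, both would be called on "$i" and "$o" before docker run, exiting on failure. That is roughly twenty more lines to maintain for a single tool, before any of the other missing pieces are addressed.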
Enter Viash: Bioinformatics Tool Management Made Simple
Viash separates:
- What the tool does (functionality)
- How it’s used (configuration)
- How it’s run
Define Your Script and Configuration
Configuration (src/config.vsh.yaml)
name: samtools_stats
arguments:
  - name: --input
    type: file
    required: true
    must_exist: true
  - name: --output
    type: file
    required: true
    direction: output
resources:
  - type: bash_script
    path: script.sh
engines:
  - type: docker
    image: quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1
runners:
  - type: executable
  - type: nextflow
Script (script.sh)
#!/bin/bash
set -e
samtools stats "$par_input" > "$par_output"
exit 0
Viash is polyglot — your script can be in Python, R, Bash, JavaScript, etc.
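The script above never parses its own arguments: the --input and --output arguments declared in the config simply show up as $par_input and $par_output. The sketch below imitates that naming convention by hand so the mapping is easy to see. It is an illustration only and is not the code Viash generates; parse_args is a hypothetical name.

```shell
#!/bin/bash
# Illustration only: NOT the code Viash generates. It mimics the naming
# convention where --input becomes $par_input and --output becomes $par_output.
parse_args() {
  par_input=""
  par_output=""
  local key
  while [ "$#" -gt 0 ]; do
    key="$1"
    case "$key" in
      --input|--output)
        if [ "$#" -lt 2 ]; then
          echo "Missing value for $key" >&2
          return 1
        fi
        # Strip the leading "--" and prefix "par_" to get the variable name.
        printf -v "par_${key#--}" '%s' "$2"
        shift 2
        ;;
      *)
        echo "Unknown argument: $key" >&2
        return 1
        ;;
    esac
  done
  # Both arguments are marked required in the config.
  if [ -z "$par_input" ] || [ -z "$par_output" ]; then
    echo "Both --input and --output are required" >&2
    return 1
  fi
}

parse_args --input test.paired_end.sorted.bam --output samtools_output.txt
echo "par_input=$par_input"
echo "par_output=$par_output"
```

With Viash, none of this needs to be written or maintained per tool; declaring the arguments in the config is enough.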
Viash Builds Executables and Workflows
Once Viash is installed, build your component with:
viash ns build -q samtools_stats --setup cachedbuild
Viash generates:
- A standalone executable (with validation and help)
- A Nextflow module (no Groovy knowledge required)
You can:
- Test it on the command line
- Run batch processes
- Integrate into workflows
Running Your Viash Component
Download Sample Data
wget https://github.com/nf-core/test-datasets/raw/modules/data/genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam
Show Help Documentation
target/executable/samtools_stats/samtools_stats --help
# OR
nextflow run target/nextflow/samtools_stats/main.nf -- --help
Run the Component
Executable:
target/executable/samtools_stats/samtools_stats \
--input test.paired_end.sorted.bam \
--output samtools_output.txt
Nextflow:
nextflow run target/nextflow/samtools_stats/main.nf \
--input test.paired_end.sorted.bam \
--output samtools_output.txt \
--publish_dir nxf_output
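Because the built executable takes plain --input and --output arguments, a simple batch run can be a short loop. The following is a minimal sequential sketch, assuming a directory of BAM files; run_batch, its three parameters and the .stats.txt output naming are hypothetical choices for this example.

```shell
#!/bin/bash
# Run a component executable once per BAM file in a directory.
# run_batch and its parameters are hypothetical names for this sketch.
run_batch() {
  local exe="$1" in_dir="$2" out_dir="$3"
  local bam sample
  mkdir -p "$out_dir" || return 1
  for bam in "$in_dir"/*.bam; do
    # Skip the unexpanded pattern when the directory holds no BAM files.
    [ -e "$bam" ] || continue
    sample="$(basename "$bam" .bam)"
    # Stop at the first failing sample so errors are not silently skipped.
    "$exe" --input "$bam" --output "$out_dir/$sample.stats.txt" || return 1
  done
}
```

It could be called as run_batch target/executable/samtools_stats/samtools_stats bams stats, where bams and stats are example directory names. The loop processes one sample at a time and makes no attempt at parallelism.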
Benefits of Viash
- Less Code: No need for glue code — boilerplate is auto-generated
- Validation: Built-in checks for parameters and files
- Documentation: Auto-generated help with --help
- Container Handling: No more Docker volume headaches
- Parallelization: Viash can scale to many samples
- Workflow Integration: Native Nextflow support (more coming)
- Cross-Platform: Same component runs locally, on HPC or in the cloud
The Viash Catalogue: 150+ Tools Ready to Use
You don’t need to build everything yourself. The Viash catalogue includes:
- STAR, Cellranger
- FastQC, MultiQC
- Salmon, DESeq2
- scanpy, scvi-tools
Why Use the Catalogue?
- Save time
- Use industry-validated tools
- Consistent interfaces
- Version-controlled components
What’s Next?
In the next post, we’ll cover parallel processing, automated testing, container management and workflow integration with Viash!