Batteries Included: Supercharging Bioinformatics Modules with Viash
TL;DR
Viash comes with powerful built-in features that would normally require significant additional coding: parallel batch processing for speed, container management for reproducibility, and integrated testing for reliability. These “batteries included” features save you from writing hundreds of lines of boilerplate code.
Integrated Testing
The Testing Challenge in Bioinformatics
Traditional testing involves:
- Writing test scripts
- Managing test data
- Setting up environments
- Comparing expected vs. actual outputs
These steps are often skipped, which leaves tools fragile. Viash makes testing a first-class citizen.
Built-in Testing with Viash
Example test script test.sh:
#!/bin/bash
echo ">>> Testing $meta_functionality_name"
"$meta_executable" --input "$meta_resources_dir/test.paired_end.sorted.bam" --output "$meta_resources_dir/test.paired_end.sorted.txt"
echo ">>> Checking whether output is non-empty"
[ ! -s "$meta_resources_dir/test.paired_end.sorted.txt" ] && echo "File 'test.paired_end.sorted.txt' is empty!" && exit 1
echo ">>> Checking whether output is correct"
# Braces rather than parentheses, so that 'exit 1' ends the script and not just a subshell
diff <(grep -v "^# The command" "$meta_resources_dir/test.paired_end.sorted.txt") <(grep -v "^# The command" "$meta_resources_dir/ref.paired_end.sorted.txt") || { echo "Output file test.paired_end.sorted.txt does not match expected output"; exit 1; }
rm "$meta_resources_dir/test.paired_end.sorted.txt"
echo ">>> All tests passed successfully."
exit 0
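The comparison step in the script above can be factored into a small reusable function. The following is only a sketch under stated assumptions: `same_ignoring` is a hypothetical helper name (it is not provided by Viash), and the demo files are throwaway stand-ins for real `samtools stats` output. It shows the idea of comparing two files while ignoring lines that legitimately differ between runs, such as the recorded command line.

```shell
#!/bin/bash
# Hypothetical helper, not part of Viash: succeed when two files have the same
# content once lines matching a pattern have been dropped.
same_ignoring() {
  pattern="$1"; actual="$2"; expected="$3"
  [ "$(grep -v "$pattern" "$actual")" = "$(grep -v "$pattern" "$expected")" ]
}

# Throwaway files standing in for real samtools stats output.
demo_dir=$(mktemp -d)
printf '# The command line was: stats a.bam\nreads: 10\n' > "$demo_dir/actual.txt"
printf '# The command line was: stats b.bam\nreads: 10\n' > "$demo_dir/expected.txt"
printf '# The command line was: stats b.bam\nreads: 11\n' > "$demo_dir/changed.txt"

# Only the ignored header line differs, so this pair should match.
if same_ignoring "^# The command" "$demo_dir/actual.txt" "$demo_dir/expected.txt"; then
  same_result=match
else
  same_result=mismatch
fi

# A data line differs here, so this pair should not match.
if same_ignoring "^# The command" "$demo_dir/actual.txt" "$demo_dir/changed.txt"; then
  changed_result=match
else
  changed_result=mismatch
fi

echo "same content: $same_result, changed content: $changed_result"
rm -r "$demo_dir"
```

In a test script, a call to such a helper would typically be followed by `|| { echo "..."; exit 1; }` so that a mismatch fails the whole test.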
Viash config addition:
name: samtools_stats
arguments:
  ...
test_resources:
  - type: bash_script
    path: test.sh
  - type: file
    path: test.paired_end.sorted.bam
  - type: file
    path: ref.paired_end.sorted.txt
Run tests with:
viash ns test -q samtools_stats
Why Viash Testing Is a Game-Changer
- Reproducibility: Tests run in the same environment as production
- CI/CD Friendly: Easily integrates into pipelines
- Version Control: Test code lives beside the component
Parallel Processing
The Challenge
Bioinformaticians often need to process many samples with:
- Resource tracking
- Logging
- Monitoring
Viash provides batch processing out of the box.
The Viash Way: Parameter Lists
Example param_list.yaml:
- id: sample_1
  input: test.paired_end.sorted_1.bam
  output: test.paired_end.sorted_1.txt
- id: sample_2
  input: test.paired_end.sorted_2.bam
  output: test.paired_end.sorted_2.txt
- id: sample_3
  input: test.paired_end.sorted_3.bam
  output: test.paired_end.sorted_3.txt
Run it with:
nextflow run target/nextflow/samtools_stats/main.nf --param_list param_list.yaml -profile docker --publish_dir test
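For more than a handful of samples, the parameter list is easier to generate than to write by hand. The sketch below is one possible way to do that and rests on assumptions: `make_param_list` is a hypothetical function name, each sample is assumed to be a single `.bam` file in one directory, and the `<id>.txt` output naming is chosen purely for illustration.

```shell
#!/bin/bash
# Sketch: emit one param_list entry per BAM file found in a directory.
# The function name and the "<id>.txt" output naming are illustrative assumptions.
make_param_list() {
  input_dir="$1"
  for bam in "$input_dir"/*.bam; do
    [ -e "$bam" ] || continue            # the glob matched nothing
    id=$(basename "$bam" .bam)           # sample id = file name without .bam
    printf '%s\n' "- id: $id" "  input: $bam" "  output: $id.txt"
  done
}

# Demo on empty placeholder files.
demo_dir=$(mktemp -d)
touch "$demo_dir/sample_1.bam" "$demo_dir/sample_2.bam"
make_param_list "$demo_dir" > "$demo_dir/param_list.yaml"

entry_count=$(grep -c '^- id:' "$demo_dir/param_list.yaml")
first_entry=$(head -n 1 "$demo_dir/param_list.yaml")
cat "$demo_dir/param_list.yaml"
rm -r "$demo_dir"
```

The generated file can then be passed to the workflow through `--param_list` in the same way as a hand-written one.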
Why Batch Processing with Viash Rocks
- Efficiency: No need to code parallelization logic
- Flexibility: Sample-specific parameters supported
- Simplicity: A single YAML file defines every run
Container Management
The Reproducibility Problem
Containers solve environment drift, but are often:
- Hard to configure
- Hard to version
- Hard to debug
Viash to the Rescue
Viash handles:
- Dockerfile generation
- Container build + caching
- Volume mounting
- Lifecycle management
Custom Docker snippet in config:
engines:
  - type: docker
    image: quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1
    setup:
      - type: docker
        run: |
          samtools --version 2>&1 | grep -E '^(samtools|Using htslib)' | sed 's#Using ##;s# \([0-9\.]*\)$#: \1#' > /var/software_versions.txt
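The `grep` and `sed` pipeline in that setup step is dense, so it helps to run it on its own. The sketch below feeds it a sample text that stands in for `samtools --version` output; the sample lines are an assumption written for illustration, not captured from the container. The pipeline keeps the tool and htslib lines, strips the `Using ` prefix, and rewrites `name version` as `name: version`.

```shell
#!/bin/bash
# Sample text standing in for `samtools --version` output (illustrative only).
sample_version_output='samtools 1.19.2
Using htslib 1.19.1
Copyright (C) 2024 Genome Research Ltd.'

# Same grep/sed pipeline as in the Docker setup step, applied to the sample text.
formatted=$(printf '%s\n' "$sample_version_output" \
  | grep -E '^(samtools|Using htslib)' \
  | sed 's#Using ##;s# \([0-9\.]*\)$#: \1#')

printf '%s\n' "$formatted"
# prints:
# samtools: 1.19.2
# htslib: 1.19.1
```

Inside the container, the same two lines end up in /var/software_versions.txt, which records the tool versions alongside the image.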
Inspect Dockerfile:
viash run src/config.vsh.yaml -- ---dockerfile
Debug mode:
viash run src/config.vsh.yaml -- ---debug
Why Viash Container Management Is a Game-Changer
- No Docker Knowledge Needed
- Consistent Environments Everywhere
- Transparent Versioning and Caching
- Multi-container Tech Support
What’s Next?
In the next post, we’ll show how to combine Viash components into modular workflows, such as RNA-seq pipelines.
Check out the Viash documentation for more.