Why You Should Consider Running Bioinformatics Tools with Viash: Tool Management Simplified

TL;DR

Viash transforms complex bioinformatics tools into portable, reusable components with automated parameter handling, containerization and workflow integration. You’ll write significantly less boilerplate code while enhancing reliability and reusability.

The Challenge of Installing Bioinformatics Tools

As bioinformaticians, we’ve all experienced the frustration of tool management: spending hours configuring environments, debugging container arguments, or writing yet another wrapper script. While containers have helped with deployment consistency, they come with their own complexity.

Let’s examine this problem through a common, well-supported bioinformatics tool: SAMtools.

The Installation Dilemma

You typically have two choices when installing SAMtools or any other bioinformatics tool.

Option 1: Install from source

# Requires dependency management and compilation 
cd samtools-x.x
./configure --prefix=/where/to/install
make 
make install
export PATH=/where/to/install/bin:$PATH

Option 2: Use a container

# Requires Docker knowledge and correct mount points
docker run -it \
  -v "$(pwd)":"$(pwd)" \
  -w "$(pwd)" \
  quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1 \
  samtools stats test.paired_end.sorted.bam

Drawbacks:

- Source installation requires managing dependencies and compilation steps
- Container usage demands Docker expertise and careful volume mounting
- Neither approach scales easily for batch processing
- Neither approach integrates seamlessly with workflow systems

The Band-Aid Solution: Creating Wrapper Scripts

Many bioinformaticians resort to writing wrapper scripts to simplify usage and enable workflows:

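#!/bin/bash

# Usage: <this script> <input.bam> <output.txt>
# No -it here: a TTY is not needed and gets in the way of non-interactive runs
docker run --rm \
  -v "$(pwd)":"$(pwd)" \
  -w "$(pwd)" \
  quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1 \
  samtools stats "$1" > "$2"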

This script helps with batch processing but lacks:

- Parameter validation
- Error handling
- Help documentation
- Flexible arguments
- Logging and progress tracking
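To make the first two gaps concrete, here is a minimal sketch of the kind of check the wrapper would need before starting a container. The function name validate_input and its exit codes are hypothetical choices for illustration, not part of the original script or of any tool.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original wrapper): check the input
# path before any container is started, so mistakes fail fast.
validate_input() {
  local input="$1"
  if [ -z "$input" ]; then
    echo "error: no input file given" >&2
    return 2
  fi
  if [ ! -f "$input" ]; then
    echo "error: input file not found: $input" >&2
    return 3
  fi
  if [ ! -r "$input" ]; then
    echo "error: input file is not readable: $input" >&2
    return 4
  fi
  return 0
}
```

In the wrapper, a line such as validate_input "$1" || exit $? placed before docker run would stop early with a readable message. Every new wrapper needs the same code again, which is exactly the kind of boilerplate that piles up.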

A More Complete Script

#!/bin/bash

usage() {
  echo "Usage: $0 -i <input file> -o <output file>" 1>&2
  exit 1
}

while getopts ":i:o:" arg; do
  case "${arg}" in
  i)
    i=${OPTARG}
    ;;
  o)
    o=${OPTARG}
    ;;
  *)
    usage
    ;;
  esac
done
shift $((OPTIND - 1))

if [ -z "${i}" ] || [ -z "${o}" ]; then
  usage
fi

# No -it here: a TTY is not needed and gets in the way of non-interactive runs
docker run --rm \
  -v "$(pwd)":"$(pwd)" \
  -w "$(pwd)" \
  quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1 \
  samtools stats "$i" > "$o"

Even now we’re still missing:

- Additional parameters
- Robust file and parameter validation
- Parallel batch processing
- Cross-platform compatibility
- Workflow integration
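As an illustration of what bolting on parallel batch processing involves, the sketch below fans a wrapper out over several BAM files with xargs. It assumes the wrapper accepts -i and -o as in the script above. The function name run_batch and the .stats.txt output naming are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: run a wrapper that accepts -i/-o (like the script
# above) over many BAM files, with at most $jobs running at the same time.
run_batch() {
  local jobs="$1" wrapper="$2"
  shift 2
  [ "$#" -gt 0 ] || return 0   # nothing to do without input files
  # NUL-separated names keep paths with spaces intact; each {} is one file.
  printf '%s\0' "$@" |
    xargs -0 -P "$jobs" -I{} "$wrapper" -i {} -o {}.stats.txt
}
```

Assuming the wrapper is saved as ./samtools_stats.sh, a call like run_batch 4 ./samtools_stats.sh *.bam would process four files at a time. It works, but it is one more piece of glue code to write, test and maintain for every tool.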

Enter Viash: Bioinformatics Tool Management Made Simple

Viash separates:

- What the tool does (functionality)
- How it’s used (configuration)
- How it’s run

Define Your Script and Configuration

Configuration (src/config.vsh.yaml)

name: samtools_stats

arguments:
  - name: --input
    type: file
    required: true
    must_exist: true
  - name: --output
    type: file
    required: true
    direction: output

resources:
  - type: bash_script
    path: script.sh

engines:
  - type: docker
    image: quay.io/biocontainers/samtools:1.19.2--h50ea8bc_1

runners:
  - type: executable
  - type: nextflow

Script (src/script.sh)

#!/bin/bash
set -e

samtools stats "$par_input" > "$par_output"

exit 0

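Viash passes each argument to the script as a variable prefixed with par_, which is why --input and --output appear above as $par_input and $par_output. Viash is also polyglot: your script can be in Python, R, Bash, JavaScript, etc.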

Viash Builds Executables and Workflows

Once Viash is installed, build your component with:

viash ns build -q samtools_stats --setup cachedbuild

Viash generates:

- A standalone executable (with validation and help)
- A Nextflow module (no Groovy knowledge required)

You can:

- Test it on the command line
- Run batch processes
- Integrate into workflows

Running Your Viash Component

Download Sample Data

wget https://github.com/nf-core/test-datasets/raw/modules/data/genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam

Show Help Documentation

target/executable/samtools_stats/samtools_stats --help
# OR
nextflow run target/nextflow/samtools_stats/main.nf -- --help

Run the Component

Executable:

target/executable/samtools_stats/samtools_stats \
  --input test.paired_end.sorted.bam \
  --output samtools_output.txt

Nextflow:

nextflow run target/nextflow/samtools_stats/main.nf \
  --input test.paired_end.sorted.bam \
  --output samtools_output.txt \
  --publish_dir nxf_output
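For several samples, the executable shown above can be driven from a plain loop. The sketch below is one possible way to do that, not a Viash feature: the function name stats_for_all and the output naming are hypothetical, and it only assumes that the executable accepts --input and --output as defined in the configuration.

```shell
#!/bin/bash
# Hypothetical helper: call a Viash-built executable once per BAM file and
# keep going when a sample fails, reporting the failures at the end.
stats_for_all() {
  local exe="$1" outdir="$2"
  shift 2
  local failed=0 bam name
  mkdir -p "$outdir"
  for bam in "$@"; do
    name=$(basename "$bam" .bam)
    if ! "$exe" --input "$bam" --output "$outdir/$name.stats.txt"; then
      echo "failed: $bam" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every sample succeeded.
  [ "$failed" -eq 0 ]
}
```

A call could look like stats_for_all target/executable/samtools_stats/samtools_stats stats_out *.bam. Because the generated executable already validates its arguments and handles the container, the loop itself stays short.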

Benefits of Viash

  • Less Code: No need for glue code — boilerplate is auto-generated
  • Validation: Built-in checks for parameters and files
  • Documentation: Auto-generated help with --help
  • Container Handling: No more Docker volume headaches
  • Parallelization: Viash can scale to many samples
  • Workflow Integration: Native Nextflow support (more coming)
  • Cross-Platform: Same component runs locally, on HPC or in the cloud

The Viash Catalogue: 150+ Tools Ready to Use

No need to build everything yourself. The Viash catalogue includes:

- STAR, Cell Ranger
- FastQC, MultiQC
- Salmon, DESeq2
- scanpy, scvi-tools

Why Use the Catalogue?

  • Save time
  • Use industry-validated tools
  • Consistent interfaces
  • Version-controlled components

What’s Next?

In the next post, we’ll cover parallel processing, automated testing, container management and workflow integration with Viash!
