Building Powerful Workflows with Viash Components
Part 3: Building Powerful Workflows with Viash Components
TL;DR: Viash changes how you build bioinformatics workflows by providing reusable, tested components that plug into your existing workflows and environment. The resulting workflows are easier to maintain than those built with traditional methods.
In our previous posts, we covered creating basic Viash components and making use of advanced, “batteries-included” features. Now let’s explore how to combine components into powerful workflows for complex bioinformatics analyses.
The Hidden Cost of Traditional Bioinformatics Workflows
Traditional bioinformatics workflows present numerous challenges that hamper productivity and reliability:
- Complex & redundant glue code - Scientists struggle with writing intricate code just to connect different components together
- Limited modularity - In most workflows, individual steps can’t be tested or run in isolation, or reused in other workflows
- Monolithic growth - Pipelines organically expand over time into unwieldy systems where troubleshooting becomes nearly impossible
- Painful maintenance - Updating a tool version, swapping methods, or rearranging steps often requires complete workflow overhauls
- Error-prone execution - Without built-in validation between steps, errors silently propagate through the analysis
- Limited resource specification - Computational requirements can’t be easily assigned to specific processes
- Implicit data flow - Input/output relationships remain unclear and error-prone
- Missing documentation - No built-in help for individual processes
- Complex debugging - Troubleshooting requires examining entire workflows
These issues collectively slow down bioinformatics research significantly. Viash workflows address these challenges by fundamentally changing how bioinformatics pipelines are constructed.
Viash: A Component-Based Approach to Workflow Development
Viash takes a component-based approach with automated code generation to address the challenges described above.
To follow along with the tutorial in this blog post, clone the repo at https://github.com/viash-hub/playground.
The Viash Workflow Architecture
Viash workflows combine individual components, drawn from our open-source catalogue, from custom in-house components, or both. Creating a Viash Nextflow workflow involves two key files:
- A Viash config file (src/mapping_and_qc/config.vsh.yaml): Declares components, parameters and containerization engines
- A VDSL3 Nextflow script (src/mapping_and_qc/main.nf): Orchestrates components into a coherent workflow
Let’s see how this works in a real-world example:
Creating a Mapping and QC Workflow
In the Viash config, the required components are declared as dependencies for our workflow. In this example, all the components are imported directly from the Viash catalogue.
src/mapping_and_qc/config.vsh.yaml
(YAML)
name: mapping_and_qc
description: Run STAR and QC
arguments:
  …
dependencies:
  - name: cutadapt
    repository: bb
  - name: falco
    repository: bb
  - name: multiqc
    repository: bb
  - name: star/star_align_reads
    repository: bb
  - name: samtools/samtools_stats
    repository: bb
repositories:
  - name: bb
    type: vsh
    repo: vsh/biobox
    tag: v0.3
runners:
  - type: nextflow
engines:
  - type: native
  - type: docker
Building and Running the Workflow
When we build the workflow, Viash will auto-generate a Nextflow workflow containing all the required glue code for parameter validation, data handling and container management.
# Build the workflow
viash ns build -q mapping_and_qc
Note how Viash transformed our simple VDSL3 workflow of ~90 lines into a full Nextflow workflow of ~3500 lines! This transformation is based on a deterministic, rule-based system - no AI or LLM involved - ensuring consistent and predictable workflow code following established patterns and best practices.
To run the workflow, first generate some test data and a parameter file by executing:
# Generate test data and a parameter file
./test_data.sh
Now we can run our Viash Nextflow workflow, making use of the parameter file.
# Run the workflow
nextflow run . \
  -profile docker \
  -main-script target/nextflow/mapping_and_qc/main.nf \
  -params-file params_file.yaml \
  --publish_dir workflow_test
The Viash-based Nextflow workflow comes with built-in documentation as well:
nextflow run target/nextflow/mapping_and_qc/main.nf --help
Key Advantages of Viash Workflows
Compared to traditional workflow methods, Viash offers several key advantages in how bioinformatics workflows are built and maintained:
Modular, Reusable Components
Viash components are independent, self-contained modules that can be reused across multiple workflows. Each component is version-controlled, tested, and maintained separately, eliminating code duplication and reducing development time.
Explicit Data Flow
The fromState/toState pattern creates clear, traceable data connections between components. This explicit data handling reduces hidden errors and makes workflows easier to understand and debug.
Flexible Resource Management
Resource labels can be easily assigned to each component, allowing fine-grained control over computational requirements without complex configuration.
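As a rough sketch, resource labels are typically attached through the Nextflow runner's directives in a component's config. The component name and label names below are hypothetical, and the labels only take effect once they are mapped to CPU and memory settings in your Nextflow configuration.

```yaml
# Hypothetical component config fragment (not taken from the playground repo)
name: my_aligner              # placeholder component name
runners:
  - type: nextflow
    directives:
      # Assumed label names; what they mean is defined in your Nextflow config
      label: [highmem, midcpu]
```

Because the labels live with the component, changing its resource class does not require touching the workflow script.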
Automatic Input/Output Validation
Components automatically validate their inputs and outputs, catching errors early before they cascade through the analysis pipeline, significantly improving reliability.
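Much of this validation is driven by how arguments are declared in the component config. The fragment below is a minimal sketch for an imaginary trimming component; the argument names and values are assumptions chosen for illustration, not the arguments of any component used in this post.

```yaml
# Hypothetical argument declarations for an imaginary trimming component
arguments:
  - name: --input
    type: file
    required: true        # the run fails early if this argument is missing
    must_exist: true      # the run fails early if the file is not on disk
  - name: --min_length
    type: integer
    default: 20
    min: 1                # values below 1 are rejected
  - name: --output
    type: file
    direction: output     # marks this argument as a produced file
    required: true
```

Declaring types, required flags and bounds up front means a wrong parameter is reported at the step that received it, rather than surfacing later as a confusing downstream failure.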
Independent Testing
Components can be executed and tested in isolation, simplifying debugging and ensuring reliable operation when combined into larger workflows.
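One way this can look in practice is a test script declared next to the component itself. The sketch below assumes a test script called test.sh and a small test_data directory; both names are placeholders.

```yaml
# Hypothetical fragment: test script and fixtures declared with the component
test_resources:
  - type: bash_script
    path: test.sh           # assumed test script name
  - path: test_data         # assumed directory with small example inputs
```

With a declaration like this, running viash test on the component's config file executes the script against that component alone, without involving the rest of the workflow.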
Built-in Documentation
Viash components and workflows come with built-in help and documentation, making it easier for team members to understand and use each component correctly.
Simplified Maintenance
Container versions are defined at the component level, meaning updates can be made to individual components without affecting the overall workflow structure. This dramatically simplifies version management and tool updates.
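A minimal sketch of what component-level pinning might look like is shown below. The component name, base image and package are illustrative assumptions, not the definition of any catalogue component.

```yaml
# Hypothetical component config fragment
name: my_stats_tool           # placeholder component name
engines:
  - type: docker
    image: ubuntu:22.04       # pinned base image; bump this tag to update
    setup:
      - type: apt
        packages: [samtools]  # tool installed into the component's container
```

For catalogue components, the tag: v0.3 entry in the repositories block of the workflow config shown earlier plays a similar role: changing that one value moves the workflow to a different catalogue release.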
Currently, Viash supports the generation of Nextflow workflows, allowing you to leverage all the advantages we’ve discussed. Looking ahead, we’re planning to extend support to other popular workflow platforms like Snakemake, further expanding Viash’s flexibility and integration capabilities across the bioinformatics ecosystem.
What’s Next?
In the final post of this series, we’ll explore how to take your Viash workflows to cloud platforms for scalable execution on large datasets.