Part 4: Cloud-Scale Bioinformatics: Running Viash Workflows on Cloud Platforms
TL;DR: In our previous posts, we explored how to run Viash modules and workflows locally, both on your native host system and with Docker. Now, we’ll discover how to run these same workflows at scale on cloud platforms, with zero code changes and zero DevOps knowledge required.
From Laptop to Cloud
Traditional approaches to scaling bioinformatics analyses to the cloud involve numerous technical challenges, specialized DevOps knowledge, and weeks of implementation time. Viash workflows, in contrast, run seamlessly on cloud platforms without any changes to the workflow code itself.
Let’s explore how to take our Mapping and QC workflow from the previous post in this series and run it on your cloud platform of choice (e.g. Google Cloud, AWS or Azure) using Seqera Platform (formerly Nextflow Tower).
As explored in our previous blog post, we’ll make use of the code repository at https://github.com/viash-hub/playground.
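If you don’t have a local copy yet, you can clone the repository first:
# Clone the playground repository and enter it
git clone https://github.com/viash-hub/playground.git
cd playground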
1. Store the data on a cloud bucket
If you haven’t done so already during the tutorial in the previous blog post, let’s generate the test data.
# Generate test data
./test_data.sh
In this tutorial we will move the data to a Google Cloud bucket, but any storage bucket of your choice will do.
# Shuttling the data to Google Cloud
gsutil -m cp -r ./test_data/* gs://bucket-name/test_data/
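If you’d rather use AWS, the equivalent copy with the AWS CLI looks roughly like this (the bucket name is again a placeholder):
# Shuttling the data to an S3 bucket instead
aws s3 cp ./test_data/ s3://bucket-name/test_data/ --recursive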
2. Generate a params file
Now we update our params file to point to the test data in the cloud bucket.
PARAMS_FILE=remote_params.yaml
TEST_DATA_DIR=gs://bucket-name/test_data
cat >$PARAMS_FILE <<EOF
param_list:
  - id: SRR1569895
    input_r1: $TEST_DATA_DIR/SRR1569895_1_subsample.fastq
    input_r2: $TEST_DATA_DIR/SRR1569895_2_subsample.fastq
  - id: SRR1570800
    input_r1: $TEST_DATA_DIR/SRR1570800_1_subsample.fastq
    input_r2: $TEST_DATA_DIR/SRR1570800_2_subsample.fastq
publish_dir: gs://bucket-name/output
reference: $TEST_DATA_DIR/S288C_reference_genome_Current_Release_STAR
EOF
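Before launching anything, it can be worth double-checking that the files referenced in the params file are actually present in the bucket:
# Sanity check: list the uploaded test data
gsutil ls -r gs://bucket-name/test_data/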
3. Optional - optimize resource usage
Thanks to the component-based approach of Viash workflows, we can easily adjust the resources allocated to each step when running this workflow at scale!
cat > nextflow.config << HERE
process {
  withName:'.*falco_process' {
    memory = { 200.MB * task.attempt }
  }
  withName:'.*cutadapt_process' {
    memory = { 50.MB * task.attempt }
  }
  withName:'.*star_align_reads_process' {
    memory = { 2.GB * task.attempt }
  }
  withName:'.*samtools_stats_process' {
    memory = { 50.MB * task.attempt }
  }
  withName:'.*multiqc_process' {
    memory = { 200.MB * task.attempt }
  }
}
HERE
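Since the task.attempt multiplier only increases memory when a task is retried, you may also want to enable automatic retries. A minimal sketch (the maxRetries value is illustrative, not tuned):
# Append a retry policy to the same config file
cat >> nextflow.config << HERE
process {
  errorStrategy = 'retry'
  maxRetries = 2
}
HERE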
4. Launch the Workflow on Seqera Platform
First, export the required identifiers from your Seqera setup. This step assumes you have already configured a suitable compute environment, either on the public Seqera Cloud platform or on your own Seqera deployment.
export COMPUTE_ENV=<your_seqera_compute_environment_id>
export WORKSPACE_ID=<your_seqera_workspace_id>
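The tw CLI also needs to authenticate against Seqera Platform. If you haven’t already configured this, you can typically do so by exporting a personal access token created in the Seqera web interface:
export TOWER_ACCESS_TOKEN=<your_seqera_access_token>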
Then we can simply launch the workflow using the tw launch command:
tw launch https://packages.viash-hub.com/vsh/playground \
--revision main \
--main-script target/nextflow/mapping_and_qc/main.nf \
--params-file remote_params.yaml \
--workspace $WORKSPACE_ID \
--compute-env $COMPUTE_ENV \
--config nextflow.config # optional
Note that we need to have a built workflow stored in a code repository to make use of this tw launch command. Luckily, all tools and workflows of the Viash Catalogue have been pre-built and are stored in the viash-hub repo!
5. Monitor the workflow run
Using the Seqera Cloud web interface, we can now follow the run and monitor its status.
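If you prefer the terminal, recent runs can also be listed with the Tower CLI (a quick sketch, assuming the workspace ID is still exported):
# List recent runs in the workspace
tw runs list --workspace $WORKSPACE_ID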
Why This Matters
Moving bioinformatics workflows to the cloud typically presents numerous challenges that require specialized knowledge and significant time investment. However, with Viash workflows, this transition becomes remarkably simple. As demonstrated in this guide, scaling from a laptop to cloud infrastructure requires minimal changes—just pointing to cloud storage locations and utilizing Seqera Platform to manage the execution.
This approach eliminates the need for cloud-specific code adaptations or DevOps expertise, allowing researchers to focus on their scientific questions rather than infrastructure management. The ability to seamlessly scale processing power while maintaining the same workflow structure represents a significant advantage for handling large-scale bioinformatics analyses efficiently.
Wrapping Up the Series
This post concludes our four-part series on Viash tools and workflows. We’ve journeyed from the basics of component creation to running workflows at scale on cloud platforms!
Recap of what we’ve covered:
- Creating a simple bioinformatics module
- Leveraging “batteries included” features for advanced tool management
- Developing Nextflow workflows with a component-based approach
- Running workflows at scale with zero code changes
We hope this series has demonstrated how Viash can simplify your bioinformatics pipeline development and deployment process, allowing you to focus on what matters most—your research.
For more information about Viash, please visit our website or documentation page. We’re continuously improving our tools and documentation based on user feedback.
Have questions or want to share how you’re using Viash in your research?
We’d love to hear from you! Reach out to us on LinkedIn or contact us directly at info@data-intuitive.com.
Thank you for following along with this series, and happy coding!