Cloud-Scale Bioinformatics: Running Viash Workflows on Cloud Platforms

TL;DR

In our previous posts, we explored how to run Viash modules and workflows locally, both on your native host system and with Docker. In this post, we'll run those same workflows at scale on a cloud platform, with zero code changes and no DevOps expertise required.


From Laptop to Cloud

Traditional cloud scaling in bioinformatics requires specialized DevOps skills. Viash eliminates these barriers with workflows that run seamlessly across local and cloud environments.

We'll take the Mapping and QC workflow from the previous post and run it on Seqera Platform (formerly Nextflow Tower).

Repo: https://github.com/viash-hub/playground


Step 1: Store the Data in a Cloud Bucket

Generate test data:

./test_data.sh
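
This script should leave the subsampled FASTQ files and the STAR reference used later in this post under ./test_data. A quick listing shows what will be uploaded:

ls -R ./test_data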

Upload to Google Cloud (or your preferred cloud bucket):

gsutil -m cp -r ./test_data/* gs://bucket-name/test_data/
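
To confirm everything landed in the bucket, list the uploaded objects (bucket-name is a placeholder for your own bucket):

gsutil ls -r gs://bucket-name/test_data/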

Step 2: Generate a Params File

PARAMS_FILE=remote_params.yaml
TEST_DATA_DIR=gs://bucket-name/test_data
cat >$PARAMS_FILE <<EOF
param_list:
 - id: SRR1569895
   input_r1: $TEST_DATA_DIR/SRR1569895_1_subsample.fastq
   input_r2: $TEST_DATA_DIR/SRR1569895_2_subsample.fastq
 - id: SRR1570800
   input_r1: $TEST_DATA_DIR/SRR1570800_1_subsample.fastq
   input_r2: $TEST_DATA_DIR/SRR1570800_2_subsample.fastq
publish_dir: gs://bucket-name/output
reference: $TEST_DATA_DIR/S288C_reference_genome_Current_Release_STAR
EOF
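
Each entry under param_list is treated as a separate sample that the workflow processes in parallel, while publish_dir and reference apply to all of them. A quick look at the generated file confirms the paths expanded correctly:

cat $PARAMS_FILE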

Step 3: (Optional) Optimize Resource Usage

Create a nextflow.config with per-process resource settings:

cat > nextflow.config << HERE
process {
  withName:'.*falco_process' {
    memory = { 200.MB * task.attempt }
  }
  withName:'.*cutadapt_process' {
    memory = { 50.MB * task.attempt }
  }
  withName:'.*star_align_reads_process' {
    memory = { 2.GB * task.attempt }
  }
  withName:'.*samtools_stats_process' {
    memory = { 50.MB * task.attempt }
  }
  withName:'.*multiqc_process' {
    memory = { 200.MB * task.attempt }
  }
}
HERE
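
The closures above scale each memory request with task.attempt, which only pays off if failed tasks are actually retried. A minimal addition to the same config enables that (the values are illustrative, not taken from the original workflow):

cat >> nextflow.config << HERE
process {
  errorStrategy = 'retry'
  maxRetries = 2
}
HERE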

Step 4: Launch the Workflow on Seqera Platform

Set the required Seqera identifiers as environment variables:

export COMPUTE_ENV=<your_seqera_compute_environment_id>
export WORKSPACE_ID=<your_seqera_workspace_id>
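
If you don't know these IDs, the tw CLI can list them for you (this assumes tw is installed and authenticated, for example via a TOWER_ACCESS_TOKEN environment variable):

tw workspaces list
tw compute-envs list --workspace $WORKSPACE_ID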

Launch with tw:

tw launch https://packages.viash-hub.com/vsh/playground \
  --revision main \
  --main-script target/nextflow/mapping_and_qc/main.nf \
  --params-file remote_params.yaml \
  --workspace $WORKSPACE_ID \
  --compute-env $COMPUTE_ENV \
  --config nextflow.config

Note: All Viash Catalogue workflows are pre-built and hosted on Viash Hub, which is why the pipeline can be launched directly from its package URL.


Step 5: Monitor Your Workflow

Use the Seqera Platform web UI to monitor run progress and task logs.
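
If you prefer the terminal, the same information is available through the tw CLI (again assuming an authenticated tw installation):

tw runs list --workspace $WORKSPACE_ID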


Why This Matters

Viash + Seqera lets you:

- Scale workflows with no code changes
- Avoid writing DevOps logic or infrastructure scripts
- Leverage cloud resources to analyze large datasets

You keep your focus on the science, not infrastructure.


Wrapping Up the Series

This concludes our four-part blog series:

1. Creating simple Viash modules
2. Leveraging built-in “batteries included” features
3. Building modular workflows
4. Running them at scale on the cloud

We hope this series helps simplify your workflow development and deployment. Learn more at viash.io or get in touch at info@data-intuitive.com.

Thanks for reading, and happy coding! 🚀
