Cloud-Scale Bioinformatics: Running Viash Workflows on Cloud Platforms
TL;DR
In our previous posts, we explored how to run Viash modules and workflows locally, both natively on your host system and with Docker. In this post, we show how to run the same workflows at scale on cloud platforms, with no code changes and no DevOps expertise required.
From Laptop to Cloud
Scaling bioinformatics workflows to the cloud has traditionally required specialized DevOps skills. Viash removes that barrier: the same workflow runs unchanged in local and cloud environments.
We’ll use the Mapping and QC workflow from the previous post and run it via Seqera Platform (formerly Nextflow Tower).
Repo: https://github.com/viash-hub/playground
Step 1: Store the Data in a Cloud Bucket
Generate test data:
./test_data.sh
Upload to Google Cloud (or your preferred cloud bucket):
gsutil -m cp -r ./test_data/* gs://bucket-name/test_data/
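Before moving on, it can be worth confirming that the FASTQ files actually landed in the bucket. The following is a minimal sketch, not part of the original workflow: `expected_fastqs` and `check_upload` are hypothetical helper names, and the file naming simply mirrors the sample files used later in this post. It relies on `gsutil -q stat`, which exits with a non-zero status when an object does not exist.

```bash
# Hypothetical helper: print the FASTQ object URIs the later steps expect.
# Usage: expected_fastqs <bucket_dir> <sample_id>...
expected_fastqs() {
  bucket_dir=${1%/}          # strip one trailing slash to avoid "//" in URIs
  shift
  for id in "$@"; do
    for mate in 1 2; do
      printf '%s/%s_%s_subsample.fastq\n' "$bucket_dir" "$id" "$mate"
    done
  done
}

# Hypothetical helper: report every expected object missing from the bucket.
# Returns 1 if at least one object is missing, 0 otherwise.
check_upload() {
  missing=0
  for uri in $(expected_fastqs "$@"); do
    if ! gsutil -q stat "$uri"; then
      echo "missing: $uri" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the test data in this post, the check would be called as `check_upload gs://bucket-name/test_data SRR1569895 SRR1570800`. The STAR reference is a directory (an object prefix), so it is deliberately left out of this per-object check.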
Step 2: Generate a Params File
PARAMS_FILE=remote_params.yaml
# No trailing slash here: the paths below add their own "/".
TEST_DATA_DIR=gs://bucket-name/test_data
cat > "$PARAMS_FILE" <<EOF
param_list:
  - id: SRR1569895
    input_r1: $TEST_DATA_DIR/SRR1569895_1_subsample.fastq
    input_r2: $TEST_DATA_DIR/SRR1569895_2_subsample.fastq
  - id: SRR1570800
    input_r1: $TEST_DATA_DIR/SRR1570800_1_subsample.fastq
    input_r2: $TEST_DATA_DIR/SRR1570800_2_subsample.fastq
publish_dir: foo
reference: $TEST_DATA_DIR/S288C_reference_genome_Current_Release_STAR
EOF
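With more than a handful of samples, writing each entry by hand becomes tedious. As a rough sketch, the params file could be generated from a list of sample IDs instead. `write_params` is a hypothetical helper, and it assumes every sample follows the `<id>_1_subsample.fastq` / `<id>_2_subsample.fastq` naming used above.

```bash
# Hypothetical helper: write a params file for any number of paired-end samples.
# Usage: write_params <params_file> <data_dir> <sample_id>...
write_params() {
  params_file=$1
  data_dir=${2%/}            # strip one trailing slash to avoid "//" in paths
  shift 2
  {
    echo "param_list:"
    for id in "$@"; do
      # Each sample is one list item; its keys are indented beneath the dash.
      echo "  - id: $id"
      echo "    input_r1: $data_dir/${id}_1_subsample.fastq"
      echo "    input_r2: $data_dir/${id}_2_subsample.fastq"
    done
    echo "publish_dir: foo"
    echo "reference: $data_dir/S288C_reference_genome_Current_Release_STAR"
  } > "$params_file"
}
```

Calling `write_params remote_params.yaml gs://bucket-name/test_data SRR1569895 SRR1570800` would produce a file equivalent to the one in this step.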
Step 3: (Optional) Optimize Resource Usage
Customize resource settings in nextflow.config:
cat > nextflow.config << HERE
process {
  withName:'.*falco_process' {
    memory = { 200.MB * task.attempt }
  }
  withName:'.*cutadapt_process' {
    memory = { 50.MB * task.attempt }
  }
  withName:'.*star_align_reads_process' {
    memory = { 2.GB * task.attempt }
  }
  withName:'.*samtools_stats_process' {
    memory = { 50.MB * task.attempt }
  }
  withName:'.*multiqc_process' {
    memory = { 200.MB * task.attempt }
  }
}
HERE
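Each `memory` setting above is a closure that multiplies a base amount by `task.attempt`. In Nextflow, `task.attempt` is 1 on the first try and increases by one on each retry, so the request grows linearly every time a task is re-run. Whether a failed task is retried at all depends on the retry settings in effect for the workflow, which this post does not show. The arithmetic can be illustrated with a small sketch (`mem_for_attempt` is a hypothetical name used only for this illustration):

```bash
# Memory requested on a given attempt: base * attempt (same unit as the base).
mem_for_attempt() {
  echo $(( $1 * $2 ))
}

# Example: the falco process starts at 200 MB.
for attempt in 1 2 3; do
  echo "falco attempt $attempt: $(mem_for_attempt 200 "$attempt") MB"
done
# falco attempt 1: 200 MB
# falco attempt 2: 400 MB
# falco attempt 3: 600 MB
```

Starting small and growing on retry keeps most tasks on cheap instances while still letting the occasional memory-hungry sample finish.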
Step 4: Launch the Workflow on Seqera Platform
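Export the workspace and compute environment IDs (the tw CLI also needs a Seqera access token, typically supplied via the TOWER_ACCESS_TOKEN environment variable):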
export COMPUTE_ENV=<your_seqera_compute_environment_id>
export WORKSPACE_ID=<your_seqera_workspace_id>
Launch with tw:
tw launch https://packages.viash-hub.com/vsh/playground \
  --revision main \
  --main-script target/nextflow/mapping_and_qc/main.nf \
  --params-file remote_params.yaml \
  --workspace "$WORKSPACE_ID" \
  --compute-env "$COMPUTE_ENV" \
  --config nextflow.config
Note: All Viash Catalogue workflows are pre-built and hosted in viash-hub.
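Since the launch command depends on two environment variables, a small wrapper can catch a missing value before anything is submitted. This is only a sketch: `require_vars`, `launch_workflow`, and the `DRY_RUN` switch are hypothetical names introduced here, while the `tw launch` arguments are the ones from this step.

```bash
# Hypothetical helper: fail early if any named variable is unset or empty.
require_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      return 1
    fi
  done
}

# Hypothetical wrapper around the launch command from this step.
# With DRY_RUN set to any non-empty value, the command is printed, not run.
launch_workflow() {
  require_vars WORKSPACE_ID COMPUTE_ENV || return 1
  ${DRY_RUN:+echo} tw launch https://packages.viash-hub.com/vsh/playground \
    --revision main \
    --main-script target/nextflow/mapping_and_qc/main.nf \
    --params-file remote_params.yaml \
    --workspace "$WORKSPACE_ID" \
    --compute-env "$COMPUTE_ENV" \
    --config nextflow.config
}
```

Running `DRY_RUN=1 launch_workflow` prints the fully expanded command, which is a convenient way to review it before submitting the real run with `launch_workflow`.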
Step 5: Monitor Your Workflow
Use the Seqera Cloud UI to monitor job progress and logs.
Why This Matters
Viash + Seqera lets you:
- Scale workflows with no code changes
- Avoid writing DevOps logic or infrastructure scripts
- Leverage cloud resources to analyze large datasets
You keep your focus on the science, not infrastructure.
Wrapping Up the Series
This concludes our four-part blog series:
1. Creating simple Viash modules
2. Leveraging built-in “batteries included” features
3. Building modular workflows
4. Running them at scale on the cloud
We hope this series helps simplify your workflow development and deployment. Learn more at viash.io or get in touch at info@data-intuitive.com.
Thanks for reading, and happy coding! 🚀