Viash Hub CLI demo

Introduction

In what follows, we will demonstrate how Viash and, later, Viash Hub allow anyone with a minimal set of technical skills to develop and run a simple analysis: run QC on a (potentially large) set of fastq files and combine all the individual QC reports into a single (multiqc) report.

A bioinformatician could use fastqc in combination with a (bash) shell for loop. That approach, however, does not run in parallel. Command-line tools exist to parallelize such tasks, but the ones we know of can hardly be called easy to use.
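
For reference, a purely sequential baseline could look like the sketch below (the fastqc_out directory name is our own choice, not part of the demo):

# Run fastqc on each file, one after the other
mkdir -p fastqc_out
find testData -name "*.fastq.gz" | while read -r fq; do
  fastqc --outdir fastqc_out "$fq"
done
# Combine the individual reports into a single multiqc report
multiqc --outdir fastqc_out fastqc_out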

What if one could simply reuse existing functionality (i.e. Viash components or Nextflow modules) and combine it in a simple Nextflow pipeline to achieve this goal? That is where this demo project comes in: https://viash-hub.com/data-intuitive/viash_hub_demo/-/tree/v0.1?ref_type=heads.

Below, we run this pipeline on a test dataset in two ways. Screencasts are provided to demonstrate each approach.

Test data

We will fetch test data from https://github.com/hartwigmedical/testdata:

git clone https://github.com/hartwigmedical/testdata testData 
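
As a quick sanity check, count the fastq files in the clone; at the time of writing this reports 32, matching the parallel run below:

# Count the gzipped fastq files in the test data
find testData -name "*.fastq.gz" | wc -l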

Run directly from Viash Hub

In order to fetch the workflow from Viash Hub, the following should be added to ~/.nextflow/scm:

providers {
  vsh {
    platform = 'gitlab'
    server = "viash-hub.com"
  }
}
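
If the file does not exist yet, it can be created in one go, for example:

mkdir -p ~/.nextflow
cat > ~/.nextflow/scm <<'EOF'
providers {
  vsh {
    platform = 'gitlab'
    server = "viash-hub.com"
  }
}
EOF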

Then, with the data fetched above present under testData, we can run fastqc in parallel on all 32 fastq files:

nextflow run data-intuitive/viash_hub_demo \
    -hub vsh \
    -main-script target/nextflow/workflows/parallel_qc/main.nf \
    -r main \
    --input "testData/**/*.fastq.gz" \
    --publish_dir output \
    -with-docker

The output will be stored under output as indicated by the --publish_dir argument.
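
To locate the generated HTML reports (the exact layout under output may differ between workflow versions):

find output -name "*.html"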

Screencast of fetching test data and running the pipeline from Viash Hub directly:

Run from a local copy

First of all, build the workflow component and fetch the dependencies:

 viash ns build
temporaryFolder: /tmp/viash_hub_repo5484030342718552259 uri: https://github.com/openpipelines-bio/openpipeline.git
Cloning into '.'...
checkout out: List(git, checkout, tags/0.12.1, --, .) 0
Creating temporary 'target/.build.yaml' file for op as this file seems to be missing.
Exporting parallel_qc (workflows) =nextflow=> <...>/demo/target/nextflow/workflows/parallel_qc
Exporting transpose (utils) =nextflow=> <...>/demo/target/nextflow/utils/transpose
All 2 configs built successfully
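
The exported artifacts end up under target/nextflow, matching the two export lines above; a quick check:

# Lists the parallel_qc workflow entry point used below, plus the transpose utility
find target/nextflow -name "main.nf"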

Now, run fastqc on all fastq files found in the testData directory:

 nextflow run target/nextflow/workflows/parallel_qc/main.nf \
    --input "testData/**/*.fastq.gz"  \
    --publish_dir output \
    -with-docker

Screencast of fetching test data and running the pipeline from a local copy:

Done!
