Automated Demultiplexing and Processing of High-Throughput Transcriptomics

Project Details

Introduction

Data Intuitive recently delivered an end-to-end, fully automated high-throughput RNA-seq workflow for a top-10 pharma R&D client. Built with Viash‑generated, Nextflow‑compatible modules, the system streamlines large‑scale processing and analysis while remaining reproducible and portable across environments.

On top of the workflow, we implemented an automation layer that lets the entire process be triggered from a single CSV sample sheet, eliminating manual path entry and reducing the need for error-prone parameter forms.

This version of the end-to-end workflow builds on an earlier version that already used Viash for module generation. In addition to a much improved interface and automation capabilities, we increased modularity further by splitting the full end-to-end flow into three subworkflows, each of which can be continually improved.

As a result, the client’s previous implementation was replaced with a more modular and automated solution.

Implementation and Automation

Our team leveraged Viash’s modular approach to build a pipeline tailored to the client’s requirements. Because Viash produces Nextflow-compatible modules and pipelines, we were able to take full advantage of Nextflow’s scalability and cluster execution capabilities.

Moreover, rather than creating a new workflow from scratch or modifying an existing one, we used Viash to create a wrapper workflow that handles the changes required to run in an automation context. This wrapper is composed of several Viash workflow packages: some are publicly available on Viash Hub (demultiplex, HT-RNAseq), while others are proprietary and hosted in a private Viash Hub on the client’s side.

To make the workflow truly automated and intuitive, we implemented:

  • Structured CSV sample sheet as the primary interface: all samples and run-specific parameters are captured in one file, with reference resources (genome reference files, lab setup parameters) addressed via labels rather than raw paths and cryptic parameters.

  • Run‑ID based ingestion: users provide a (sequencing) run identifier, and the workflow automatically resolves input files (e.g., on S3 or shared storage).

  • Deterministic data organization: outputs are structured using project and experiment identifiers, standardizing both inputs and outputs and improving traceability. Because storage locations are derived by the workflow rather than chosen by hand, data is always stored in line with the data governance policy.
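
The three points above can be illustrated with a minimal Python sketch. It is not the client's implementation: the column names (`sample_id`, `run_id`, `project_id`, `experiment_id`, `reference_label`), the label registry, and the `s3://example-...` locations are all hypothetical, chosen only to show how labels and identifiers might be resolved into concrete paths.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical registry mapping human-readable labels to reference locations.
# In a real deployment this would live in managed configuration, not in code.
REFERENCE_LABELS = {
    "human_grch38": "s3://example-references/GRCh38/",
}

# Hypothetical storage roots for raw sequencing runs and processed results.
RUN_ROOT = "s3://example-sequencing-runs"
OUTPUT_ROOT = "s3://example-results"


@dataclass(frozen=True)
class ResolvedSample:
    sample_id: str
    input_dir: str   # derived from the run ID
    reference: str   # derived from the reference label
    output_dir: str  # derived from project, experiment, and sample IDs


def resolve_sample_sheet(text: str) -> list[ResolvedSample]:
    """Turn sample sheet rows into fully resolved inputs and outputs."""
    resolved = []
    for row in csv.DictReader(io.StringIO(text)):
        label = row["reference_label"]
        if label not in REFERENCE_LABELS:
            # Fail early on unknown labels instead of guessing a path.
            raise ValueError(f"unknown reference label: {label!r}")
        resolved.append(
            ResolvedSample(
                sample_id=row["sample_id"],
                input_dir=f"{RUN_ROOT}/{row['run_id']}/",
                reference=REFERENCE_LABELS[label],
                output_dir=(
                    f"{OUTPUT_ROOT}/{row['project_id']}/"
                    f"{row['experiment_id']}/{row['sample_id']}/"
                ),
            )
        )
    return resolved


# Example sample sheet: the user supplies identifiers and labels, never paths.
SHEET = """sample_id,run_id,project_id,experiment_id,reference_label
S1,RUN001,PRJ7,EXP2,human_grch38
"""

if __name__ == "__main__":
    for sample in resolve_sample_sheet(SHEET):
        print(sample)
```

Because every path is computed from identifiers, two runs with the same sample sheet always read from and write to the same locations, which is what makes the layout traceable.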

This approach made the system more user-friendly and accessible for both lab scientists and bioinformaticians, while preserving the hallmark strengths of Viash workflows: scalability, robustness, reliability and reproducibility.

For further insights into how we approach user-friendly workflow design, see our related blog post “Re-thinking user-friendly computational workflows”.

Impact

While specific project details remain confidential, the result is a reliable, automated, and user‑friendly workflow that minimizes manual steps, reduces potential failure modes, and accelerates research. The combination of modular engineering and intelligent automation demonstrates how Data Intuitive designs systems that fit seamlessly into real-world scientific operations: powerful technology made practical.

 
