At this year’s NextGen Omics conference in London, we explored one of the most persistent challenges in bioinformatics: how to transform fast-moving R&D work into workflows that are reproducible, maintainable and ready for large-scale operations.
The talk focused on practical lessons drawn from years of building modular workflows and working across research and IT environments.

We were thrilled to see a full house during our presentation — even on the final afternoon of the conference.
Watch the full re-recorded talk on YouTube
1. The Reality of Modern Omics Data
Omics analysis today is shaped by three compounding forces:
- Volume — data sizes continue to grow across modalities.
- Complexity — diverse formats, metadata, and dependencies make workflows intricate.
- Integration — multimodal analyses multiply all challenges.
Even a conceptually simple workflow often expands into anywhere from ten to tens (and sometimes 100+) of underlying components, making manual or ad-hoc approaches untenable.
2. R&D vs. Operations: Bridging the Gap
Bioinformaticians innovate quickly — typically using notebooks that support flexible exploration.
But as datasets scale, teams also need:
- reproducibility
- robust execution
- traceability
- operational reliability
A recent example illustrates this gap: a researcher developed an scGPT-based annotation method entirely in a notebook. It was scientifically sound, but not production-ready. Working closely together, we transformed the notebook into a standalone module and integrated it into the OpenPipeline workflow.
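To make that concrete, here is a minimal sketch of what such a refactoring can look like. The file name, parameters and anndata-based I/O are illustrative assumptions, not the actual OpenPipeline module: the point is that inline notebook cells become a self-contained, parameterised script.

```python
# annotate.py -- illustrative sketch only; names and parameters are
# hypothetical, not the actual OpenPipeline module.
import argparse
import anndata as ad

def annotate(input_path: str, output_path: str) -> None:
    adata = ad.read_h5ad(input_path)          # load the single-cell dataset
    # ... model inference would go here (e.g. scGPT-based label prediction) ...
    adata.obs["predicted_label"] = "unknown"  # placeholder for real predictions
    adata.write_h5ad(output_path)             # persist the annotated object

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Standalone annotation module")
    parser.add_argument("--input", required=True, help="path to the input .h5ad")
    parser.add_argument("--output", required=True, help="path for the annotated .h5ad")
    args = parser.parse_args()
    annotate(args.input, args.output)
```

Once the logic has an explicit interface like this, a workflow engine can run, version and containerise it like any other module.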
The core lesson:
Innovation needs flexibility; deployment needs structure.
Bridging the two efficiently requires the right tools and a structured approach.
3. Reuse: Turning R&D Into an Asset Instead of Technical Debt
While developing multiple workflows (single-cell, high-throughput, spatial, …), we found large overlaps in downstream logic.
Without modular tooling, this leads to:
- duplicated code
- diverging versions
- increased maintenance
- slower evolution
Our solution: maintain a shared toolbox of reusable modules that workflows can depend on directly.
This eliminates duplication, keeps functionality aligned, and accelerates development.
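As a simplified sketch of the pattern (the module and function names are invented for illustration, not our actual toolbox), several workflows import one shared, version-pinned implementation instead of carrying their own copy:

```python
# shared_toolbox/qc.py -- hypothetical shared module in the common toolbox
import numpy as np

def filter_cells(counts: np.ndarray, min_counts: int = 500) -> np.ndarray:
    """Keep only cells (rows) whose total counts reach min_counts."""
    totals = counts.sum(axis=1)
    return counts[totals >= min_counts]

# single_cell_workflow.py and spatial_workflow.py both do:
#     from shared_toolbox.qc import filter_cells
# so a bug fix in the toolbox immediately benefits every dependent workflow.
```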
4. Reproducibility Requires More Than Containers
Containers are essential, but they are just the beginning.
Workflows continuously evolve through:
- bug fixes
- new capabilities
- format changes
- interface adjustments
To preserve reproducibility, you must distinguish between:
- Minor updates (bug fixes → version 1.1)
- Major updates (new modules → version 2.0)
When updates break compatibility, we add adapter modules rather than rewriting entire workflows.
Your analysis remains valid only as long as it can be reproduced.
Versioning and compatibility layers are what keep analyses alive over time.
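A toy example of the adapter idea (all names and data shapes here are invented for this post): when a module's v2 interface changes what it expects as input, a small shim translates the v1 output rather than forcing a rewrite of every workflow that produces it.

```python
# Hypothetical adapter sketch: function names and record shapes are invented.

def annotate_v2(cells: list[dict]) -> list[dict]:
    """v2 of a module: expects dicts with a 'barcode' key."""
    return [{**cell, "label": "unknown"} for cell in cells]

def adapt_v1_to_v2(v1_rows: list[tuple[str, int]]) -> list[dict]:
    """Adapter module: v1 emitted (cell_id, counts) tuples; v2 wants dicts."""
    return [{"barcode": cell_id, "counts": counts} for cell_id, counts in v1_rows]

# The v1-era workflow keeps producing tuples, yet feeds the upgraded module:
v1_output = [("AAACCTG", 1200), ("TTTGGTA", 850)]
annotated = annotate_v2(adapt_v1_to_v2(v1_output))
```

The adapter is cheap to write and test, and the workflows on either side of it stay untouched.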
5. Usability Through Adaptability
Different users require different interfaces:
- forms
- CSV files
- automated submissions
- LIMS integrations
- alternative output formats
Instead of creating multiple workflow variants, we wrap a validated pipeline with adaptable interface layers that handle diverse input and output needs.
The underlying workflow stays robust and validated; only the interface changes as user needs evolve.
This approach:
- reduces maintenance
- improves adoption
- avoids workflow duplication
- simplifies validation
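In code, the wrapping pattern might look like this sketch (the function and field names are hypothetical): every interface layer translates its own input style into the single parameter set the validated pipeline accepts.

```python
# Hypothetical sketch: interface adapters in front of one validated pipeline.
import csv

def run_pipeline(sample_id: str, input_path: str, reference: str) -> None:
    """The validated core pipeline; its signature is the one stable contract."""
    print(f"Running {sample_id}: {input_path} against {reference}")

def from_csv(path: str) -> None:
    """CSV interface layer: launch one pipeline run per row."""
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            run_pipeline(row["sample_id"], row["input_path"], row["reference"])

def from_lims(record: dict) -> None:
    """LIMS interface layer: map LIMS field names onto pipeline parameters."""
    run_pipeline(record["sampleName"], record["dataUri"], record["genomeBuild"])
```

Adding a new entry point, say a web form, means writing one more small translator, not revalidating the pipeline.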
Seminar Recording
Our presentation was originally delivered at NextGen Omics & Spatial Data 2025.
Due to a technical issue, the conference organisers unfortunately lost the original live recording.
We therefore re-recorded the full presentation so the content remains available to the community.
