At this year’s NextGen Omics conference in London, we explored one of the most persistent challenges in bioinformatics: how to transform fast-moving R&D work into workflows that are reproducible, maintainable and ready for large-scale operations.
The talk focused on practical lessons drawn from years of building modular workflows and working across research and IT environments.

We were thrilled to see a full house during our presentation — even on the final afternoon of the conference.
Watch the full re-recorded talk on YouTube
1. The Reality of Modern Omics Data
Omics analysis today is shaped by three compounding forces:
- Volume — data sizes continue to grow across modalities.
- Complexity — diverse formats, metadata, and dependencies make workflows intricate.
- Integration — multimodal analyses multiply all challenges.
Even a conceptually simple workflow often expands into anywhere from ten to tens (and sometimes 100+) of underlying components, making manual or ad-hoc approaches untenable.
2. R&D vs. Operations: Bridging the Gap
Bioinformaticians innovate quickly — typically using notebooks that support flexible exploration.
But as datasets scale, teams also need:
- reproducibility
- robust execution
- traceability
- operational reliability
A recent example illustrates this gap: a researcher developed an scGPT-based annotation method entirely in a notebook. It was scientifically sound, but not production-ready. Working closely together, we transformed the notebook into a standalone module and integrated it into the OpenPipeline workflow.
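To make that concrete, here is a minimal sketch of what such a refactoring can look like. The file name, parameters and anndata-based I/O are illustrative assumptions, not the actual OpenPipeline module: the point is that inline notebook cells become a self-contained, parameterised script.

```python
# annotate.py -- illustrative sketch only; names and parameters are
# hypothetical, not the actual OpenPipeline module.
import argparse
import anndata as ad

def annotate(input_path: str, output_path: str) -> None:
    adata = ad.read_h5ad(input_path)          # load the single-cell dataset
    # ... model inference would go here (e.g. scGPT-based label prediction) ...
    adata.obs["predicted_label"] = "unknown"  # placeholder for real predictions
    adata.write_h5ad(output_path)             # persist the annotated object

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Standalone annotation module")
    parser.add_argument("--input", required=True, help="path to the input .h5ad")
    parser.add_argument("--output", required=True, help="path for the annotated .h5ad")
    args = parser.parse_args()
    annotate(args.input, args.output)
```

Once the logic has an explicit interface like this, a workflow engine can run, version and containerise it like any other module.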
The core lesson:
Innovation needs flexibility; deployment needs structure.
Bridging the two efficiently requires the right tools and a structured approach.
3. Reuse: Turning R&D Into an Asset Instead of Technical Debt
While developing multiple workflows (single-cell, high-throughput, spatial, …), we found large overlaps in downstream logic.
Without modular tooling, this leads to:
- duplicated code
- diverging versions
- increased maintenance
- slower evolution
Our solution: maintain a shared toolbox of reusable modules that workflows can depend on directly.
This eliminates duplication, keeps functionality aligned, and accelerates development.
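As a simplified sketch of the pattern (the module and function names are invented for illustration, not our actual toolbox), several workflows import one shared, version-pinned implementation instead of carrying their own copy:

```python
# shared_toolbox/qc.py -- hypothetical shared module in the common toolbox
import numpy as np

def filter_cells(counts: np.ndarray, min_counts: int = 500) -> np.ndarray:
    """Keep only cells (rows) whose total counts reach min_counts."""
    totals = counts.sum(axis=1)
    return counts[totals >= min_counts]

# single_cell_workflow.py and spatial_workflow.py both do:
#     from shared_toolbox.qc import filter_cells
# so a bug fix in the toolbox immediately benefits every dependent workflow.
```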
4. Reproducibility Requires More Than Containers
Containers are essential, but they are just the beginning.
Workflows continuously evolve through:
- bug fixes
- new capabilities
- format changes
- interface adjustments
To preserve reproducibility, you must distinguish between:
- Minor updates (bug fixes → version 1.1)
- Major updates (new modules → version 2.0)
When updates break compatibility, we add adapter modules rather than rewriting entire workflows.
Your analysis remains valid only as long as it can be reproduced.
Versioning and compatibility layers are what keep analyses alive over time.
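A toy example of the adapter idea (all names and data shapes here are invented for this post): when a module's v2 interface changes what it expects as input, a small shim translates the v1 output rather than forcing a rewrite of every workflow that produces it.

```python
# Hypothetical adapter sketch: function names and record shapes are invented.

def annotate_v2(cells: list[dict]) -> list[dict]:
    """v2 of a module: expects dicts with a 'barcode' key."""
    return [{**cell, "label": "unknown"} for cell in cells]

def adapt_v1_to_v2(v1_rows: list[tuple[str, int]]) -> list[dict]:
    """Adapter module: v1 emitted (cell_id, counts) tuples; v2 wants dicts."""
    return [{"barcode": cell_id, "counts": counts} for cell_id, counts in v1_rows]

# The v1-era workflow keeps producing tuples, yet feeds the upgraded module:
v1_output = [("AAACCTG", 1200), ("TTTGGTA", 850)]
annotated = annotate_v2(adapt_v1_to_v2(v1_output))
```

The adapter is cheap to write and test, and the workflows on either side of it stay untouched.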
5. Usability Through Adaptability
Different users require different interfaces:
- forms
- CSV files
- automated submissions
- LIMS integrations
- alternative output formats
Instead of creating multiple workflow variants, we wrap a validated pipeline with adaptable interface layers that handle diverse input and output needs.
The underlying workflow stays robust and validated; only the interface changes as user needs evolve.
This approach:
- reduces maintenance
- improves adoption
- avoids workflow duplication
- simplifies validation
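In code, the wrapping pattern might look like this sketch (the function and field names are hypothetical): every interface layer translates its own input style into the single parameter set the validated pipeline accepts.

```python
# Hypothetical sketch: interface adapters in front of one validated pipeline.
import csv

def run_pipeline(sample_id: str, input_path: str, reference: str) -> None:
    """The validated core pipeline; its signature is the one stable contract."""
    print(f"Running {sample_id}: {input_path} against {reference}")

def from_csv(path: str) -> None:
    """CSV interface layer: launch one pipeline run per row."""
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            run_pipeline(row["sample_id"], row["input_path"], row["reference"])

def from_lims(record: dict) -> None:
    """LIMS interface layer: map LIMS field names onto pipeline parameters."""
    run_pipeline(record["sampleName"], record["dataUri"], record["genomeBuild"])
```

Adding a new entry point, say a web form, means writing one more small translator, not revalidating the pipeline.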
Seminar Recording
Our presentation was originally delivered at NextGen Omics & Spatial Data 2025.
Due to a technical issue, the conference organisers unfortunately lost the original live recording.
We therefore re-recorded the full presentation so the content remains available to the community.
