Developing and maintaining pipelines/workflows can be a genuine challenge. Doing this in a collaborative context adds even more to this complexity. We all dream of a flexible platform that allows us to easily express the computational requirements and is able to then run those optimally.
Data Intuitive is working in a project where such a pipeline is being developed. The choice of the platform is NextFlow and very early on we decided that we wanted to use DSL2 even if it was very early stages.
It turned out, though, that DSL2 in itself did not yet grant us enough flexibility. We ended up creating a set of conventions for creating modules that enable us to achieve our collaborative development goal.
A DiFlow pipeline is a combination of modules:
- A module contains one step in a larger process
- Each module is independent
- A module can be tested
- A module runs in a dedicated and versioned container
- A module takes a triplet as argument:
[ ID, data, config ]
Please refer to the Github repository for more information and documentation.