Interoperable single-cell pipelines with anndataR and Viash

Single-cell transcriptomics analysis is split across two language ecosystems. In Python, the scverse ecosystem stores data in memory as an AnnData object and writes it to disk as an HDF5-based H5AD file or a Zarr store. R has two widely used in-memory objects, the SingleCellExperiment object used by the Bioconductor community and the Seurat object. Both play a similar role to AnnData, bundling an expression matrix with metadata and calculated information, but they have different structures and are not interchangeable.

Unlike AnnData, which has a defined on-disk specification, R objects are often written to disk using R’s generic serialisation mechanisms, which Python cannot read natively. Each ecosystem has real strengths, and best practice is to use tools from both in a single analysis, but the difficulty of moving data between languages gets in the way.

Current approaches

Currently, approaches for moving single-cell data between Python and R typically rely on a foreign function interface like reticulate or rpy2. This provides a bridge between languages, allowing compatible data to be transferred between environments. R packages like sceasy and zellkonverter that can read H5AD files do so by using the Python anndata package to read the file, and then convert the contents to R objects by transferring the matrices and data frames to the parallel R environment. The data is stored in a language-agnostic file format, but an R user still needs to configure a Python environment to read it, even if they don’t need Python for their analysis.

How anndataR works

The anndataR package is different: it reads and writes AnnData files natively in R without requiring a Python environment. It supports both the H5AD (via the rhdf5 package) and Zarr (via Rarr) formats, adhering to the AnnData on-disk specification. Users can interact with the provided R AnnData object, or convert to a SingleCellExperiment or Seurat object for use with other R packages. When analysis is complete, the native R objects can be written directly to disk as an H5AD file or Zarr store. A robust set of round-trip tests checks that files written by one language can be read by the other.

The fundamental differences between the objects make conversion non-trivial. Each object stores different information in different locations, and AnnData places cells in rows and genes in columns while SingleCellExperiment and Seurat use the opposite orientation. anndataR handles this with sensible defaults that attempt to convert as much data as possible, while letting advanced users control every slot when they need precision.
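To make the orientation difference concrete, here is a minimal sketch in plain Python. It is a conceptual illustration only and not how anndataR is implemented: matrices are nested lists, the objects are toy dictionaries, and the names `SLOT_MAP`, `transpose`, `to_genes_by_cells` and `toy_adata` are hypothetical. It assumes the natural correspondence in which AnnData's per-cell table (`obs`) lines up with SingleCellExperiment's `colData` and the per-gene table (`var`) with `rowData`.

```python
# Conceptual sketch only: this is NOT anndataR's implementation.
# Matrices are plain nested lists so the example has no dependencies.

# Assumed correspondence between AnnData slots and SingleCellExperiment
# accessors for the two metadata tables (hypothetical constant).
SLOT_MAP = {
    "obs": "colData",  # per-cell metadata: rows in AnnData, columns in R
    "var": "rowData",  # per-gene metadata: columns in AnnData, rows in R
}


def transpose(matrix):
    """Swap rows and columns of a rectangular nested list."""
    return [list(column) for column in zip(*matrix)]


def to_genes_by_cells(adata):
    """Convert a toy cells-by-genes record into a genes-by-cells one."""
    converted = {"assay": transpose(adata["X"])}
    for anndata_slot, sce_slot in SLOT_MAP.items():
        converted[sce_slot] = adata[anndata_slot]
    return converted


# Toy AnnData-like record: 2 cells (rows) x 3 genes (columns).
toy_adata = {
    "X": [[1, 0, 5],
          [0, 2, 3]],
    "obs": {"cell_id": ["cell_a", "cell_b"]},
    "var": {"gene_id": ["gene_1", "gene_2", "gene_3"]},
}

toy_sce = to_genes_by_cells(toy_adata)
print(toy_sce["assay"])  # [[1, 0], [0, 2], [5, 3]]
```

The matrix now has one row per gene and one column per cell, and transposing it again recovers the original. A real conversion also has to decide where layers, embeddings and unstructured metadata go, which is where the defaults and per-slot control described above come in.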

Why this matters for Viash pipelines

Viash builds pipelines from language-agnostic modules. A single module can be written in R, Python or Bash, and Viash wraps it into a standard component. A pipeline can mix modules from different languages without needing to know how they are implemented.

That approach relies on being able to pass data between modules. If one module outputs data in a language-specific format that the next one cannot read, it becomes a significant barrier to building workflows from reusable components.

By using anndataR in R components, a single-cell workflow can take advantage of the strengths of different languages for a comprehensive analysis. The R components no longer need to include a Python environment, making them smaller and easier to maintain. Workflows also become simpler by using one file format and avoiding the need for explicit conversion steps.
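The effect of a shared file format on a workflow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Viash code: the `Module` class, the `conversions_needed` function, the module names and the `rds` format label (standing in for R's generic serialisation) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical description of one pipeline module."""
    name: str
    language: str
    reads: str   # file format the module expects as input
    writes: str  # file format the module produces


def conversions_needed(pipeline):
    """List handoffs where the upstream output format differs from the
    downstream input format, so an explicit conversion step is needed."""
    return [
        (upstream.name, downstream.name)
        for upstream, downstream in zip(pipeline, pipeline[1:])
        if upstream.writes != downstream.reads
    ]


# The R module uses a language-specific format (assumed label "rds").
mixed_formats = [
    Module("quality_control", "python", reads="h5ad", writes="h5ad"),
    Module("normalisation", "r", reads="rds", writes="rds"),
    Module("clustering", "python", reads="h5ad", writes="h5ad"),
]

# The R module reads and writes H5AD directly.
shared_format = [
    Module("quality_control", "python", reads="h5ad", writes="h5ad"),
    Module("normalisation", "r", reads="h5ad", writes="h5ad"),
    Module("clustering", "python", reads="h5ad", writes="h5ad"),
]

print(conversions_needed(mixed_formats))
# [('quality_control', 'normalisation'), ('normalisation', 'clustering')]
print(conversions_needed(shared_format))
# []
```

In the first pipeline both handoffs around the R module need a conversion step. Once every module reads and writes the same format, the list is empty and the modules can be chained directly.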

anndataR is available under the open-source MIT license.

Bioconductor: https://bioconductor.org/packages/anndataR/
GitHub: https://github.com/scverse/anndataR/
Docs: https://anndatar.scverse.org/
Paper: Bioinformatics (2026), https://doi.org/10.1093/bioinformatics/btag288
