TileDB-SOMA Python API Reference

SOMA is a unified data model and API for single-cell data.

If you know about obs, var, and X, you’ll recognize what you’re seeing.

The data model and API — here as implemented using the TileDB storage engine — allow you to persist, investigate, and share annotated 2D matrices, commonly used in single-cell biology.

Features:

  • Flexible, extensible, and open-source API

  • Supports access to persistent, cloud-resident annotated 2D matrix datasets

  • Enables use within popular data science environments (e.g., R, Python), using the tools of that environment (e.g., Python Pandas integration), with the same storage regardless of language

  • Allows interoperability with multiple tools including AnnData, Scanpy, Seurat, and Bioconductor

  • Cloud-native TileDB arrays allow you to slice straight from remote storage

  • Reduces costs and processing time by utilizing cost-efficient object storage services like S3

  • Enables out-of-core access to data aggregations much larger than single-host main memory

  • Enables distributed computation over datasets