The tiledbsoma module

SOMA powered by TileDB

SOMA — stack of matrices, annotated — is a flexible, extensible, and open-source API enabling access to data in a variety of formats, and is motivated by use cases from single-cell biology. The tiledbsoma Python package is an implementation of SOMA using the TileDB Embedded engine.

Provides

  1. The ability to store, query, and retrieve larger-than-core datasets, resident in both cloud (object-store) and local (file) systems.

  2. A data model supporting dataframes, and both sparse and dense multi-dimensional arrays.

  3. An extended data model with support for single-cell biology data.

See the SOMA GitHub repo for more information on the SOMA project.

Using the documentation

Documentation is also available via Python's built-in help function, which is a convenient way to explore the package. For example:

>>> import tiledbsoma
>>> help(tiledbsoma.DataFrame)

API maturity tags

Classes and functions are annotated with API maturity tags, for example:

Lifecycle: experimental

These tags indicate the maturity of each interface, and are patterned after the RStudio lifecycle stage model. Tags are:

  • experimental: Under active development and may undergo significant and breaking changes.

  • maturing: Under active development, but the interface and behavior have stabilized and are unlikely to change significantly; breaking changes are still possible.

  • stable: The interface is considered stable and breaking changes will be avoided where possible. Breaking changes that cannot be avoided will be accompanied by a major version bump.

  • deprecated: The API is no longer recommended for use and may be removed in a future release.

If no tag is present, the state is experimental.
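The default rule above can be made concrete with a small sketch. The helper lifecycle_stage below is hypothetical and not part of tiledbsoma. It assumes the tag appears in an object's docstring in the "Lifecycle: <stage>" form shown above, and it treats a missing tag as experimental.

```python
import inspect
import re

# Hypothetical helper, not a tiledbsoma API. Assumes tags are written in a
# docstring as "Lifecycle: <stage>".
_STAGES = ("experimental", "maturing", "stable", "deprecated")
_TAG = re.compile(r"Lifecycle:\s*(\w+)", re.IGNORECASE)


def lifecycle_stage(obj) -> str:
    """Return the lifecycle stage of obj, defaulting to 'experimental'."""
    doc = inspect.getdoc(obj) or ""
    match = _TAG.search(doc)
    if match is None:
        # No tag present: the documented default applies.
        return "experimental"
    stage = match.group(1).lower()
    if stage not in _STAGES:
        raise ValueError(f"unknown lifecycle stage: {stage!r}")
    return stage


class Tagged:
    """Example class.

    Lifecycle: maturing
    """


class Untagged:
    """Example class without a tag."""


print(lifecycle_stage(Tagged))    # maturing
print(lifecycle_stage(Untagged))  # experimental
```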

Data types

The principal persistent types provided by SOMA are:

  • Collection – a string-keyed container of SOMA objects.

  • DataFrame – a multi-column table with a user-defined schema, which defines the number of columns and each column's name and value type.

  • SparseNDArray – a sparse multi-dimensional array, storing Arrow primitive data types such as int and float.

  • DenseNDArray – a dense multi-dimensional array, storing Arrow primitive data types such as int and float.

  • Experiment – a specialized Collection, representing an annotated 2-D matrix of measurements.

  • Measurement – a specialized Collection, for use within the Experiment class, representing a set of measurements on a single set of variables (features, e.g., genes).

SOMA Experiment and Measurement are inspired by use cases from single-cell biology.

SOMA uses the Arrow type system and memory model for its in-memory type system and schema. For example, the schema of a DataFrame is expressed as an Arrow Schema.
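To illustrate how these types nest, the following sketch models a tiny Experiment with plain Python dicts and lists in place of persistent SOMA objects. The member names obs, ms, var and X, and the column names soma_joinid, soma_dim_0, soma_dim_1 and soma_data, follow tiledbsoma conventions. The measurement name "RNA", the layer name "data" and the annotation columns are example values chosen here, and check_layout is a hypothetical helper. It checks one invariant of the layout: rows of an X layer are indexed by obs soma_joinid and columns by var soma_joinid.

```python
# In-memory stand-in for an Experiment; plain containers replace SOMA objects.
experiment = {
    "obs": [  # DataFrame: one row per observation (e.g., cell)
        {"soma_joinid": 0, "cell_type": "B cell"},
        {"soma_joinid": 1, "cell_type": "T cell"},
        {"soma_joinid": 2, "cell_type": "B cell"},
    ],
    "ms": {  # Collection of Measurements, keyed by name
        "RNA": {
            "var": [  # DataFrame: one row per variable (e.g., gene)
                {"soma_joinid": 0, "gene": "CD19"},
                {"soma_joinid": 1, "gene": "CD3E"},
            ],
            "X": {  # Collection of matrix layers, keyed by name
                "data": [  # sparse entries: (obs joinid, var joinid) -> value
                    {"soma_dim_0": 0, "soma_dim_1": 0, "soma_data": 5.0},
                    {"soma_dim_0": 1, "soma_dim_1": 1, "soma_data": 3.0},
                    {"soma_dim_0": 2, "soma_dim_1": 0, "soma_data": 2.0},
                ],
            },
        },
    },
}


def check_layout(exp: dict) -> None:
    """Raise ValueError if any X entry points at a missing obs or var row."""
    obs_ids = {row["soma_joinid"] for row in exp["obs"]}
    for ms_name, ms in exp["ms"].items():
        var_ids = {row["soma_joinid"] for row in ms["var"]}
        for layer_name, entries in ms["X"].items():
            for e in entries:
                if e["soma_dim_0"] not in obs_ids or e["soma_dim_1"] not in var_ids:
                    raise ValueError(f"{ms_name}/X/{layer_name}: dangling entry {e}")


check_layout(experiment)  # passes silently for the example above
```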

Error handling

Most errors will be signaled with a raised Exception. Of note:

  • NotImplementedError will be raised when the requested function or method is unsupported.

  • SOMAError is a base class for all SOMA-specific errors.

  • TileDBError will be raised for many TileDB-specific errors.

In other cases, an appropriate standard Python exception is raised, e.g., TypeError or ValueError.
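Because DoesNotExistError is a SOMA-specific error, handlers should test for it before its base class. The sketch below defines stand-in classes with the same relationship as tiledbsoma.SOMAError and tiledbsoma.DoesNotExistError so that it runs without the package. The functions open_object, try_open and describe_failure are hypothetical, and TileDBError is left out.

```python
from typing import Optional


class SOMAError(Exception):
    """Stand-in for tiledbsoma.SOMAError: base of SOMA-specific errors."""


class DoesNotExistError(SOMAError):
    """Stand-in for tiledbsoma.DoesNotExistError."""


def open_object(uri: str) -> str:
    """Hypothetical opener: only one URI exists in this sketch."""
    if uri != "file:///tmp/example-soma":
        raise DoesNotExistError(f"no SOMA object at {uri!r}")
    return f"<SOMA object at {uri}>"


def try_open(uri: str) -> Optional[str]:
    """Return None for a missing object; let every other error propagate."""
    try:
        return open_object(uri)
    except DoesNotExistError:
        return None


def describe_failure(exc: BaseException) -> str:
    """Classify an exception, checking subclasses before their bases."""
    if isinstance(exc, DoesNotExistError):
        return "missing or inaccessible object"
    if isinstance(exc, SOMAError):
        return "other SOMA-specific error"
    if isinstance(exc, NotImplementedError):
        return "unsupported operation"
    if isinstance(exc, (TypeError, ValueError)):
        return "invalid argument"
    return "unexpected error"


print(try_open("file:///tmp/missing"))  # None
```

Swapping the first two checks in describe_failure would report every DoesNotExistError as a generic SOMA error, since isinstance also matches subclasses.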

Classes

tiledbsoma.Collection

Collection is a persistent container of named SOMA objects, stored as a mapping of string keys and SOMA object values.

tiledbsoma.Experiment

A collection subtype that combines observations and measurements from an individual experiment.

tiledbsoma.Measurement

A set of annotated variables (features) defined by a dataframe, together with measurements on those variables stored in sparse and dense N-dimensional arrays.

tiledbsoma.DataFrame

DataFrame is a multi-column table with a user-defined schema.

tiledbsoma.SparseNDArray

SparseNDArray is a sparse, N-dimensional array, with offset (zero-based) integer indexing on each dimension.

tiledbsoma.SparseNDArrayRead

SparseNDArrayRead is an intermediate type that supports multiple eventual result formats.

tiledbsoma.DenseNDArray

DenseNDArray is a dense, N-dimensional array, with offset (zero-based) integer indexing on each dimension.
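The difference between the two array types, and what zero-based offset indexing means, can be sketched with the standard library alone. The sparse form records only explicitly stored cells as coordinate (COO) entries; the dense form holds every cell. The soma_dim_0, soma_dim_1 and soma_data field names follow the tiledbsoma convention for sparse reads. The helpers sparse_entries and to_dense are hypothetical, and treating zeros as "not stored" is a choice made for this example.

```python
from typing import Dict, List, Tuple, Union

Entry = Dict[str, Union[int, float]]


def sparse_entries(dense: List[List[float]]) -> List[Entry]:
    """Record only the nonzero cells of a dense matrix as COO entries."""
    return [
        {"soma_dim_0": i, "soma_dim_1": j, "soma_data": v}
        for i, row in enumerate(dense)
        for j, v in enumerate(row)
        if v != 0
    ]


def to_dense(entries: List[Entry], shape: Tuple[int, int], fill: float = 0.0) -> List[List[float]]:
    """Materialize COO entries into a dense nested list, filling gaps."""
    n_rows, n_cols = shape
    out = [[fill] * n_cols for _ in range(n_rows)]
    for e in entries:
        i, j = e["soma_dim_0"], e["soma_dim_1"]
        # Zero-based offsets: valid indices run from 0 to length - 1.
        if not (0 <= i < n_rows and 0 <= j < n_cols):
            raise IndexError(f"coordinate {(i, j)} is outside shape {shape}")
        out[i][j] = e["soma_data"]
    return out


dense = [[0.0, 1.5, 0.0], [0.0, 0.0, 2.0]]
coo = sparse_entries(dense)  # 2 entries instead of 6 cells
print(to_dense(coo, (2, 3)) == dense)  # True
```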

tiledbsoma.ResultOrder

The order in which results should be returned.
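The sketch below shows what row-major and column-major ordering mean for 2-D coordinates. The ResultOrder class here is a stand-in; its three member names and string values are assumed to match tiledbsoma.ResultOrder. The order_coords helper is hypothetical. For AUTO it simply keeps the input order, whereas the real library leaves that choice to the implementation.

```python
from enum import Enum
from typing import Iterable, List, Tuple

Coord = Tuple[int, int]


class ResultOrder(str, Enum):
    """Stand-in; values assumed to match tiledbsoma.ResultOrder."""

    AUTO = "auto"
    ROW_MAJOR = "row-major"
    COLUMN_MAJOR = "column-major"


def order_coords(coords: Iterable[Coord], order: ResultOrder) -> List[Coord]:
    """Return 2-D (row, col) coordinates in the requested order."""
    if order is ResultOrder.ROW_MAJOR:
        return sorted(coords)  # by row, then by column within a row
    if order is ResultOrder.COLUMN_MAJOR:
        return sorted(coords, key=lambda rc: (rc[1], rc[0]))  # by column, then row
    # AUTO leaves the choice to the implementation; this sketch keeps input order.
    return list(coords)


cells = [(1, 0), (0, 1), (0, 0), (1, 1)]
print(order_coords(cells, ResultOrder.ROW_MAJOR))        # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(order_coords(cells, ResultOrder("column-major")))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```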

tiledbsoma.AxisColumnNames

Specifies column names for experiment axis query read operations.

tiledbsoma.AxisQuery

Single-axis dataframe query with coordinates and a value filter.

tiledbsoma.ExperimentAxisQuery

Axis-based query against a SOMA Experiment.
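Conceptually, an experiment axis query resolves each axis query to a set of soma_joinid values and then keeps only the matrix entries whose row and column both survive. The sketch below models that with the standard library. ToyAxisQuery and query_x are hypothetical stand-ins. The real AxisQuery takes its value filter as a string expression, and a Python callable replaces it here so the example runs without the package.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, object]


@dataclass
class ToyAxisQuery:
    """Hypothetical stand-in: optional joinid coordinates plus a row predicate."""

    coords: Optional[Sequence[int]] = None  # None selects every row
    value_filter: Optional[Callable[[Row], bool]] = None  # None keeps every row

    def joinids(self, df: List[Row]) -> List[int]:
        """Return soma_joinid values that match both coords and value_filter."""
        wanted = None if self.coords is None else set(self.coords)
        return [
            row["soma_joinid"]
            for row in df
            if (wanted is None or row["soma_joinid"] in wanted)
            and (self.value_filter is None or self.value_filter(row))
        ]


def query_x(obs, var, x_entries, obs_query, var_query):
    """Keep X entries whose row and column both survive their axis queries."""
    obs_ids = set(obs_query.joinids(obs))
    var_ids = set(var_query.joinids(var))
    return [
        e
        for e in x_entries
        if e["soma_dim_0"] in obs_ids and e["soma_dim_1"] in var_ids
    ]


obs = [
    {"soma_joinid": 0, "cell_type": "B cell"},
    {"soma_joinid": 1, "cell_type": "T cell"},
    {"soma_joinid": 2, "cell_type": "B cell"},
]
var = [{"soma_joinid": 0, "gene": "CD19"}, {"soma_joinid": 1, "gene": "CD3E"}]
x = [
    {"soma_dim_0": 0, "soma_dim_1": 0, "soma_data": 5.0},
    {"soma_dim_0": 1, "soma_dim_1": 1, "soma_data": 3.0},
    {"soma_dim_0": 2, "soma_dim_1": 0, "soma_data": 2.0},
]

b_cells = ToyAxisQuery(value_filter=lambda row: row["cell_type"] == "B cell")
first_gene = ToyAxisQuery(coords=[0])
print(query_x(obs, var, x, b_cells, first_gene))
```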

tiledbsoma.SOMATileDBContext

Maintains TileDB-specific context for TileDB-SOMA objects.

Exceptions

tiledbsoma.DoesNotExistError

Raised when attempting to open a non-existent or inaccessible SOMA object.

tiledbsoma.SOMAError

Base error type for SOMA-specific exceptions.

Functions

tiledbsoma.open

Opens a TileDB SOMA object.

tiledbsoma.show_package_versions

Prints the versions of tiledbsoma and its key dependencies. Nominal use is for bug reports, so that issue filers and issue fixers can be on the same page.

tiledbsoma.get_implementation

Returns the implementation name, e.g., "python-tiledb".

tiledbsoma.get_implementation_version

Returns the package implementation version as a semver string.
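Since the stable tag ties unavoidable breaking changes to a major version bump, a version string can be used to flag risky upgrades. The helpers parse_version and may_break_stable_apis below are hypothetical. They assume the string begins with MAJOR.MINOR.PATCH and ignore any pre-release or build suffix after the patch number.

```python
import re
from typing import Tuple

# Hypothetical helpers; assume a leading MAJOR.MINOR.PATCH and ignore suffixes.
_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


def parse_version(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the start of a version string."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def may_break_stable_apis(old: str, new: str) -> bool:
    """True when upgrading from old to new crosses a major version."""
    return parse_version(new)[0] > parse_version(old)[0]


print(parse_version("1.15.0rc1"))                 # (1, 15, 0)
print(may_break_stable_apis("1.4.2", "2.0.0"))    # True
```

With the package installed, the same helper could be applied to the string returned by tiledbsoma.get_implementation_version(), provided it follows the assumed format.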

tiledbsoma.get_storage_engine

Returns the underlying storage engine name, e.g., "tiledb".

tiledbsoma.tiledbsoma_stats_disable

Disable TileDB internal statistics.

tiledbsoma.tiledbsoma_stats_dump

Print TileDB internal statistics.

tiledbsoma.tiledbsoma_stats_enable

Enable TileDB internal statistics.

tiledbsoma.tiledbsoma_stats_reset

Reset all TileDB internal statistics to 0.
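The four statistics functions are typically used together around a single workload. The sketch below wraps that sequence in a context manager. The stats_scope helper is hypothetical, and it takes the four actions as parameters so that it runs without the package. Recording lambdas stand in for the real functions here.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

Action = Callable[[], None]


@contextmanager
def stats_scope(enable: Action, reset: Action, dump: Action, disable: Action) -> Iterator[None]:
    """Gather statistics for the enclosed block only, then print and stop."""
    enable()
    reset()  # discard counts accumulated before the block
    try:
        yield
    finally:
        dump()  # report what the block accumulated, even if it raised
        disable()


calls = []
with stats_scope(
    enable=lambda: calls.append("enable"),
    reset=lambda: calls.append("reset"),
    dump=lambda: calls.append("dump"),
    disable=lambda: calls.append("disable"),
):
    calls.append("work")
print(calls)  # ['enable', 'reset', 'work', 'dump', 'disable']
```

With tiledbsoma installed, the arguments would be tiledbsoma.tiledbsoma_stats_enable, tiledbsoma.tiledbsoma_stats_reset, tiledbsoma.tiledbsoma_stats_dump and tiledbsoma.tiledbsoma_stats_disable. This assumes they take no arguments, as their descriptions suggest.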