tiledbsoma.io.register_h5ads

tiledbsoma.io.register_h5ads(experiment_uri: str | None, h5ad_file_names: Sequence[str] | str, *, measurement_name: str, obs_field_name: str, var_field_name: str, append_obsm_varm: bool = False, context: SOMATileDBContext | None = None, use_multiprocessing: bool = False, allow_duplicate_obs_ids: bool = False) → ExperimentAmbientLabelMapping

Register H5AD files to extend an existing SOMA Experiment.

This is the required first step before calling from_h5ad() or from_anndata() with append=True. It inspects all input H5ADs (and the target Experiment, if experiment_uri is supplied) to produce a global ExperimentAmbientLabelMapping that describes how obs/var identifiers map to the target Experiment.
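The register-then-append flow can be sketched as below. This is a minimal illustration under stated assumptions, not the library's reference usage. The helper name append_h5ads, the io_module injection parameter, and the default field names are hypothetical. The registration_mapping, obs_id_name, and var_id_name keywords of from_h5ad() and the prepare_experiment() method on the mapping are assumptions not stated on this page, so check them against your installed release.

```python
from typing import Any, Optional, Sequence


def append_h5ads(
    experiment_uri: str,
    h5ad_paths: Sequence[str],
    *,
    measurement_name: str = "RNA",      # hypothetical default
    obs_field_name: str = "obs_id",     # hypothetical default
    var_field_name: str = "var_id",     # hypothetical default
    io_module: Optional[Any] = None,    # hypothetical injection point
) -> Any:
    """Register all inputs once, then append each file with the shared mapping."""
    if io_module is None:
        # Imported lazily so the sketch can be read without tiledbsoma installed.
        import tiledbsoma.io as io_module

    # Step 1: a single registration pass over every input and the target Experiment.
    mapping = io_module.register_h5ads(
        experiment_uri,
        list(h5ad_paths),
        measurement_name=measurement_name,
        obs_field_name=obs_field_name,
        var_field_name=var_field_name,
    )

    # Assumption: some releases expose a preparation step on the mapping.
    if hasattr(mapping, "prepare_experiment"):
        mapping.prepare_experiment(experiment_uri)

    # Step 2: ingest each file, reusing the same mapping for all of them.
    for path in h5ad_paths:
        io_module.from_h5ad(
            experiment_uri,
            path,
            measurement_name=measurement_name,
            obs_id_name=obs_field_name,    # assumed keyword
            var_id_name=var_field_name,    # assumed keyword
            registration_mapping=mapping,  # assumed keyword
        )
    return mapping
```

The point of the design is that the mapping is computed once over all inputs and then passed unchanged to every ingest call, so each file is written against the same global view of obs/var identifiers.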

Supported Workflows:

This function and the subsequent append workflow are designed for two primary scenarios:

1. Appending new observations from inputs whose obs/var schemas are consistent with the target Experiment (i.e., same column names and dtypes).
2. Adding a new Measurement for observations that already exist in the target Experiment.

Schema Evolution:

The append workflow does not automatically evolve the schema of the obs/var DataFrames in the target Experiment. If the inputs contain obs/var columns that are not present in the target Experiment, an error is raised. If your append operation requires new columns, add them with update_obs()/update_var() before creating the registration map.
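A pre-flight check for this constraint might look like the sketch below. It is plain Python and does not call tiledbsoma. Schemas are represented as hypothetical dicts of column name to dtype string, and the function names are illustrative. How you obtain the two schemas (for example from the Experiment's obs DataFrame and from the AnnData obs) is left out.

```python
from typing import Dict, List, Tuple


def columns_missing_from_target(
    target_schema: Dict[str, str], input_schema: Dict[str, str]
) -> List[str]:
    """Return input column names that the target DataFrame does not have."""
    return sorted(name for name in input_schema if name not in target_schema)


def dtype_mismatches(
    target_schema: Dict[str, str], input_schema: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Return {column: (target_dtype, input_dtype)} for shared columns that differ."""
    return {
        name: (target_schema[name], dtype)
        for name, dtype in input_schema.items()
        if name in target_schema and target_schema[name] != dtype
    }


# Hypothetical schemas, for illustration only.
target_obs = {"obs_id": "string", "cell_type": "string", "n_counts": "int64"}
input_obs = {"obs_id": "string", "cell_type": "string", "n_counts": "float64", "batch": "string"}

new_columns = columns_missing_from_target(target_obs, input_obs)
mismatched = dtype_mismatches(target_obs, input_obs)
```

In this example new_columns is non-empty, which signals that the target would need the extra column added via update_obs() before registration, and mismatched flags a dtype that would have to be reconciled in the input.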

Duplicate obs IDs:

By default, obs IDs (from obs_field_name) must be globally unique across all inputs and the existing Experiment. If any duplicates are found, a SOMAError is raised to prevent unintentionally overwriting existing data; the outcome of such an overwrite is non-deterministic in multi-writer scenarios. Set allow_duplicate_obs_ids=True only when adding a new Measurement for an existing set of observations (i.e., no new obs IDs).
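The uniqueness rule can be illustrated with a small stand-alone check. This sketch is an assumption about what "globally unique" means in practice, not the library's implementation. The function names and the "<experiment>" source label are hypothetical.

```python
from typing import Dict, Iterable, List


def find_duplicate_obs_ids(
    existing_ids: Iterable[str], inputs: Dict[str, Iterable[str]]
) -> Dict[str, List[str]]:
    """Map each obs ID seen more than once to the sources where it appears."""
    sources: Dict[str, List[str]] = {}
    for obs_id in existing_ids:
        sources.setdefault(obs_id, []).append("<experiment>")
    for file_name, ids in inputs.items():
        for obs_id in ids:
            sources.setdefault(obs_id, []).append(file_name)
    return {obs_id: where for obs_id, where in sources.items() if len(where) > 1}


def adds_no_new_obs(existing_ids: Iterable[str], inputs: Dict[str, Iterable[str]]) -> bool:
    """True when every input obs ID already exists in the Experiment."""
    known = set(existing_ids)
    return all(obs_id in known for ids in inputs.values() for obs_id in ids)


# Hypothetical IDs, for illustration only.
existing = ["cell1", "cell2"]
new_files = {"a.h5ad": ["cell3", "cell4"], "b.h5ad": ["cell4", "cell1"]}
duplicates = find_duplicate_obs_ids(existing, new_files)
```

Under the default setting, any non-empty result from a check like this corresponds to the case where registration raises an error. The adds_no_new_obs helper describes the one situation for which allow_duplicate_obs_ids=True is intended: a new Measurement over observations that are already present.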

New var IDs:

The append workflow automatically handles var IDs (from var_field_name) that do not already exist in the target Experiment, assuming the input supplies all existing columns with compatible dtypes.
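Conceptually, handling new var IDs amounts to extending a label-to-index mapping without disturbing existing entries. The sketch below shows that idea in plain Python. It is an illustrative assumption, not how tiledbsoma builds its ExperimentAmbientLabelMapping, and the function name and the contiguous integer assignment are hypothetical.

```python
from typing import Dict, Iterable


def extend_label_mapping(existing: Dict[str, int], new_labels: Iterable[str]) -> Dict[str, int]:
    """Return a copy of `existing` in which unseen labels get the next free integers."""
    mapping = dict(existing)
    next_id = max(mapping.values(), default=-1) + 1
    for label in new_labels:
        if label not in mapping:
            # Existing labels keep their index; only unseen ones are assigned.
            mapping[label] = next_id
            next_id += 1
    return mapping


# Hypothetical var IDs, for illustration only.
target_var = {"GeneA": 0, "GeneB": 1}
extended = extend_label_mapping(target_var, ["GeneB", "GeneC", "GeneD", "GeneC"])
```

Because already-known labels keep their indices, data written earlier stays addressable, while each new label is assigned exactly once even if it appears in several inputs.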

Concurrency:

If enabled via the use_multiprocessing parameter, this function uses multiprocessing to register each H5AD in parallel, which can improve performance when there are many files. Regardless of use_multiprocessing, H5ADs are registered concurrently; you can control the degree of concurrency with the soma.compute_concurrency_level configuration parameter in the context argument.
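One way to set the concurrency level is sketched below. It assumes that soma.compute_concurrency_level is supplied through the tiledb_config dict of SOMATileDBContext; verify this against your installed release. The helper names concurrency_config and register_with_concurrency are hypothetical.

```python
from typing import Any, Dict, Sequence


def concurrency_config(level: int) -> Dict[str, str]:
    """Build the config entry that caps registration concurrency."""
    if level < 1:
        raise ValueError("concurrency level must be at least 1")
    return {"soma.compute_concurrency_level": str(level)}


def register_with_concurrency(
    experiment_uri: str,
    h5ad_paths: Sequence[str],
    *,
    level: int,
    use_multiprocessing: bool = False,
    **field_kwargs: Any,  # measurement_name, obs_field_name, var_field_name
) -> Any:
    """Call register_h5ads with a context carrying the concurrency setting."""
    # Imported lazily so the config helper above works without tiledbsoma installed.
    import tiledbsoma
    import tiledbsoma.io

    # Assumption: the setting is passed through the context's tiledb_config.
    context = tiledbsoma.SOMATileDBContext(tiledb_config=concurrency_config(level))
    return tiledbsoma.io.register_h5ads(
        experiment_uri,
        list(h5ad_paths),
        context=context,
        use_multiprocessing=use_multiprocessing,
        **field_kwargs,
    )
```

Keeping the config in a small helper makes it easy to share the same setting between the registration call and any later ingest calls that accept a context.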