tiledbsoma.io.register_h5ads
- tiledbsoma.io.register_h5ads(experiment_uri: str | None, h5ad_file_names: Sequence[str] | str, *, measurement_name: str, obs_field_name: str, var_field_name: str, append_obsm_varm: bool = False, context: SOMATileDBContext | None = None, use_multiprocessing: bool = False, allow_duplicate_obs_ids: bool = False) → ExperimentAmbientLabelMapping
Register H5AD files to extend an existing SOMA Experiment.

This is the required first step before calling from_h5ad() or from_anndata() with append=True. It inspects all input H5ADs (and the target Experiment, if experiment_uri is supplied) to produce a global ExperimentAmbientLabelMapping that describes how obs/var identifiers map to the target Experiment.

- Supported Workflows:
This function and the subsequent append workflow are designed for two primary scenarios:
1. Appending new observations from inputs with obs/var schemas that are consistent with the target Experiment (i.e., same column names and dtypes).
2. Adding a new Measurement for observations that already exist in the target Experiment.

- Schema Evolution:
The append workflow does not automatically evolve the schema of the obs/var DataFrames in the target Experiment. If inputs contain obs/var columns not present in the target Experiment, an error is raised. If your append operation requires new columns, use update_obs()/update_var() before creating the registration map.

- Duplicate obs IDs:
By default, obs IDs (from obs_field_name) across all inputs and the existing Experiment must be globally unique. If any duplicates are found, a SOMAError is raised to prevent unintentionally overwriting existing data, which is non-deterministic in multi-writer scenarios. Set allow_duplicate_obs_ids=True only when adding a new Measurement for an existing set of observations (i.e., no new obs IDs).

- New var IDs:
The append workflow automatically handles var IDs (from var_field_name) that do not already exist in the target Experiment, assuming the input supplies all existing columns with compatible dtypes.

- Concurrency:
If enabled via the use_multiprocessing parameter, this function will use multiprocessing to register each H5AD in parallel. With many files, this can improve performance. Regardless of use_multiprocessing, H5ADs will be registered concurrently; you can control the concurrency using the soma.compute_concurrency_level configuration parameter in the context argument.
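- Example (illustrative sketch):
The register-then-append sequence described above might look roughly like the following. This is a minimal sketch, not the library's implementation. The helper names find_duplicate_obs_ids and register_and_append are hypothetical, as are the default values "RNA", "obs_id", and "var_id". The from_h5ad() keyword names (obs_id_name, var_id_name, registration_mapping) and the guarded prepare_experiment() call are assumptions about the installed release and should be checked against its documentation. The pure-Python helper only illustrates the global-uniqueness rule for obs IDs so that likely collisions can be spotted before registration.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Sequence


def find_duplicate_obs_ids(
    existing_ids: Iterable[str],
    per_file_ids: Sequence[Iterable[str]],
) -> list[str]:
    """Return, sorted, every obs ID seen more than once across the
    existing Experiment and all inputs (hypothetical pre-check; the
    library performs its own check during registration)."""
    counts = Counter(existing_ids)
    for ids in per_file_ids:
        counts.update(ids)
    return sorted(obs_id for obs_id, n in counts.items() if n > 1)


def register_and_append(
    experiment_uri: str,
    h5ad_paths: Sequence[str],
    *,
    measurement_name: str = "RNA",   # assumed name, adjust to your data
    obs_field_name: str = "obs_id",  # assumed column name
    var_field_name: str = "var_id",  # assumed column name
) -> None:
    """Register all H5ADs once, then append each file using the mapping."""
    # Imported lazily so the helper above is usable without tiledbsoma.
    import tiledbsoma.io

    # Step 1: build the global registration mapping over all inputs.
    mapping = tiledbsoma.io.register_h5ads(
        experiment_uri,
        h5ad_paths,
        measurement_name=measurement_name,
        obs_field_name=obs_field_name,
        var_field_name=var_field_name,
    )

    # Assumption: some releases expose a preparation step on the mapping
    # that must run before ingestion; guard it so the sketch does not
    # depend on its presence.
    if hasattr(mapping, "prepare_experiment"):
        mapping.prepare_experiment(experiment_uri)

    # Step 2: append each file, passing the same mapping every time.
    for path in h5ad_paths:
        tiledbsoma.io.from_h5ad(
            experiment_uri,
            path,
            measurement_name=measurement_name,
            obs_id_name=obs_field_name,
            var_id_name=var_field_name,
            registration_mapping=mapping,
        )
```

Registering every input in a single call, rather than one file at a time, is what allows the mapping to be global: each later append can then be written against identifier assignments that already account for all other inputs.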