tiledbsoma.DataFrame.create
- classmethod DataFrame.create(uri: str, *, schema: Schema, domain: Sequence[tuple[Any, Any] | list[Any] | None], index_column_names: Sequence[str] = ('soma_joinid',), platform_config: dict[str, Mapping[str, Any]] | object | None = None, context: SOMAContext | SOMATileDBContext | None = None, tiledb_timestamp: int | datetime | None = None) -> DataFrame
Creates the DataFrame's data structure at uri, which may be on local disk, on S3, or in other cloud storage.
- Parameters:
uri – The URI at which to create the DataFrame: a local path, or an S3 or other cloud-storage URI.
schema – Arrow schema defining the per-column schema. This schema must define all columns, including the columns to be used as index columns. If the schema includes types unsupported by the SOMA implementation, an error is raised.
index_column_names – A list of column names to use as user-defined index columns (e.g., ['cell_type', 'tissue_type']). All named columns must exist in the schema, and at least one index column name is required. Defaults to ('soma_joinid',).
domain – A sequence of tuples, each specifying the range of storable values for the corresponding index column. For example, for a single int64-valued index column, domain=[(100, 200)] indicates that values from 100 to 200, inclusive, can be stored. This sequence’s length must match the length of index_column_names. Leaving the domain as None is deprecated.
platform_config – Platform-specific options used to create this DataFrame. These may be provided either as a dictionary, with options located under the {'tiledb': {'create': ...}} key, or as a TileDBCreateOptions object.
tiledb_timestamp – If specified, overrides the default timestamp used to open this object. If unset, the timestamp provided by the context is used.
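The rules on index_column_names and domain listed above can be summarized as a single check. The following is a minimal sketch, not tiledbsoma's own validation: the helpers check_index_args and in_domain are hypothetical, the schema is represented only by its list of column names, and the exceptions and warning text tiledbsoma actually emits may differ. The lo <= hi check is an added assumption, not a documented rule.

```python
import warnings
from typing import Any, Optional, Sequence


def check_index_args(
    schema_names: Sequence[str],
    index_column_names: Sequence[str],
    domain: Optional[Sequence[Optional[Sequence[Any]]]],
) -> None:
    """Hypothetical check mirroring the documented rules; not tiledbsoma's own code."""
    # At least one index column name is required.
    if len(index_column_names) == 0:
        raise ValueError("at least one index column name is required")
    # Every index column must be defined in the schema.
    for name in index_column_names:
        if name not in schema_names:
            raise ValueError(f"index column {name!r} is not defined in the schema")
    # Leaving the domain as None is deprecated: warn instead of failing.
    if domain is None:
        warnings.warn("domain=None is deprecated", DeprecationWarning, stacklevel=2)
        return
    # The domain needs exactly one entry per index column.
    if len(domain) != len(index_column_names):
        raise ValueError("len(domain) must match len(index_column_names)")
    for name, bounds in zip(index_column_names, domain):
        if bounds is None:
            continue  # the signature permits None entries; this sketch skips them
        lo, hi = bounds
        # Assumption (not documented): a usable range needs lo <= hi.
        if lo > hi:
            raise ValueError(f"domain for {name!r} has lo > hi")


def in_domain(value: Any, bounds: Sequence[Any]) -> bool:
    """True if value is storable under bounds; both endpoints are included."""
    lo, hi = bounds
    return lo <= value <= hi
```

For instance, check_index_args(["soma_joinid", "label", "data"], ["soma_joinid"], [(0, 10)]) passes for the schema used in the Examples section, and in_domain(200, (100, 200)) is True because both bounds of the range are storable.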
- Returns:
The newly created DataFrame, opened for writing.
- Raises:
TypeError – If the schema parameter specifies an unsupported type, or if index_column_names specifies a non-indexable column.
ValueError – If index_column_names is malformed or specifies an undefined column name.
ValueError – If the schema specifies illegal column names.
tiledbsoma.AlreadyExistsError – If the underlying object already exists at the given URI.
TileDBError – If unable to create the underlying object.
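Because create raises tiledbsoma.AlreadyExistsError when an object already exists at the URI, code that may run more than once sometimes falls back to opening the existing object. A minimal sketch of that pattern follows. create_or_open is a hypothetical helper; the create and open callables and the exception type are passed in so the pattern can be read and tested independently of tiledbsoma.

```python
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def create_or_open(
    uri: str,
    create: Callable[..., T],
    open_: Callable[[str, str], T],
    already_exists_error: type[BaseException],
    mode: str = "w",
    **create_kwargs: Any,
) -> T:
    """Hypothetical helper: create the object at uri, or open it if it already exists."""
    try:
        # create() returns the newly created object.
        return create(uri, **create_kwargs)
    except already_exists_error:
        # Something already exists at uri: open it instead of failing.
        # Note: the existing object's schema is NOT compared with create_kwargs.
        return open_(uri, mode)
```

With tiledbsoma this would be called as create_or_open("dataframe1", tiledbsoma.DataFrame.create, tiledbsoma.DataFrame.open, tiledbsoma.AlreadyExistsError, schema=schema, domain=((0, 10),)). Catching the error rather than checking for existence first avoids a race between the check and the create; the trade-off is that an existing object is reused as-is, even if its schema differs from the one requested.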
Examples
>>> import pyarrow as pa
>>> import tiledbsoma
>>> schema = pa.schema([("soma_joinid", pa.int64()), ("label", pa.large_string()), ("data", pa.float64())])
>>> with tiledbsoma.DataFrame.create("dataframe1", schema=schema, domain=((0, 10),)) as soma_df:
...     print(soma_df.schema)
...
soma_joinid: int64 not null
label: large_string
data: double
Lifecycle
Maturing.