tiledbsoma.DataFrame.create

classmethod DataFrame.create(uri: str, *, schema: Schema, domain: Sequence[tuple[Any, Any] | list[Any] | None], index_column_names: Sequence[str] = ('soma_joinid',), platform_config: dict[str, Mapping[str, Any]] | object | None = None, context: SOMAContext | SOMATileDBContext | None = None, tiledb_timestamp: int | datetime | None = None) → DataFrame

Creates the data structure on disk/S3/cloud.

Parameters:
  • schema – Arrow schema defining the per-column schema. This schema must define all columns, including columns to be named as index columns. If the schema includes types unsupported by the SOMA implementation, an error will be raised.

  • index_column_names – A list of column names to use as user-defined index columns (e.g., ['cell_type', 'tissue_type']). All named columns must exist in the schema, and at least one index column name is required.

  • domain – A sequence of tuples, each specifying the range of storable values for one index column. For example, for a single int64-valued index column, domain=[(100, 200)] indicates that values from 100 to 200, inclusive, can be stored. This sequence’s length must match that of index_column_names. Leaving the domain as None is deprecated.

  • platform_config – Platform-specific options used to create this array. This may be provided as settings in a dictionary, with options located in the {'tiledb': {'create': ...}} key, or as a TileDBCreateOptions object.

  • tiledb_timestamp – If specified, overrides the default timestamp used to open this object. If unset, uses the timestamp provided by the context.

Returns:

The DataFrame.

Raises:
  • TypeError – If the schema parameter specifies an unsupported type, or if index_column_names specifies a non-indexable column.

  • ValueError – If index_column_names is malformed or specifies an undefined column name.

  • ValueError – If the schema specifies illegal column names.

  • tiledbsoma.AlreadyExistsError – If the underlying object already exists at the given URI.

  • TileDBError – If unable to create the underlying object.

Examples

>>> import pyarrow as pa
>>> import tiledbsoma
>>> schema = pa.schema([("soma_joinid", pa.int64()), ("label", pa.large_string()), ("data", pa.float64())])
>>> with tiledbsoma.DataFrame.create("dataframe1", schema=schema, domain=((0, 10),)) as soma_df:
...     print(soma_df.schema)
soma_joinid: int64 not null
label: large_string
data: double
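
The constraints that tie index_column_names, domain, and the schema together can be checked before calling create. The following is a minimal sketch in plain Python, not part of tiledbsoma: the helpers check_create_args and in_domain are hypothetical names introduced for illustration, and the schema is represented only by its list of column names. It covers just the rules stated in the parameter descriptions (at least one index column, every index column defined in the schema, one domain entry per index column, inclusive bounds).

```python
from typing import Any, Sequence, Tuple


def check_create_args(
    column_names: Sequence[str],
    index_column_names: Sequence[str],
    domain: Sequence[Tuple[Any, Any]],
) -> None:
    """Hypothetical pre-flight check; tiledbsoma performs its own validation."""
    # At least one index column name is required.
    if not index_column_names:
        raise ValueError("at least one index column name is required")
    # Every index column must be defined in the schema.
    missing = [name for name in index_column_names if name not in column_names]
    if missing:
        raise ValueError(f"index columns not defined in the schema: {missing}")
    # The domain needs exactly one (low, high) entry per index column.
    if len(domain) != len(index_column_names):
        raise ValueError(
            f"domain has {len(domain)} entries but there are "
            f"{len(index_column_names)} index columns"
        )


def in_domain(value: Any, bounds: Tuple[Any, Any]) -> bool:
    """Return True if value lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


# One index column, so one domain entry; both endpoints are storable.
check_create_args(["soma_joinid", "label", "data"], ["soma_joinid"], [(100, 200)])
print(in_domain(100, (100, 200)), in_domain(200, (100, 200)), in_domain(201, (100, 200)))
```

A check like this only mirrors the documented rules; the library remains the authority on which types and column names are accepted.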

Lifecycle

Maturing.