tiledbsoma.DataFrame
- class tiledbsoma.DataFrame(handle: _WrapperType_co | DataFrameWrapper | DenseNDArrayWrapper | SparseNDArrayWrapper, *, _dont_call_this_use_create_or_open_instead: str = 'unset')
DataFrame is a multi-column table with a user-defined schema. The schema is expressed as an Arrow Schema, and defines the column names and value types.

Every DataFrame must contain a column called soma_joinid, of type int64, with negative values explicitly disallowed. The soma_joinid column contains a unique value for each row in the dataframe, and in some cases (e.g., as part of an Experiment), acts as a join key for other objects, such as SparseNDArray.

Lifecycle
Maturing.
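The soma_joinid constraints described above (integer values that fit in int64, no negatives, one unique value per row) can be checked before data is handed to write(). The following is a minimal sketch in plain Python; validate_soma_joinids is a hypothetical helper written for illustration and is not part of tiledbsoma.

```python
INT64_MAX = 2**63 - 1  # largest value representable as int64


def validate_soma_joinids(joinids):
    """Return a list of problems found in a candidate soma_joinid column.

    Hypothetical helper, not part of tiledbsoma. An empty list means the
    values satisfy the three constraints stated in the class description.
    """
    problems = []
    seen = set()
    for row, value in enumerate(joinids):
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"row {row}: {value!r} is not an integer")
            continue
        if value < 0:
            problems.append(f"row {row}: {value} is negative")
        elif value > INT64_MAX:
            problems.append(f"row {row}: {value} does not fit in int64")
        if value in seen:
            problems.append(f"row {row}: {value} is duplicated")
        seen.add(value)
    return problems


print(validate_soma_joinids([0, 1, 2]))   # no problems reported
print(validate_soma_joinids([0, -1, 0]))  # one negative, one duplicate
```

Returning a list of messages rather than raising on the first failure makes it easier to report every bad row at once.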
Examples
>>> import pyarrow as pa
>>> import tiledbsoma
>>> schema = pa.schema(
...     [
...         ("soma_joinid", pa.int64()),
...         ("A", pa.float32()),
...         ("B", pa.large_string()),
...     ]
... )
>>> with tiledbsoma.DataFrame.create("./test_dataframe", schema=schema) as df:
...     data = pa.Table.from_pydict(
...         {
...             "soma_joinid": [0, 1, 2],
...             "A": [1.0, 2.7182, 3.1214],
...             "B": ["one", "e", "pi"],
...         }
...     )
...     df.write(data)
>>> with tiledbsoma.DataFrame.open("./test_dataframe") as df:
...     print(df.schema)
...     print("---")
...     print(df.read().concat().to_pandas())
...
soma_joinid: int64
A: float
B: large_string
---
   soma_joinid       A    B
0            0  1.0000  one
1            1  2.7182    e
2            2  3.1214   pi

>>> import pyarrow as pa
>>> import tiledbsoma
>>> schema = pa.schema(
...     [
...         ("soma_joinid", pa.int64()),
...         ("A", pa.float32()),
...         ("B", pa.large_string()),
...     ]
... )
>>> with tiledbsoma.DataFrame.create(
...     "./test_dataframe_2",
...     schema=schema,
...     index_column_names=["A", "B"],
...     domain=[(0.0, 10.0), None],
... ) as df:
...     data = pa.Table.from_pydict(
...         {
...             "soma_joinid": [0, 1, 2],
...             "A": [1.0, 2.7182, 3.1214],
...             "B": ["one", "e", "pi"],
...         }
...     )
...     df.write(data)
>>> with tiledbsoma.DataFrame.open("./test_dataframe_2") as df:
...     print(df.schema)
...     print("---")
...     print(df.read().concat().to_pandas())
soma_joinid: int64
---
        A    B  soma_joinid
0  1.0000  one            0
1  2.7182    e            1
2  3.1214   pi            2
Here the index-column names are specified. The domain is entirely optional: if it’s omitted, defaults will be applied, yielding the largest possible domain for each index column’s datatype. If the domain is specified, it must be a tuple/list of equal length to index_column_names. It can be None in a given slot, meaning use the largest possible domain. For string/bytes types, it must be None.

- __init__(handle: _WrapperType_co | DataFrameWrapper | DenseNDArrayWrapper | SparseNDArrayWrapper, *, _dont_call_this_use_create_or_open_instead: str = 'unset')
Internal-only common initializer steps.
This function is internal; users should open TileDB SOMA objects using the create() and open() factory class methods.
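The rules stated in the class description for the domain argument of create() (same length as index_column_names, None allowed in any slot, None required for string/bytes columns) can be sketched as a small pre-flight check. This is plain Python and does not call tiledbsoma; check_domain and its string_columns parameter are hypothetical names, and the (low, high) pair shape is an assumption inferred from the (0.0, 10.0) entry in the example.

```python
def check_domain(index_column_names, domain, string_columns=()):
    """Check a `domain` argument against the rules described in the text.

    Hypothetical helper, not part of tiledbsoma. `string_columns` names the
    index columns whose type is string or bytes. Raises ValueError on the
    first violated rule and returns None otherwise.
    """
    if domain is None:
        # Omitted domain: the library applies its own defaults.
        return
    if not isinstance(domain, (list, tuple)):
        raise ValueError("domain must be a tuple or list")
    if len(domain) != len(index_column_names):
        raise ValueError(
            f"domain has {len(domain)} slots but there are "
            f"{len(index_column_names)} index columns"
        )
    for name, slot in zip(index_column_names, domain):
        if slot is None:
            continue  # None means "use the largest possible domain"
        if name in string_columns:
            raise ValueError(f"domain for string/bytes column {name!r} must be None")
        # Assumption: a non-None slot is a (low, high) pair, as in the example.
        if len(slot) != 2:
            raise ValueError(f"domain for {name!r} must be a (low, high) pair")
        low, high = slot
        if low > high:
            raise ValueError(f"domain for {name!r} has low > high")


# Mirrors the second example: "A" is float32, "B" is large_string.
check_domain(["A", "B"], [(0.0, 10.0), None], string_columns={"B"})
```

A check like this only catches shape errors early; the library remains the authority on which domains are actually accepted.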
Methods
__init__(handle, *[, ...]): Internal-only common initializer steps.
close(): Release any resources held while the object is open.
create(uri, *, schema[, index_column_names, ...]): Creates the data structure on disk/S3/cloud.
exists(uri[, context, tiledb_timestamp]): Finds whether an object of this type exists at the given URI.
keys(): Returns the names of the columns when read back as a dataframe.
non_empty_domain(): Retrieves the non-empty domain for each dimension, namely the smallest and largest indices in each dimension for which the array/dataframe contains data.
open(uri[, mode, tiledb_timestamp, context, ...]): Opens this specific type of SOMA object.
read([coords, column_names, result_order, ...]): Reads a user-defined subset of data, addressed by the dataframe indexing columns and optionally filtered, and returns the results as one or more Arrow tables.
reopen(mode[, tiledb_timestamp]): Return a new copy of the SOMAObject with the given mode at the current Unix timestamp.
verify_open_for_writing(): Raises an error if the object is not open for writing.
write(values[, platform_config]): Writes an Arrow table to the persistent object.
Attributes
closed: True if the object has been closed.
context: A value storing implementation-specific configuration information.
count: Returns the number of rows in the dataframe.
domain: Returns a tuple of minimum and maximum values, inclusive, storable on each index column of the dataframe.
index_column_names: Returns index (dimension) column names.
maxdomain: Returns a tuple of minimum and maximum values, inclusive, storable on each index column of the dataframe.
metadata: The metadata of this SOMA object.
mode: The mode this object was opened in, either r or w.
schema: Returns data schema, in the form of an Arrow Schema.
soma_type: A string describing the SOMA type of this object.
tiledb_timestamp: The time that this object was opened in UTC.
tiledb_timestamp_ms: The time this object was opened, as millis since the Unix epoch.
tiledbsoma_has_upgraded_domain: Returns true if the array has the upgraded resizeable domain feature from TileDB-SOMA 1.15: the array was created with this support, or it has had .tiledbsoma_upgrade_domain applied to it.
uri: Accessor for the object's storage URI.