tiledbsoma.DataFrame
- class tiledbsoma.DataFrame(handle: _WrapperType_co | DataFrameWrapper | DenseNDArrayWrapper | SparseNDArrayWrapper, *, _dont_call_this_use_create_or_open_instead: str = 'unset')
DataFrame is a multi-column table with a user-defined schema. The schema is expressed as an Arrow Schema, and defines the column names and value types.

Every DataFrame must contain a column called soma_joinid, of type int64, with negative values explicitly disallowed. The soma_joinid column contains a unique value for each row in the dataframe, and in some cases (e.g., as part of an Experiment), acts as a join key for other objects, such as SparseNDArray.

Lifecycle
Experimental.
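The soma_joinid constraints described above (integer values in the int64 range, no negatives, one unique value per row) can be checked on in-memory data before it is written. The following is a minimal pure-Python sketch; check_soma_joinids is a hypothetical helper, not part of tiledbsoma, and it says nothing about how the library itself enforces these rules.

```python
INT64_MAX = 2**63 - 1


def check_soma_joinids(values):
    """Validate candidate soma_joinid values (hypothetical helper).

    Returns the number of values if they are all unique, non-negative
    integers that fit in int64; raises ValueError otherwise.
    """
    seen = set()
    for value in values:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"soma_joinid must be an integer, got {value!r}")
        if value < 0:
            raise ValueError(f"soma_joinid must not be negative, got {value}")
        if value > INT64_MAX:
            raise ValueError(f"soma_joinid does not fit in int64: {value}")
        if value in seen:
            raise ValueError(f"soma_joinid values must be unique, repeated: {value}")
        seen.add(value)
    return len(seen)


if __name__ == "__main__":
    print(check_soma_joinids([0, 1, 2]))
```

Running such a check first gives a clear error message on the client side, whatever error the storage layer would otherwise raise.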
Examples
>>> import pyarrow as pa
>>> import tiledbsoma
>>> schema = pa.schema(
...     [
...         ("soma_joinid", pa.int64()),
...         ("A", pa.float32()),
...         ("B", pa.large_string()),
...     ]
... )
>>> with tiledbsoma.DataFrame.create("./test_dataframe", schema=schema) as df:
...     data = pa.Table.from_pydict(
...         {
...             "soma_joinid": [0, 1, 2],
...             "A": [1.0, 2.7182, 3.1214],
...             "B": ["one", "e", "pi"],
...         }
...     )
...     df.write(data)
>>> with tiledbsoma.DataFrame.open("./test_dataframe") as df:
...     print(df.schema)
...     print("---")
...     print(df.read().concat().to_pandas())
...
soma_joinid: int64
A: float
B: large_string
---
   soma_joinid       A    B
0            0  1.0000  one
1            1  2.7182    e
2            2  3.1214   pi

>>> import pyarrow as pa
>>> import tiledbsoma
>>> schema = pa.schema(
...     [
...         ("soma_joinid", pa.int64()),
...         ("A", pa.float32()),
...         ("B", pa.large_string()),
...     ]
... )
>>> with tiledbsoma.DataFrame.create(
...     "./test_dataframe_2",
...     schema=schema,
...     index_column_names=["A", "B"],
...     domain=[(0.0, 10.0), None],
... ) as df:
...     data = pa.Table.from_pydict(
...         {
...             "soma_joinid": [0, 1, 2],
...             "A": [1.0, 2.7182, 3.1214],
...             "B": ["one", "e", "pi"],
...         }
...     )
...     df.write(data)
>>> with tiledbsoma.DataFrame.open("./test_dataframe_2") as df:
...     print(df.schema)
...     print("---")
...     print(df.read().concat().to_pandas())
soma_joinid: int64
---
        A    B  soma_joinid
0  1.0000  one            0
1  2.7182    e            1
2  3.1214   pi            2

Here the index-column names are specified. The domain is entirely optional: if it’s omitted, defaults will be applied yielding the largest possible domain for each index column’s datatype. If the domain is specified, it must be a tuple/list of equal length to index_column_names. It can be None in a given slot, meaning use the largest possible domain. For string/bytes types, it must be None.

- __init__(handle: _WrapperType_co | DataFrameWrapper | DenseNDArrayWrapper | SparseNDArrayWrapper, *, _dont_call_this_use_create_or_open_instead: str = 'unset')
Internal-only common initializer steps.
This function is internal; users should open TileDB SOMA objects using the create() and open() factory class methods.
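When create() is called with a domain argument, the rules given in the class description (one slot per index column, None allowed in any slot, None required for string/bytes columns) can be checked up front. The sketch below is pure Python and does not call tiledbsoma. check_domain is a hypothetical helper, and encoding column types as plain type-name strings is an assumption of the sketch. The two-element check for non-None slots is inferred from the (0.0, 10.0) example rather than from a stated rule.

```python
# Type names treated as string/bytes columns. Representing types as plain
# strings is an assumption of this sketch, not how tiledbsoma stores them.
STRING_LIKE_TYPES = {"string", "large_string", "binary", "large_binary"}


def check_domain(index_column_names, column_types, domain=None):
    """Validate a domain spec against the documented rules (hypothetical helper).

    column_types maps each column name to a type-name string.
    Returns the domain as a tuple with one slot per index column.
    """
    names = list(index_column_names)
    if domain is None:
        # Omitted domain: every slot falls back to the largest possible range.
        return tuple(None for _ in names)
    if len(domain) != len(names):
        raise ValueError(
            f"domain has {len(domain)} slots but there are {len(names)} index columns"
        )
    checked = []
    for name, slot in zip(names, domain):
        if slot is None:
            checked.append(None)
            continue
        if column_types[name] in STRING_LIKE_TYPES:
            raise ValueError(f"domain slot for string/bytes column {name!r} must be None")
        if len(slot) != 2:
            raise ValueError(f"domain slot for {name!r} must be a (low, high) pair")
        checked.append(tuple(slot))
    return tuple(checked)


if __name__ == "__main__":
    types = {"soma_joinid": "int64", "A": "float", "B": "large_string"}
    print(check_domain(["A", "B"], types, [(0.0, 10.0), None]))
```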
Methods
__init__(handle, *[, ...])
Internal-only common initializer steps.
close()
Release any resources held while the object is open.
create(uri, *, schema[, index_column_names, ...])
Creates the data structure on disk/S3/cloud.
exists(uri[, context, tiledb_timestamp])
Finds whether an object of this type exists at the given URI.
keys()
Returns the names of the columns when read back as a dataframe.
non_empty_domain()
Retrieves the non-empty domain for each dimension, namely the smallest and largest indices in each dimension for which the array/dataframe has data occupied.
open(uri[, mode, tiledb_timestamp, context, ...])
Opens this specific type of SOMA object.
read([coords, column_names, result_order, ...])
Reads a user-defined subset of data, addressed by the dataframe indexing columns, optionally filtered, and returns results as one or more Arrow tables.
verify_open_for_writing()
Raises an error if the object is not open for writing.
write(values[, platform_config])
Writes an Arrow table to the persistent object.
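To make the non_empty_domain() description concrete, the pure-Python sketch below computes the same kind of result (the smallest and largest value per index column) for columns held in memory. It does not call tiledbsoma. non_empty_domain_of is a hypothetical name, and the sketch assumes every index column has at least one value.

```python
def non_empty_domain_of(columns, index_column_names):
    """Return one (min, max) pair per index column (hypothetical helper).

    columns maps each column name to a list of its values. Each listed
    index column must be non-empty, since min() and max() raise
    ValueError on an empty list.
    """
    return tuple(
        (min(columns[name]), max(columns[name])) for name in index_column_names
    )


if __name__ == "__main__":
    columns = {
        "soma_joinid": [0, 1, 2],
        "A": [1.0, 2.7182, 3.1214],
    }
    print(non_empty_domain_of(columns, ["soma_joinid"]))
    print(non_empty_domain_of(columns, ["A", "soma_joinid"]))
```

This differs from the domain attribute, which reports the bounds of what is storable on each index column rather than the range of what has actually been written.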
Attributes
closed
True if the object has been closed.
context
A value storing implementation-specific configuration information.
count
Returns the number of rows in the dataframe.
domain
Returns a tuple of minimum and maximum values, inclusive, storable on each index column of the dataframe.
index_column_names
Returns index (dimension) column names.
metadata
The metadata of this SOMA object.
mode
The mode this object was opened in, either r or w.
schema
Returns data schema, in the form of an Arrow Schema.
soma_type
A string describing the SOMA type of this object.
tiledb_timestamp
The time that this object was opened in UTC.
tiledb_timestamp_ms
The time this object was opened, as millis since the Unix epoch.
uri
Accessor for the object's storage URI.
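The two timestamp attributes describe the same instant in different forms: a UTC time and a count of milliseconds since the Unix epoch. The sketch below shows one way to convert between the two using only the Python standard library. ms_to_utc and utc_to_ms are hypothetical helper names, not part of tiledbsoma, and the exact type that tiledb_timestamp returns is not stated above.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(timestamp_ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    return UNIX_EPOCH + timedelta(milliseconds=timestamp_ms)


def utc_to_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - UNIX_EPOCH) // timedelta(milliseconds=1)


if __name__ == "__main__":
    opened = ms_to_utc(1_700_000_000_123)
    print(opened.isoformat())
    print(utc_to_ms(opened))
```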