{ "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.9.15", "mimetype": "text/x-python", "codemirror_mode": { "name": "ipython", "version": 3 }, "pygments_lexer": "ipython3", "nbconvert_exporter": "python", "file_extension": ".py" } }, "nbformat_minor": 5, "nbformat": 4, "cells": [ { "cell_type": "markdown", "source": "# Tutorial: SOMA Experiment queries", "metadata": { "tags": [] }, "id": "2b8e72a7-129c-422c-b955-350fb9ee0541" }, { "cell_type": "code", "source": "import tiledbsoma as soma", "metadata": { "tags": [], "trusted": true }, "execution_count": 3, "outputs": [], "id": "3a5fd5d3" }, { "cell_type": "markdown", "source": "In this notebook, we'll take a quick look at the SOMA experiment-query API. The dataset consists of peripheral blood mononuclear cells (PBMCs) and is freely available from 10x Genomics.\n", "metadata": { "tags": [] }, "id": "ccc8709a" }, { "cell_type": "code", "source": "exp = soma.Experiment.open('data/sparse/pbmc3k')", "metadata": { "tags": [], "trusted": true }, "execution_count": 4, "outputs": [], "id": "9b8851d9-27f1-437b-a070-b41a65a5609e" }, { "cell_type": "markdown", "source": "The keys of the `obs` dataframe show which fields are available to query on.", "metadata": { "tags": [] }, "id": "fab7898c" }, { "cell_type": "code", "source": "exp.obs.keys()", "metadata": { "tags": [], "trusted": true }, "execution_count": 5, "outputs": [ { "execution_count": 5, "output_type": "execute_result", "data": { "text/plain": "('soma_joinid', 'obs_id', 'n_genes', 'percent_mito', 'n_counts', 'louvain')" }, "metadata": {} } ], "id": "d67dfbc6-0382-4acc-8c56-3670549654f8" }, { "cell_type": "code", "source": "p = exp.obs.read(column_names=['louvain']).concat().to_pandas()\np", "metadata": { "tags": [], "trusted": true }, "execution_count": 6, "outputs": [ { "execution_count": 6, "output_type": "execute_result", "data": { 
"text/plain": " louvain\n0 CD4 T cells\n1 B cells\n2 CD4 T cells\n3 CD14+ Monocytes\n4 NK cells\n... ...\n2633 CD14+ Monocytes\n2634 B cells\n2635 B cells\n2636 B cells\n2637 CD4 T cells\n\n[2638 rows x 1 columns]", "text/html": "
<div><table><thead><tr><th></th><th>louvain</th></tr></thead><tbody><tr><th>0</th><td>CD4 T cells</td></tr><tr><th>1</th><td>B cells</td></tr><tr><th>2</th><td>CD4 T cells</td></tr><tr><th>3</th><td>CD14+ Monocytes</td></tr><tr><th>4</th><td>NK cells</td></tr><tr><th>...</th><td>...</td></tr><tr><th>2633</th><td>CD14+ Monocytes</td></tr><tr><th>2634</th><td>B cells</td></tr><tr><th>2635</th><td>B cells</td></tr><tr><th>2636</th><td>B cells</td></tr><tr><th>2637</th><td>CD4 T cells</td></tr></tbody></table><p>2638 rows × 1 columns</p></div>"
}, "metadata": {} } ], "id": "9e4ede09-2303-4c21-92c1-bf42ed4e7dd1" }, { "cell_type": "markdown", "source": "Focusing on the `louvain` column, we can now count how many cells carry each value.", "metadata": { "tags": [] }, "id": "f305fb7c" }, { "cell_type": "code", "source": "p.groupby('louvain').size().sort_values()", "metadata": { "tags": [], "trusted": true }, "execution_count": 7, "outputs": [ { "execution_count": 7, "output_type": "execute_result", "data": { "text/plain": "louvain\nMegakaryocytes 15\nDendritic cells 37\nFCGR3A+ Monocytes 150\nNK cells 154\nCD8 T cells 316\nB cells 342\nCD14+ Monocytes 480\nCD4 T cells 1144\ndtype: int64" }, "metadata": {} } ], "id": "00f1ccad-3ee2-4947-8961-8bf9642fbbba" }, { "cell_type": "markdown", "source": "Now we can query the SOMA experiment -- here, for two cell types.", "metadata": { "tags": [] }, "id": "fda99535" }, { "cell_type": "code", "source": "obs_query = soma.AxisQuery(value_filter='louvain in [\"B cells\", \"NK cells\"]')", "metadata": { "tags": [], "trusted": true }, "execution_count": 8, "outputs": [], "id": "e2ed76ca-5821-44c5-a220-ff96568686ec" }, { "cell_type": "code", "source": "query = exp.axis_query(\"RNA\", obs_query=obs_query)", "metadata": { "tags": [], "trusted": true }, "execution_count": 9, "outputs": [], "id": "f3af70bc-3817-453c-a18c-56dc9aa874da" }, { "cell_type": "markdown", "source": "Note that the query result contains fewer observations than the original dataset, since we've selected only two cell types; the number of variables is unchanged.", "metadata": { "tags": [] }, "id": "fb94d898" }, { "cell_type": "code", "source": "[exp.obs.count, exp.ms[\"RNA\"].var.count]", "metadata": { "tags": [], "trusted": true }, "execution_count": 10, "outputs": [ { "execution_count": 10, "output_type": "execute_result", "data": { "text/plain": "[2638, 1838]" }, "metadata": {} } ], "id": "2c60568b-0789-4dbf-aff9-4bea2860aef4" }, { "cell_type": "code", "source": "[query.n_obs, query.n_vars]", "metadata": { "tags": 
[], "trusted": true }, "execution_count": 11, "outputs": [ { "execution_count": 11, "output_type": "execute_result", "data": { "text/plain": "[496, 1838]" }, "metadata": {} } ], "id": "28ed8d40-36c5-4642-bd8f-53d35c3074f0" }, { "cell_type": "markdown", "source": "Here we can take a look at the X data.", "metadata": { "tags": [] }, "id": "c9625771" }, { "cell_type": "code", "source": "query.X(\"data\").tables().concat().to_pandas()", "metadata": { "tags": [], "trusted": true }, "execution_count": 12, "outputs": [ { "execution_count": 12, "output_type": "execute_result", "data": { "text/plain": " soma_dim_0 soma_dim_1 soma_data\n0 1 0 -0.214582\n1 1 1 -0.372653\n2 1 2 -0.054804\n3 1 3 -0.683391\n4 1 4 0.633951\n... ... ... ...\n911643 2636 1833 -0.149789\n911644 2636 1834 -0.325824\n911645 2636 1835 -0.005918\n911646 2636 1836 -0.135213\n911647 2636 1837 -0.482111\n\n[911648 rows x 3 columns]", "text/html": "
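<div><table><thead><tr><th></th><th>soma_dim_0</th><th>soma_dim_1</th><th>soma_data</th></tr></thead><tbody><tr><th>0</th><td>1</td><td>0</td><td>-0.214582</td></tr><tr><th>1</th><td>1</td><td>1</td><td>-0.372653</td></tr><tr><th>2</th><td>1</td><td>2</td><td>-0.054804</td></tr><tr><th>3</th><td>1</td><td>3</td><td>-0.683391</td></tr><tr><th>4</th><td>1</td><td>4</td><td>0.633951</td></tr><tr><th>...</th><td>...</td><td>...</td><td>...</td></tr><tr><th>911643</th><td>2636</td><td>1833</td><td>-0.149789</td></tr><tr><th>911644</th><td>2636</td><td>1834</td><td>-0.325824</td></tr><tr><th>911645</th><td>2636</td><td>1835</td><td>-0.005918</td></tr><tr><th>911646</th><td>2636</td><td>1836</td><td>-0.135213</td></tr><tr><th>911647</th><td>2636</td><td>1837</td><td>-0.482111</td></tr></tbody></table><p>911648 rows × 3 columns</p></div>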
" }, "metadata": {} } ], "id": "65063167-5015-497a-9712-d72c0ecac2ed" }, { "cell_type": "markdown", "source": "To finish out this introductory look at the experiment-query API, we can convert our query outputs to AnnData format.", "metadata": { "tags": [] }, "id": "db7af8b8" }, { "cell_type": "code", "source": "adata = query.to_anndata(X_name=\"data\")", "metadata": { "tags": [], "trusted": true }, "execution_count": 13, "outputs": [], "id": "1ed8510b-343a-4f88-8aae-11a5c2069311" }, { "cell_type": "code", "source": "adata", "metadata": { "tags": [], "trusted": true }, "execution_count": 14, "outputs": [ { "execution_count": 14, "output_type": "execute_result", "data": { "text/plain": "AnnData object with n_obs × n_vars = 496 × 1838\n obs: 'soma_joinid', 'obs_id', 'n_genes', 'percent_mito', 'n_counts', 'louvain'\n var: 'soma_joinid', 'var_id', 'n_cells'" }, "metadata": {} } ], "id": "b3118504-8c92-48d4-9b83-87176960e4f1" }, { "cell_type": "code", "source": "", "metadata": { "trusted": true }, "execution_count": null, "outputs": [], "id": "f2e46ce1-cf7a-43c2-9f0d-bf918fd806bc" } ] }