{ "cells": [ { "cell_type": "markdown", "id": "be5d3b4f", "metadata": {}, "source": [ "# Datasets\n", "\n", "Datasets consist of complete combinatorial landscapes to visualize and work with, as well as the data from which they were inferred. Both the inference of the complete landscape and the calculation of the visualization coordinates are precomputed to provide rapid access to the different layers of interest." ] }, { "cell_type": "code", "execution_count": 1, "id": "6b281ef0", "metadata": {}, "outputs": [], "source": [ "# Import required libraries\n", "import numpy as np\n", "\n", "from gpmap.datasets import DataSet, list_available_datasets\n", "from gpmap.inference import VCregression" ] }, { "cell_type": "markdown", "id": "c419153e", "metadata": {}, "source": [ "## How to load a built-in dataset\n", "\n", "We include a series of datasets that are used throughout the documentation to demonstrate the different applications. They are directly accessible to any user after installing the library.\n", "The list of built-in datasets can be shown as follows:" ] }, { "cell_type": "code", "execution_count": 2, "id": "a40a04a0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['5ss', 'f1u', 'test', 'dmsc', 'gb1', 'smn1', 'serine', 'trna', 'pard']" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list_available_datasets()" ] }, { "cell_type": "markdown", "id": "508e43b7", "metadata": {}, "source": [ "### How to access combinatorial landscape values\n", "\n", "Any of these datasets can be loaded by name, as illustrated in previous tutorials. Every dataset should contain at least a `landscape` attribute with the phenotype associated with each possible genotype." ] }, { "cell_type": "code", "execution_count": 3, "id": "b5b44e02", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
y
seq
AAAA0.296301
AAAC-2.713474
AAAD-2.912992
AAAE-4.548719
AAAF-3.276738
......
YYYS-4.662925
YYYT-3.223102
YYYV-3.001718
YYYW-4.723318
YYYY-4.876429
\n", "

160000 rows × 1 columns

\n", "
" ], "text/plain": [ " y\n", "seq \n", "AAAA 0.296301\n", "AAAC -2.713474\n", "AAAD -2.912992\n", "AAAE -4.548719\n", "AAAF -3.276738\n", "... ...\n", "YYYS -4.662925\n", "YYYT -3.223102\n", "YYYV -3.001718\n", "YYYW -4.723318\n", "YYYY -4.876429\n", "\n", "[160000 rows x 1 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gb1 = DataSet('gb1')\n", "gb1.landscape" ] }, { "cell_type": "markdown", "id": "2ed7ddd0", "metadata": {}, "source": [ "### How to access the processed data in experimental datasets\n", "\n", "If the landscape was obtained from experimental data, then it also has a `data` attribute that includes the measurement `y` and, if available, its uncertainty `y_var`. \n", "The data do not necessarily include measurements for every possible sequence. In this case, 149,361 of the 160,000 possible sequences were measured, so about 10,600 sequences have no experimental measurement." ] }, { "cell_type": "code", "execution_count": 4, "id": "5ef03158", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yy_var
sequence
AAAA0.4608310.046009
AAAG-2.1922610.255906
AAAH-4.7283062.064530
AAAI-4.3388422.095252
AAAL-2.3262400.087518
.........
YYYS-5.2699870.291090
YYYT-3.8214260.074489
YYYV-3.1435360.074682
YYYW-4.3065810.699467
YYYY-4.4298130.417405
\n", "

149361 rows × 2 columns

\n", "
" ], "text/plain": [ " y y_var\n", "sequence \n", "AAAA 0.460831 0.046009\n", "AAAG -2.192261 0.255906\n", "AAAH -4.728306 2.064530\n", "AAAI -4.338842 2.095252\n", "AAAL -2.326240 0.087518\n", "... ... ...\n", "YYYS -5.269987 0.291090\n", "YYYT -3.821426 0.074489\n", "YYYV -3.143536 0.074682\n", "YYYW -4.306581 0.699467\n", "YYYY -4.429813 0.417405\n", "\n", "[149361 rows x 2 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gb1.data" ] }, { "cell_type": "markdown", "id": "47fdfd9a", "metadata": {}, "source": [ "### How to access a dataset visualization\n", "\n", "For built-in datasets, we also provide the precomputed visualization in three attributes: `nodes` holds the coordinates of the visualization, `edges` is the `DataFrame` connecting sequences separated by single point mutations, and `relaxation_times` holds the relaxation time associated with each of the diffusion axes." ] }, { "cell_type": "code", "execution_count": 5, "id": "f2bfbe60", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
12345678910functionstationary_freq
AAAA-0.270938-0.944304-0.2271710.7448030.059077-0.077512-0.4778530.1744910.0159440.0526640.2963011.067767e-04
AAAC0.033789-0.232603-0.2714580.5764870.0356190.0876080.590118-0.249005-0.087750-0.110291-2.7134744.954648e-06
AAAD-0.020398-0.127749-0.1744550.3478430.1426840.2086790.5900250.1608190.3543970.676487-2.9129924.042194e-06
AAAE-0.001018-0.138712-0.1831610.3407280.1210670.1578710.4364070.1956300.2111000.364298-4.5487197.619345e-07
AAAF0.149717-0.156524-0.2393040.3862430.1032850.1077560.3024060.0515750.1712780.226772-3.2767382.789084e-06
.......................................
YYYS0.0738800.038075-0.0977510.1561840.0564630.0742910.2625120.0190370.1444390.172686-4.6629256.781399e-07
YYYT-0.0911250.2133700.2564030.246274-0.0862790.0269230.2170860.1021110.5932570.003682-3.2231022.945947e-06
YYYV0.0164880.1952420.2163200.0352690.3067260.0253340.217759-0.028542-0.0383780.148356-3.0017183.692393e-06
YYYW0.1340720.114107-0.0438560.0110920.0765650.1099070.2612740.1089090.1803710.365348-4.7233186.376209e-07
YYYY0.0862780.113188-0.0350620.0175970.0966700.2495980.2611070.1086950.1883310.368750-4.8764295.454157e-07
\n", "

160000 rows × 12 columns

\n", "
" ], "text/plain": [ " 1 2 3 4 5 6 7 \\\n", "AAAA -0.270938 -0.944304 -0.227171 0.744803 0.059077 -0.077512 -0.477853 \n", "AAAC 0.033789 -0.232603 -0.271458 0.576487 0.035619 0.087608 0.590118 \n", "AAAD -0.020398 -0.127749 -0.174455 0.347843 0.142684 0.208679 0.590025 \n", "AAAE -0.001018 -0.138712 -0.183161 0.340728 0.121067 0.157871 0.436407 \n", "AAAF 0.149717 -0.156524 -0.239304 0.386243 0.103285 0.107756 0.302406 \n", "... ... ... ... ... ... ... ... \n", "YYYS 0.073880 0.038075 -0.097751 0.156184 0.056463 0.074291 0.262512 \n", "YYYT -0.091125 0.213370 0.256403 0.246274 -0.086279 0.026923 0.217086 \n", "YYYV 0.016488 0.195242 0.216320 0.035269 0.306726 0.025334 0.217759 \n", "YYYW 0.134072 0.114107 -0.043856 0.011092 0.076565 0.109907 0.261274 \n", "YYYY 0.086278 0.113188 -0.035062 0.017597 0.096670 0.249598 0.261107 \n", "\n", " 8 9 10 function stationary_freq \n", "AAAA 0.174491 0.015944 0.052664 0.296301 1.067767e-04 \n", "AAAC -0.249005 -0.087750 -0.110291 -2.713474 4.954648e-06 \n", "AAAD 0.160819 0.354397 0.676487 -2.912992 4.042194e-06 \n", "AAAE 0.195630 0.211100 0.364298 -4.548719 7.619345e-07 \n", "AAAF 0.051575 0.171278 0.226772 -3.276738 2.789084e-06 \n", "... ... ... ... ... ... \n", "YYYS 0.019037 0.144439 0.172686 -4.662925 6.781399e-07 \n", "YYYT 0.102111 0.593257 0.003682 -3.223102 2.945947e-06 \n", "YYYV -0.028542 -0.038378 0.148356 -3.001718 3.692393e-06 \n", "YYYW 0.108909 0.180371 0.365348 -4.723318 6.376209e-07 \n", "YYYY 0.108695 0.188331 0.368750 -4.876429 5.454157e-07 \n", "\n", "[160000 rows x 12 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gb1.nodes" ] }, { "cell_type": "code", "execution_count": 6, "id": "41b00059", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ij
001
102
203
304
405
.........
6079995159996159998
6079996159996159999
6079997159997159998
6079998159997159999
6079999159998159999
\n", "

6080000 rows × 2 columns

\n", "
" ], "text/plain": [ " i j\n", "0 0 1\n", "1 0 2\n", "2 0 3\n", "3 0 4\n", "4 0 5\n", "... ... ...\n", "6079995 159996 159998\n", "6079996 159996 159999\n", "6079997 159997 159998\n", "6079998 159997 159999\n", "6079999 159998 159999\n", "\n", "[6080000 rows x 2 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gb1.edges" ] }, { "cell_type": "code", "execution_count": 7, "id": "1b0ecac7", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
kdecay_ratesrelaxation_time
012.5548430.391413
123.5668620.280359
234.9265680.202981
345.0236570.199058
455.3030260.188572
565.6355940.177444
676.2948680.158860
786.5435880.152821
896.7416850.148331
9107.0007980.142841
\n", "
" ], "text/plain": [ " k decay_rates relaxation_time\n", "0 1 2.554843 0.391413\n", "1 2 3.566862 0.280359\n", "2 3 4.926568 0.202981\n", "3 4 5.023657 0.199058\n", "4 5 5.303026 0.188572\n", "5 6 5.635594 0.177444\n", "6 7 6.294868 0.158860\n", "7 8 6.543588 0.152821\n", "8 9 6.741685 0.148331\n", "9 10 7.000798 0.142841" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gb1.relaxation_times" ] }, { "cell_type": "markdown", "id": "81525905", "metadata": {}, "source": [ "## How to build new datasets\n", "\n", "We also provide functionality to create new datasets and store them in the local copy of your library for easier access. Let's build a new dataset from simulated data:" ] }, { "cell_type": "code", "execution_count": 8, "id": "178e64ba", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
yy_var
AAAAA0.0395400.01
AAAAG-0.1178620.01
AAAAT0.3032570.01
AAACA0.2305500.01
AAACC-0.0003830.01
.........
TTTCT-0.0109060.01
TTTGG-0.3981180.01
TTTTC-0.3207790.01
TTTTG-0.2664560.01
TTTTT-0.1410160.01
\n", "

814 rows × 2 columns

\n", "
" ], "text/plain": [ " y y_var\n", "AAAAA 0.039540 0.01\n", "AAAAG -0.117862 0.01\n", "AAAAT 0.303257 0.01\n", "AAACA 0.230550 0.01\n", "AAACC -0.000383 0.01\n", "... ... ...\n", "TTTCT -0.010906 0.01\n", "TTTGG -0.398118 0.01\n", "TTTTC -0.320779 0.01\n", "TTTTG -0.266456 0.01\n", "TTTTT -0.141016 0.01\n", "\n", "[814 rows x 2 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Fix the random seed so the simulation is reproducible\n", "np.random.seed(0)\n", "lambdas = np.array([10, 2, 0.5, 0.1, 0.02, 0])\n", "model = VCregression(seq_length=5, alphabet_type='dna', lambdas=lambdas)\n", "# Simulate noisy data, then drop the true values and the missing sequences\n", "data = model.simulate(p_missing=0.2, sigma=0.1).drop('y_true', axis=1).dropna()\n", "data" ] }, { "cell_type": "markdown", "id": "502e0dfa", "metadata": {}, "source": [ "The `build` method runs Variance Component regression and computes the visualization coordinates automatically using default values, which may not be the best choice for every dataset." ] }, { "cell_type": "code", "execution_count": 9, "id": "de818e53", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 100/100 [00:02<00:00, 41.79it/s]\n" ] } ], "source": [ "test = DataSet('test', data=data)\n", "test.build()" ] }, { "cell_type": "markdown", "id": "07a8f860", "metadata": {}, "source": [ "We can now reload the dataset from disk and verify that it contains the visualization attributes.\n", "\n", "> Note that reinstalling the library will erase any newly created `DataSet`s." ] }, { "cell_type": "code", "execution_count": 10, "id": "6c3a1e5b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
12345678910...1314151617181920functionstationary_freq
AAAAA2.9225321.6129961.9225661.9288360.6512380.724700-0.6262360.457580-0.1602870.765350...2.4769163.1358440.2945681.121940-0.204409-1.4079410.419596-1.271204-0.0198140.000053
AAAAC2.6819521.4341382.0819140.6779910.5931481.109453-0.8245061.029326-0.2096920.677500...1.7686032.7929910.7895081.0527830.499965-1.9283830.194958-0.919782-0.0380360.000044
AAAAG2.6040611.9083992.1592820.3200910.3766230.861715-0.1151100.9658540.5376700.772480...2.1375012.5277060.9507941.1334360.317646-1.5938510.628486-1.566774-0.1205350.000020
AAAAT1.9604182.0154462.802114-0.1444811.1414202.506787-1.3584541.476034-0.3331920.398977...1.8902872.6148522.739016-0.219729-0.199154-1.5681830.575084-1.8858190.2209530.000543
AAACA3.8148001.2616990.8110804.4484370.032213-0.514836-0.255077-0.670186-0.1336880.566255...1.5771361.222345-0.5272130.709598-0.291410-1.466313-0.167420-1.1829960.2015570.000450
..................................................................
TTTGT2.140434-0.3948540.736442-0.155903-0.1805211.4151610.5838850.179762-0.5471570.584356...0.4415160.353044-0.189312-0.8647220.2052540.092788-0.2097530.350808-0.1149520.000021
TTTTA2.9020030.551990-0.2752401.688366-0.7651700.5585540.6525950.057239-0.154063-0.473734...0.337019-0.614797-0.057125-0.0696881.1671450.828056-0.5412041.131136-0.2135290.000008
TTTTC2.6010670.128646-0.2676890.344941-0.7932820.6258320.1773490.472832-0.4667000.047076...-0.532616-0.050942-0.4588220.3651460.3999180.206811-0.0980760.757820-0.3447450.000002
TTTTG2.1574130.586338-0.3971410.210420-0.6768360.7446880.8959720.4219510.458642-0.347510...-0.2382230.0933210.2443680.2072880.3726030.0864080.1942930.455535-0.2831530.000004
TTTTT0.2446230.3105160.5001020.003702-0.1269801.2078710.0620540.033888-0.166582-0.031377...0.0306370.149465-0.0679440.2227340.2238810.163491-0.0618880.205998-0.0904350.000027
\n", "

1024 rows × 22 columns

\n", "
" ], "text/plain": [ " 1 2 3 4 5 6 7 \\\n", "AAAAA 2.922532 1.612996 1.922566 1.928836 0.651238 0.724700 -0.626236 \n", "AAAAC 2.681952 1.434138 2.081914 0.677991 0.593148 1.109453 -0.824506 \n", "AAAAG 2.604061 1.908399 2.159282 0.320091 0.376623 0.861715 -0.115110 \n", "AAAAT 1.960418 2.015446 2.802114 -0.144481 1.141420 2.506787 -1.358454 \n", "AAACA 3.814800 1.261699 0.811080 4.448437 0.032213 -0.514836 -0.255077 \n", "... ... ... ... ... ... ... ... \n", "TTTGT 2.140434 -0.394854 0.736442 -0.155903 -0.180521 1.415161 0.583885 \n", "TTTTA 2.902003 0.551990 -0.275240 1.688366 -0.765170 0.558554 0.652595 \n", "TTTTC 2.601067 0.128646 -0.267689 0.344941 -0.793282 0.625832 0.177349 \n", "TTTTG 2.157413 0.586338 -0.397141 0.210420 -0.676836 0.744688 0.895972 \n", "TTTTT 0.244623 0.310516 0.500102 0.003702 -0.126980 1.207871 0.062054 \n", "\n", " 8 9 10 ... 13 14 15 \\\n", "AAAAA 0.457580 -0.160287 0.765350 ... 2.476916 3.135844 0.294568 \n", "AAAAC 1.029326 -0.209692 0.677500 ... 1.768603 2.792991 0.789508 \n", "AAAAG 0.965854 0.537670 0.772480 ... 2.137501 2.527706 0.950794 \n", "AAAAT 1.476034 -0.333192 0.398977 ... 1.890287 2.614852 2.739016 \n", "AAACA -0.670186 -0.133688 0.566255 ... 1.577136 1.222345 -0.527213 \n", "... ... ... ... ... ... ... ... \n", "TTTGT 0.179762 -0.547157 0.584356 ... 0.441516 0.353044 -0.189312 \n", "TTTTA 0.057239 -0.154063 -0.473734 ... 0.337019 -0.614797 -0.057125 \n", "TTTTC 0.472832 -0.466700 0.047076 ... -0.532616 -0.050942 -0.458822 \n", "TTTTG 0.421951 0.458642 -0.347510 ... -0.238223 0.093321 0.244368 \n", "TTTTT 0.033888 -0.166582 -0.031377 ... 
0.030637 0.149465 -0.067944 \n", "\n", " 16 17 18 19 20 function \\\n", "AAAAA 1.121940 -0.204409 -1.407941 0.419596 -1.271204 -0.019814 \n", "AAAAC 1.052783 0.499965 -1.928383 0.194958 -0.919782 -0.038036 \n", "AAAAG 1.133436 0.317646 -1.593851 0.628486 -1.566774 -0.120535 \n", "AAAAT -0.219729 -0.199154 -1.568183 0.575084 -1.885819 0.220953 \n", "AAACA 0.709598 -0.291410 -1.466313 -0.167420 -1.182996 0.201557 \n", "... ... ... ... ... ... ... \n", "TTTGT -0.864722 0.205254 0.092788 -0.209753 0.350808 -0.114952 \n", "TTTTA -0.069688 1.167145 0.828056 -0.541204 1.131136 -0.213529 \n", "TTTTC 0.365146 0.399918 0.206811 -0.098076 0.757820 -0.344745 \n", "TTTTG 0.207288 0.372603 0.086408 0.194293 0.455535 -0.283153 \n", "TTTTT 0.222734 0.223881 0.163491 -0.061888 0.205998 -0.090435 \n", "\n", " stationary_freq \n", "AAAAA 0.000053 \n", "AAAAC 0.000044 \n", "AAAAG 0.000020 \n", "AAAAT 0.000543 \n", "AAACA 0.000450 \n", "... ... \n", "TTTGT 0.000021 \n", "TTTTA 0.000008 \n", "TTTTC 0.000002 \n", "TTTTG 0.000004 \n", "TTTTT 0.000027 \n", "\n", "[1024 rows x 22 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test = DataSet('test')\n", "test.nodes" ] } ], "metadata": { "kernelspec": { "display_name": "gpmap", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.18" } }, "nbformat": 4, "nbformat_minor": 5 }