{ "cells": [ { "cell_type": "markdown", "id": "be5d3b4f", "metadata": {}, "source": [ "# Datasets\n", "\n", "Each dataset consists of a complete combinatorial landscape to visualize and work with, together with the data from which it was inferred. Both the inference of the complete landscape and the calculation of the visualization coordinates are precomputed to provide rapid access to the different layers of interest." ] }, { "cell_type": "code", "execution_count": 1, "id": "6b281ef0", "metadata": {}, "outputs": [], "source": [ "# Import required libraries\n", "import numpy as np\n", "\n", "from gpmap.datasets import DataSet, list_available_datasets\n", "from gpmap.inference import VCregression" ] }, { "cell_type": "markdown", "id": "c419153e", "metadata": {}, "source": [ "## How to load a built-in dataset\n", "\n", "We include a series of built-in datasets that are used throughout the documentation to demonstrate the different applications. They are directly accessible to any user after installing the library.\n", "The list of built-in datasets can be shown as follows:" ] }, { "cell_type": "code", "execution_count": 2, "id": "a40a04a0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['5ss', 'f1u', 'test', 'dmsc', 'gb1', 'smn1', 'serine', 'trna', 'pard']" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list_available_datasets()" ] }, { "cell_type": "markdown", "id": "508e43b7", "metadata": {}, "source": [ "### How to access combinatorial landscape values\n", "\n", "Any of these datasets can be loaded as illustrated in previous tutorials. Every dataset should contain at least a `landscape` attribute with the phenotype associated with each possible genotype." ] }, { "cell_type": "code", "execution_count": 3, "id": "b5b44e02", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| seq | y |
|---|---|
| AAAA | 0.296301 |
| AAAC | -2.713474 |
| AAAD | -2.912992 |
| AAAE | -4.548719 |
| AAAF | -3.276738 |
| ... | ... |
| YYYS | -4.662925 |
| YYYT | -3.223102 |
| YYYV | -3.001718 |
| YYYW | -4.723318 |
| YYYY | -4.876429 |

160000 rows × 1 columns

| sequence | y | y_var |
|---|---|---|
| AAAA | 0.460831 | 0.046009 |
| AAAG | -2.192261 | 0.255906 |
| AAAH | -4.728306 | 2.064530 |
| AAAI | -4.338842 | 2.095252 |
| AAAL | -2.326240 | 0.087518 |
| ... | ... | ... |
| YYYS | -5.269987 | 0.291090 |
| YYYT | -3.821426 | 0.074489 |
| YYYV | -3.143536 | 0.074682 |
| YYYW | -4.306581 | 0.699467 |
| YYYY | -4.429813 | 0.417405 |

149361 rows × 2 columns

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | function | stationary_freq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AAAA | -0.270938 | -0.944304 | -0.227171 | 0.744803 | 0.059077 | -0.077512 | -0.477853 | 0.174491 | 0.015944 | 0.052664 | 0.296301 | 1.067767e-04 |
| AAAC | 0.033789 | -0.232603 | -0.271458 | 0.576487 | 0.035619 | 0.087608 | 0.590118 | -0.249005 | -0.087750 | -0.110291 | -2.713474 | 4.954648e-06 |
| AAAD | -0.020398 | -0.127749 | -0.174455 | 0.347843 | 0.142684 | 0.208679 | 0.590025 | 0.160819 | 0.354397 | 0.676487 | -2.912992 | 4.042194e-06 |
| AAAE | -0.001018 | -0.138712 | -0.183161 | 0.340728 | 0.121067 | 0.157871 | 0.436407 | 0.195630 | 0.211100 | 0.364298 | -4.548719 | 7.619345e-07 |
| AAAF | 0.149717 | -0.156524 | -0.239304 | 0.386243 | 0.103285 | 0.107756 | 0.302406 | 0.051575 | 0.171278 | 0.226772 | -3.276738 | 2.789084e-06 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| YYYS | 0.073880 | 0.038075 | -0.097751 | 0.156184 | 0.056463 | 0.074291 | 0.262512 | 0.019037 | 0.144439 | 0.172686 | -4.662925 | 6.781399e-07 |
| YYYT | -0.091125 | 0.213370 | 0.256403 | 0.246274 | -0.086279 | 0.026923 | 0.217086 | 0.102111 | 0.593257 | 0.003682 | -3.223102 | 2.945947e-06 |
| YYYV | 0.016488 | 0.195242 | 0.216320 | 0.035269 | 0.306726 | 0.025334 | 0.217759 | -0.028542 | -0.038378 | 0.148356 | -3.001718 | 3.692393e-06 |
| YYYW | 0.134072 | 0.114107 | -0.043856 | 0.011092 | 0.076565 | 0.109907 | 0.261274 | 0.108909 | 0.180371 | 0.365348 | -4.723318 | 6.376209e-07 |
| YYYY | 0.086278 | 0.113188 | -0.035062 | 0.017597 | 0.096670 | 0.249598 | 0.261107 | 0.108695 | 0.188331 | 0.368750 | -4.876429 | 5.454157e-07 |

160000 rows × 12 columns

| | i | j |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 0 | 2 |
| 2 | 0 | 3 |
| 3 | 0 | 4 |
| 4 | 0 | 5 |
| ... | ... | ... |
| 6079995 | 159996 | 159998 |
| 6079996 | 159996 | 159999 |
| 6079997 | 159997 | 159998 |
| 6079998 | 159997 | 159999 |
| 6079999 | 159998 | 159999 |

6080000 rows × 2 columns
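
The row counts above can be sanity-checked by hand. The sketch below assumes that the edge table lists every pair of genotypes differing at exactly one site once, as `(i, j)` with `i < j`, and that the landscape covers all sequences of length 4 over 20 amino acids. The helpers `hamming_edges` and `expected_counts` are hypothetical and written for this illustration; they are not part of `gpmap`.

```python
from itertools import product


def hamming_edges(alphabet, length):
    """Enumerate single-substitution neighbor pairs once, as (i, j) with i < j."""
    genotypes = ["".join(g) for g in product(alphabet, repeat=length)]
    index = {g: i for i, g in enumerate(genotypes)}
    edges = []
    for g in genotypes:
        i = index[g]
        for pos in range(length):
            for allele in alphabet:
                if allele != g[pos]:
                    j = index[g[:pos] + allele + g[pos + 1:]]
                    if i < j:  # keep each undirected edge only once
                        edges.append((i, j))
    return genotypes, edges


def expected_counts(n_alleles, length):
    """Closed-form genotype and edge counts under the same assumption."""
    n_genotypes = n_alleles ** length
    n_edges = n_genotypes * length * (n_alleles - 1) // 2
    return n_genotypes, n_edges


# Check the closed form against brute-force enumeration on a small landscape
small_genotypes, small_edges = hamming_edges("ACGT", 3)
assert (len(small_genotypes), len(small_edges)) == expected_counts(4, 3)

# Apply it to sequences of length 4 over 20 alleles
n_genotypes, n_edges = expected_counts(20, 4)
print(n_genotypes, n_edges)
```

Under these assumptions the closed form gives 160000 genotypes and 6080000 edges, matching the row counts of the tables above.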

| | k | decay_rates | relaxation_time |
|---|---|---|---|
| 0 | 1 | 2.554843 | 0.391413 |
| 1 | 2 | 3.566862 | 0.280359 |
| 2 | 3 | 4.926568 | 0.202981 |
| 3 | 4 | 5.023657 | 0.199058 |
| 4 | 5 | 5.303026 | 0.188572 |
| 5 | 6 | 5.635594 | 0.177444 |
| 6 | 7 | 6.294868 | 0.158860 |
| 7 | 8 | 6.543588 | 0.152821 |
| 8 | 9 | 6.741685 | 0.148331 |
| 9 | 10 | 7.000798 | 0.142841 |
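
In the table above, each `relaxation_time` appears to be the reciprocal of the corresponding `decay_rates` value. The sketch below checks that relationship on the numbers copied from the table. The reciprocal relation is an assumption read off the printed values, not something stated by the library.

```python
# Values copied from the decay_rates and relaxation_time columns of the table above
decay_rates = [2.554843, 3.566862, 4.926568, 5.023657, 5.303026,
               5.635594, 6.294868, 6.543588, 6.741685, 7.000798]
relaxation_times = [0.391413, 0.280359, 0.202981, 0.199058, 0.188572,
                    0.177444, 0.158860, 0.152821, 0.148331, 0.142841]

# Assumption: relaxation_time = 1 / decay_rate, up to the printed precision
errors = [abs(1.0 / rate - time)
          for rate, time in zip(decay_rates, relaxation_times)]
max_error = max(errors)
print(max_error < 1e-5)
```

Read this way, components with larger decay rates relax faster, so the first few components capture the slowest-decaying structure of the landscape.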

| | y | y_var |
|---|---|---|
| AAAAA | 0.039540 | 0.01 |
| AAAAG | -0.117862 | 0.01 |
| AAAAT | 0.303257 | 0.01 |
| AAACA | 0.230550 | 0.01 |
| AAACC | -0.000383 | 0.01 |
| ... | ... | ... |
| TTTCT | -0.010906 | 0.01 |
| TTTGG | -0.398118 | 0.01 |
| TTTTC | -0.320779 | 0.01 |
| TTTTG | -0.266456 | 0.01 |
| TTTTT | -0.141016 | 0.01 |

814 rows × 2 columns

| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | function | stationary_freq |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AAAAA | 2.922532 | 1.612996 | 1.922566 | 1.928836 | 0.651238 | 0.724700 | -0.626236 | 0.457580 | -0.160287 | 0.765350 | ... | 2.476916 | 3.135844 | 0.294568 | 1.121940 | -0.204409 | -1.407941 | 0.419596 | -1.271204 | -0.019814 | 0.000053 |
| AAAAC | 2.681952 | 1.434138 | 2.081914 | 0.677991 | 0.593148 | 1.109453 | -0.824506 | 1.029326 | -0.209692 | 0.677500 | ... | 1.768603 | 2.792991 | 0.789508 | 1.052783 | 0.499965 | -1.928383 | 0.194958 | -0.919782 | -0.038036 | 0.000044 |
| AAAAG | 2.604061 | 1.908399 | 2.159282 | 0.320091 | 0.376623 | 0.861715 | -0.115110 | 0.965854 | 0.537670 | 0.772480 | ... | 2.137501 | 2.527706 | 0.950794 | 1.133436 | 0.317646 | -1.593851 | 0.628486 | -1.566774 | -0.120535 | 0.000020 |
| AAAAT | 1.960418 | 2.015446 | 2.802114 | -0.144481 | 1.141420 | 2.506787 | -1.358454 | 1.476034 | -0.333192 | 0.398977 | ... | 1.890287 | 2.614852 | 2.739016 | -0.219729 | -0.199154 | -1.568183 | 0.575084 | -1.885819 | 0.220953 | 0.000543 |
| AAACA | 3.814800 | 1.261699 | 0.811080 | 4.448437 | 0.032213 | -0.514836 | -0.255077 | -0.670186 | -0.133688 | 0.566255 | ... | 1.577136 | 1.222345 | -0.527213 | 0.709598 | -0.291410 | -1.466313 | -0.167420 | -1.182996 | 0.201557 | 0.000450 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| TTTGT | 2.140434 | -0.394854 | 0.736442 | -0.155903 | -0.180521 | 1.415161 | 0.583885 | 0.179762 | -0.547157 | 0.584356 | ... | 0.441516 | 0.353044 | -0.189312 | -0.864722 | 0.205254 | 0.092788 | -0.209753 | 0.350808 | -0.114952 | 0.000021 |
| TTTTA | 2.902003 | 0.551990 | -0.275240 | 1.688366 | -0.765170 | 0.558554 | 0.652595 | 0.057239 | -0.154063 | -0.473734 | ... | 0.337019 | -0.614797 | -0.057125 | -0.069688 | 1.167145 | 0.828056 | -0.541204 | 1.131136 | -0.213529 | 0.000008 |
| TTTTC | 2.601067 | 0.128646 | -0.267689 | 0.344941 | -0.793282 | 0.625832 | 0.177349 | 0.472832 | -0.466700 | 0.047076 | ... | -0.532616 | -0.050942 | -0.458822 | 0.365146 | 0.399918 | 0.206811 | -0.098076 | 0.757820 | -0.344745 | 0.000002 |
| TTTTG | 2.157413 | 0.586338 | -0.397141 | 0.210420 | -0.676836 | 0.744688 | 0.895972 | 0.421951 | 0.458642 | -0.347510 | ... | -0.238223 | 0.093321 | 0.244368 | 0.207288 | 0.372603 | 0.086408 | 0.194293 | 0.455535 | -0.283153 | 0.000004 |
| TTTTT | 0.244623 | 0.310516 | 0.500102 | 0.003702 | -0.126980 | 1.207871 | 0.062054 | 0.033888 | -0.166582 | -0.031377 | ... | 0.030637 | 0.149465 | -0.067944 | 0.222734 | 0.223881 | 0.163491 | -0.061888 | 0.205998 | -0.090435 | 0.000027 |

1024 rows × 22 columns
\n", "