{ "cells": [ { "cell_type": "markdown", "id": "dca40d3f", "metadata": {}, "source": [ "# tdp43 dataset" ] }, { "cell_type": "code", "execution_count": 1, "id": "43acb29b", "metadata": { "ExecuteTime": { "end_time": "2021-11-12T19:01:23.948836Z", "start_time": "2021-11-12T19:01:21.718880Z" } }, "outputs": [], "source": [ "# Standard imports\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "# Special imports\n", "import mavenn\n", "import os\n", "import urllib" ] }, { "cell_type": "markdown", "id": "5c257c40", "metadata": { "ExecuteTime": { "end_time": "2021-11-11T17:26:47.608641Z", "start_time": "2021-11-11T17:26:47.392567Z" } }, "source": [ "## Summary" ] }, { "cell_type": "markdown", "id": "c7e0fed0", "metadata": { "ExecuteTime": { "end_time": "2021-11-11T17:27:24.538136Z", "start_time": "2021-11-11T17:27:24.529622Z" } }, "source": [ "The deep mutagenesis dataset of Bolognesi et al. (2019).\n", "TAR DNA-binding protein 43 (TDP-43) is a heterogeneous nuclear ribonucleoprotein (hnRNP), found mainly in the cell nucleus, that plays a key role in regulating gene expression. Cytoplasmic aggregation of TDP-43 has been associated with several neurodegenerative disorders, including amyotrophic lateral sclerosis (ALS), frontotemporal lobar degeneration (FTLD), and Alzheimer's, Parkinson's, and Huntington's diseases. Bolognesi et al. used error-prone oligonucleotide synthesis to mutate the prion-like domain (PRD) of TDP-43 comprehensively, and reported toxicity for 1266 single and 56730 double mutants (57996 variants in total).\n", "\n", "\n", "**Names**: ``'tdp43'``\n", "\n", "**Reference**: Bolognesi B, Faure AJ, Seuma M, Schmiedel JM, Tartaglia GG, Lehner B. The mutational landscape of a prion-like domain. [Nat Commun 10:4162 (2019)](https://doi.org/10.1038/s41467-019-12101-z)." 
] }, { "cell_type": "code", "execution_count": 2, "id": "ba16bbe4", "metadata": { "ExecuteTime": { "end_time": "2021-11-12T19:01:24.039194Z", "start_time": "2021-11-12T19:01:23.949885Z" } }, "outputs": [ { "data": { "text/html": [ "
| | set | dist | y | dy | x |
|---|---|---|---|---|---|
| 0 | training | 1 | 0.032210 | 0.037438 | NNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWG... |
| 1 | training | 1 | -0.009898 | 0.038981 | TNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWG... |
| 2 | training | 1 | -0.010471 | 0.005176 | RNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWG... |
| 3 | training | 1 | 0.030803 | 0.005341 | SNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWG... |
| 4 | training | 1 | -0.054716 | 0.035752 | INSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWG... |
| ... | ... | ... | ... | ... | ... |
| 57991 | training | 2 | -0.009706 | 0.035128 | GNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWG... |
| 57992 | validation | 2 | -0.030744 | 0.029436 | GNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWG... |
| 57993 | validation | 2 | -0.086802 | 0.033174 | GNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWG... |
| 57994 | training | 2 | -0.049587 | 0.029130 | GNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWG... |
| 57995 | training | 2 | -0.105390 | 0.031189 | GNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWG... |
57996 rows × 5 columns
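The preview above appears to follow the MAVE-NN dataset layout. `set` assigns each variant to a data split, and `dist` counts amino-acid substitutions relative to wild type (1 for singles, 2 for doubles). `y` and `dy` hold the measured value and its uncertainty, and `x` holds the variant sequence. Below is a minimal pandas sketch of selecting splits and mutation orders from such a table. It assumes only the ten rows displayed and leaves out the truncated `x` column. Rows not shown may carry other `set` labels.

```python
import pandas as pd

# Only the ten rows displayed above; the x column is omitted because the
# sequences are truncated in the display.
data_df = pd.DataFrame(
    {
        'set': ['training'] * 6 + ['validation'] * 2 + ['training'] * 2,
        'dist': [1] * 5 + [2] * 5,
        'y': [0.032210, -0.009898, -0.010471, 0.030803, -0.054716,
              -0.009706, -0.030744, -0.086802, -0.049587, -0.105390],
        'dy': [0.037438, 0.038981, 0.005176, 0.005341, 0.035752,
               0.035128, 0.029436, 0.033174, 0.029130, 0.031189],
    },
    index=[0, 1, 2, 3, 4, 57991, 57992, 57993, 57994, 57995],
)

# Split membership, as recorded in the set column.
train_df = data_df[data_df['set'] == 'training']
val_df = data_df[data_df['set'] == 'validation']

# Single (dist == 1) vs. double (dist == 2) mutants.
singles_df = data_df[data_df['dist'] == 1]
doubles_df = data_df[data_df['dist'] == 2]

# Number of variants in each (set, dist) group.
counts = data_df.groupby(['set', 'dist']).size()
print(counts)
```

Boolean masks keep the original row index, so each subset can still be traced back to the full table.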
| | Pos | WT_AA | Mut | Nmut_nt | Nmut_aa | Nmut_codons | STOP | mean_count | is.reads0 | sigma | toxicity | region | Pos_abs | mut_code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | G | N | 2 | 1 | 1 | False | 22.000000 | True | 0.037438 | 0.032210 | 290 | 290 | G290N |
| 1 | 1 | G | T | 2 | 1 | 1 | False | 17.333333 | True | 0.038981 | -0.009898 | 290 | 290 | G290T |
| 2 | 1 | G | R | 2 | 1 | 1 | False | 3888.666667 | True | 0.005176 | -0.010471 | 290 | 290 | G290R |
| 3 | 1 | G | S | 2 | 1 | 1 | False | 3635.666667 | True | 0.005341 | 0.030803 | 290 | 290 | G290S |
| 4 | 1 | G | I | 2 | 1 | 1 | False | 21.666667 | True | 0.035752 | -0.054716 | 290 | 290 | G290I |
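The raw single-mutant table can be related to the processed dataset. For rows 0 to 4, `toxicity` equals `y`, `sigma` equals `dy`, and `Nmut_aa` (1 here, 2 in the double-mutant table) matches `dist`. The following pandas sketch converts the raw columns. It assumes this correspondence holds for the whole table, which is inferred from the displayed rows and not documented here. The variable names are illustrative.

```python
import pandas as pd

# The five raw single-mutant rows displayed above (subset of columns).
raw_singles = pd.DataFrame({
    'Nmut_aa': [1, 1, 1, 1, 1],
    'sigma': [0.037438, 0.038981, 0.005176, 0.005341, 0.035752],
    'toxicity': [0.032210, -0.009898, -0.010471, 0.030803, -0.054716],
    'mut_code': ['G290N', 'G290T', 'G290R', 'G290S', 'G290I'],
})

# Assumed column mapping, inferred from matching values in the two displays:
# dist <- Nmut_aa, y <- toxicity, dy <- sigma.
mavenn_like = raw_singles.rename(
    columns={'Nmut_aa': 'dist', 'toxicity': 'y', 'sigma': 'dy'}
)[['mut_code', 'dist', 'y', 'dy']]
print(mavenn_like)
```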
| | Nmut_nt | Nmut_aa | Nmut_codons | STOP | mean_count | is.reads0 | Pos1 | Pos2 | WT_AA1 | WT_AA2 | ... | sigma_cond | toxicity1 | toxicity2 | toxicity_uncorr | toxicity_cond | region | Pos_abs1 | Pos_abs2 | mut_code1 | mut_code2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2 | 2 | True | 16.333333 | True | 1 | 4 | G | R | ... | 0.020867 | 0.001282 | -0.174307 | -0.139949 | -0.169501 | 290 | 290 | 293 | G290A | R293* |
| 1 | 4 | 2 | 2 | True | 30.333333 | True | 1 | 4 | G | R | ... | 0.017555 | 0.007680 | -0.174307 | -0.206614 | -0.193387 | 290 | 290 | 293 | G290C | R293* |
| 2 | 2 | 2 | 2 | True | 43.333333 | True | 1 | 4 | G | R | ... | 0.017882 | 0.044342 | -0.174307 | -0.123376 | -0.142809 | 290 | 290 | 293 | G290D | R293* |
| 3 | 2 | 2 | 2 | True | 22.333333 | True | 1 | 4 | G | R | ... | 0.018913 | -0.010471 | -0.174307 | -0.136759 | -0.165018 | 290 | 290 | 293 | G290R | R293* |
| 4 | 2 | 2 | 2 | True | 29.333333 | True | 1 | 4 | G | R | ... | 0.021690 | 0.030803 | -0.174307 | -0.118746 | -0.153186 | 290 | 290 | 293 | G290S | R293* |
5 rows × 25 columns
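Each `mut_code` is written as the wild-type residue, then the absolute position, then the mutant residue (for example `G290N`), with `*` marking a stop codon. `Pos_abs` 290 corresponds to `Pos` 1. The preview rows support reading `x` as the wild-type sequence with the listed substitutions applied: G290N gives `NNSRGG...` and G290T gives `TNSRGG...`. The sketch below rebuilds variant sequences from mutation codes under that assumption. The helpers `parse_mut_code` and `apply_mutations` are hypothetical illustrations. `WT_PREFIX` holds only the 46 wild-type residues visible in the tables, because the rest of the sequence is elided there.

```python
# Hypothetical helpers for illustration; not part of mavenn or of the original data.
# Only the first 46 wild-type residues are visible in the tables above.
WT_PREFIX = 'GNSRGGGAGLGNNQGSNMGGGMNFGAFSINPAMMAAAQAALQSSWG'
FIRST_POS = 290  # absolute position of the first residue (Pos 1 <-> Pos_abs 290)


def parse_mut_code(code):
    # 'G290N' -> ('G', 290, 'N'); '*' denotes a stop codon.
    return code[0], int(code[1:-1]), code[-1]


def apply_mutations(wt, codes, first_pos=FIRST_POS):
    # Return wt with every substitution in codes applied.
    seq = list(wt)
    for code in codes:
        wt_aa, pos, mut_aa = parse_mut_code(code)
        i = pos - first_pos  # 0-based index into the sequence
        if seq[i] != wt_aa:
            raise ValueError('wild-type mismatch for ' + code)
        seq[i] = mut_aa
    return ''.join(seq)


print(apply_mutations(WT_PREFIX, ['G290N']))           # single mutant
print(apply_mutations(WT_PREFIX, ['G290A', 'R293*']))  # double mutant
```

The wild-type check catches position offsets early. A mismatched residue usually means `FIRST_POS` or the reference sequence is wrong.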
\n", "