Tutorial 1: Built-in demonstration scripts
MAVE-NN provides built-in demonstration scripts, or “demos”, to help users quickly get started training and visualizing models. Each demo is a self-contained Python script that can be run by calling mavenn.run_demo(). To list the available demo names, call this function without passing any arguments:
[1]:
# Import MAVE-NN
import mavenn
# Get list of demos
mavenn.run_demo()
To run a demo, execute
>>> mavenn.run_demo(name)
where 'name' is one of the following strings:
1. "gb1_ge_evaluation"
2. "mpsa_ge_training"
3. "sortseq_mpa_visualization"
Python code for each demo is located in
/opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/demos/
[1]:
['gb1_ge_evaluation', 'mpsa_ge_training', 'sortseq_mpa_visualization']
To see the Python code for any one of these demos, pass the keyword argument print_code=True to mavenn.run_demo(). Alternatively, navigate to the folder that is printed when executing mavenn.run_demo() on your machine and open the corresponding *.py file.
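The second option above, browsing the demo folder directly, can be sketched with the standard library alone. This is a minimal sketch, not part of MAVE-NN: the helper names find_demo_dir, list_demo_scripts, and read_demo_source are hypothetical, and the examples/demos location is taken from the path printed in the output above, so it may differ between installations.

```python
from importlib.util import find_spec
from pathlib import Path


def find_demo_dir():
    """Return the assumed demo folder inside the installed mavenn package, or None."""
    spec = find_spec("mavenn")  # locates the package without importing it
    if spec is None or spec.origin is None:
        return None
    # Assumption: demos live in <package>/examples/demos/, as printed by run_demo()
    return Path(spec.origin).parent / "examples" / "demos"


def list_demo_scripts(demo_dir):
    """Return the sorted names (without .py) of the scripts found in demo_dir."""
    return sorted(p.stem for p in Path(demo_dir).glob("*.py") if p.stem != "__init__")


def read_demo_source(demo_dir, name):
    """Return the source code of the demo script called `name`."""
    return (Path(demo_dir) / f"{name}.py").read_text(encoding="utf-8")


demo_dir = find_demo_dir()
if demo_dir is not None and demo_dir.is_dir():
    print(list_demo_scripts(demo_dir))
else:
    print("No mavenn demo folder found; pass a folder to list_demo_scripts() instead.")
```

Reading the file this way only displays the code; running a demo still goes through mavenn.run_demo().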
Evaluating a GE regression model
The 'gb1_ge_evaluation' demo illustrates an additive genotype-phenotype (G-P) map, along with a global epistasis (GE) measurement process, fit to data from a deep mutational scanning (DMS) experiment performed on protein GB1 by Olson et al., 2014.
[2]:
mavenn.run_demo('gb1_ge_evaluation', print_code=False)
Running /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/demos/gb1_ge_evaluation.py...
Using mavenn at: /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn
Model loaded from these files:
/opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/models/gb1_ge_additive.pickle
/opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/models/gb1_ge_additive.weights.h5
Done!
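To make the terms used for this demo concrete, here is a toy sketch of an additive G-P map followed by a GE-style nonlinearity, written with NumPy only. It is not MAVE-NN's implementation and does not use the GB1 model: the 4-letter alphabet, the random parameters, the single-tanh nonlinearity, and all function names are assumptions chosen for illustration.

```python
import numpy as np

ALPHABET = "ACGT"  # toy alphabet (assumption); GB1 itself is a protein


def one_hot(seq, alphabet=ALPHABET):
    """Encode a sequence as an (L, C) matrix with a single 1 per position."""
    x = np.zeros((len(seq), len(alphabet)))
    for pos, char in enumerate(seq):
        x[pos, alphabet.index(char)] = 1.0
    return x


def additive_phi(seq, theta_0, theta_lc, alphabet=ALPHABET):
    """Additive G-P map: constant term plus one parameter per (position, character)."""
    return theta_0 + float(np.sum(theta_lc * one_hot(seq, alphabet)))


def ge_nonlinearity(phi, a=0.0, b=1.0, c=1.0, d=0.0):
    """Toy GE nonlinearity mapping the latent phenotype phi to a predicted measurement."""
    return a + b * np.tanh(c * phi + d)


rng = np.random.default_rng(0)
theta_lc = rng.normal(size=(3, len(ALPHABET)))  # random illustrative parameters
phi = additive_phi("ACG", theta_0=0.5, theta_lc=theta_lc)
yhat = ge_nonlinearity(phi)
print(f"phi = {phi:.3f}, yhat = {yhat:.3f}")
```

In this picture, the latent phenotype phi depends on the sequence only through a sum of per-position effects, and the measurement is a noisy readout of a nonlinear function of phi.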
Visualizing an MPA regression model
The 'sortseq_mpa_visualization' demo illustrates an additive G-P map, along with a measurement process agnostic (MPA) measurement process, fit to data from a sort-seq massively parallel reporter assay (MPRA) performed by Kinney et al., 2010.
[3]:
mavenn.run_demo('sortseq_mpa_visualization', print_code=False)
Running /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/demos/sortseq_mpa_visualization.py...
Using mavenn at: /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn
Model loaded from these files:
/opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/models/sortseq_mpa_additive.pickle
/opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/models/sortseq_mpa_additive.weights.h5
Done!
Training a GE regression model
The 'mpsa_ge_training' demo uses GE regression to train a pairwise G-P map on data from a massively parallel splicing assay (MPSA) reported by Wong et al., 2018. This training process usually takes under a minute on a standard laptop.
[4]:
mavenn.run_demo('mpsa_ge_training', print_code=False)
Running /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/demos/mpsa_ge_training.py...
/opt/miniconda3/envs/test_mavenn/lib/python3.12
Using mavenn at: /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn
N = 24,405 observations set as training data.
Using 20.0% for validation.
Data shuffled.
Time to set data: 0.204 sec.
LSMR Least-squares solution of Ax = b
The matrix A has 19520 rows and 36 columns
damp = 0.00000000000000e+00
atol = 1.00e-06 conlim = 1.00e+08
btol = 1.00e-06 maxiter = 36
itn x(1) norm r norm Ar compatible LS norm A cond A
0 0.00000e+00 1.393e+02 4.656e+03 1.0e+00 2.4e-01
1 2.57379e-03 1.269e+02 1.630e+03 9.1e-01 1.6e-01 8.0e+01 1.0e+00
2 4.93456e-03 1.263e+02 1.312e+03 9.1e-01 6.9e-02 1.5e+02 1.3e+00
3 4.87793e-03 1.253e+02 6.137e+01 9.0e-01 1.8e-03 2.8e+02 2.3e+00
4 4.05838e-03 1.253e+02 5.831e+00 9.0e-01 1.6e-04 2.9e+02 2.8e+00
5 4.01729e-03 1.253e+02 4.711e-01 9.0e-01 1.3e-05 3.0e+02 3.0e+00
6 4.01160e-03 1.253e+02 1.792e-02 9.0e-01 4.7e-07 3.0e+02 3.0e+00
LSMR finished
The least-squares solution is good enough, given atol
istop = 2 normr = 1.3e+02
normA = 3.0e+02 normAr = 1.8e-02
itn = 6 condA = 3.0e+00
normx = 8.3e-01
6 4.01160e-03 1.253e+02 1.792e-02
9.0e-01 4.7e-07 3.0e+02 3.0e+00
Linear regression time: 0.0046 sec
Epoch 1/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - loss: 56.2718 - val_loss: 40.3023 - I_var: 0.1957 - val_I_var: 0.1895
Epoch 2/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 41.2576 - val_loss: 39.5769 - I_var: 0.2210 - val_I_var: 0.2147
Epoch 3/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 39.7031 - val_loss: 39.7165 - I_var: 0.2049 - val_I_var: 0.2248
Epoch 4/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 40.0464 - val_loss: 39.2197 - I_var: 0.2242 - val_I_var: 0.2291
Epoch 5/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 39.3629 - val_loss: 38.9295 - I_var: 0.2471 - val_I_var: 0.2370
Epoch 6/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 39.4129 - val_loss: 38.6362 - I_var: 0.2572 - val_I_var: 0.2727
Epoch 7/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 38.9599 - val_loss: 38.5292 - I_var: 0.2509 - val_I_var: 0.2489
Epoch 8/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 39.5122 - val_loss: 38.2378 - I_var: 0.2642 - val_I_var: 0.2743
Epoch 9/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 39.1330 - val_loss: 38.1590 - I_var: 0.2497 - val_I_var: 0.2634
Epoch 10/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 38.3886 - val_loss: 38.2362 - I_var: 0.2535 - val_I_var: 0.2852
Epoch 11/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 38.4481 - val_loss: 38.4378 - I_var: 0.2499 - val_I_var: 0.2529
Epoch 12/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 38.7436 - val_loss: 37.7544 - I_var: 0.2837 - val_I_var: 0.2763
Epoch 13/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 38.6459 - val_loss: 37.5600 - I_var: 0.2939 - val_I_var: 0.2908
Epoch 14/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 37.5865 - val_loss: 36.4163 - I_var: 0.3371 - val_I_var: 0.3210
Epoch 15/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 37.3624 - val_loss: 37.8088 - I_var: 0.2952 - val_I_var: 0.2838
Epoch 16/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 38.1212 - val_loss: 35.6450 - I_var: 0.3327 - val_I_var: 0.3371
Epoch 17/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.5077 - val_loss: 35.6745 - I_var: 0.3541 - val_I_var: 0.3652
Epoch 18/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.7335 - val_loss: 36.4800 - I_var: 0.3292 - val_I_var: 0.3116
Epoch 19/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.7361 - val_loss: 35.8878 - I_var: 0.3432 - val_I_var: 0.3403
Epoch 20/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.3855 - val_loss: 36.5340 - I_var: 0.3123 - val_I_var: 0.2978
Epoch 21/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 35.5454 - val_loss: 35.5232 - I_var: 0.3526 - val_I_var: 0.3682
Epoch 22/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.0334 - val_loss: 36.3785 - I_var: 0.3480 - val_I_var: 0.3283
Epoch 23/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.4779 - val_loss: 35.6944 - I_var: 0.3606 - val_I_var: 0.3458
Epoch 24/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 35.6235 - val_loss: 35.5917 - I_var: 0.3597 - val_I_var: 0.3541
Epoch 25/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.3092 - val_loss: 35.5467 - I_var: 0.3431 - val_I_var: 0.3630
Epoch 26/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.4948 - val_loss: 35.6104 - I_var: 0.3516 - val_I_var: 0.3372
Epoch 27/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.7024 - val_loss: 35.5499 - I_var: 0.3532 - val_I_var: 0.3494
Epoch 28/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 35.9750 - val_loss: 35.6912 - I_var: 0.3507 - val_I_var: 0.3447
Epoch 29/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.1810 - val_loss: 35.2952 - I_var: 0.3642 - val_I_var: 0.3612
Epoch 30/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.1622 - val_loss: 35.3430 - I_var: 0.3614 - val_I_var: 0.3634
Training time: 22.7 seconds
I_var_test: 0.327 +- 0.027 bits
I_pred_test: 0.359 +- 0.014 bits
Done!
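Several numbers in the training output above can be checked with simple arithmetic. The sketch below rests on assumptions not stated in this tutorial: that the sequences are 9 characters long over a 4-letter alphabet, that the 19,520 rows of the LSMR matrix are the training examples (about 80% of the 24,405 observations), and that the batch size is 50. The function names are hypothetical.

```python
from math import ceil, comb


def n_additive_features(length, n_chars):
    """One feature per (position, character) pair."""
    return length * n_chars


def n_pairwise_features(length, n_chars):
    """One feature per pair of positions and per pair of characters at those positions."""
    return comb(length, 2) * n_chars * n_chars


def steps_per_epoch(n_train, batch_size):
    """Number of batches needed to pass through the training set once."""
    return ceil(n_train / batch_size)


LENGTH, N_CHARS = 9, 4  # assumption: 9-character sequences, 4-letter alphabet
N_TRAIN = 19520         # rows of the LSMR matrix in the output above
BATCH_SIZE = 50         # assumption: not shown in the output

print(n_additive_features(LENGTH, N_CHARS))   # 36, the LSMR column count
print(n_pairwise_features(LENGTH, N_CHARS))   # 576
print(steps_per_epoch(N_TRAIN, BATCH_SIZE))   # 391, the steps shown per epoch
```

Under these assumptions, the 36 columns of the LSMR matrix correspond to the additive features alone, while the pairwise G-P map adds a further 576 pairwise features.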
References
Kinney J, Murugan A, Callan C, Cox E. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc Natl Acad Sci USA 107:9158–9163 (2010).
Olson CA, Wu NC, Sun R. A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Curr Biol 24:2643–2651 (2014).
Wong M, Kinney J, Krainer A. Quantitative activity profile and context dependence of all human 5′ splice sites. Mol Cell 71:1012–1026.e3 (2018).