Tutorial 1: Built-in demonstration scripts

MAVE-NN provides built-in demonstration scripts, or “demos”, to help users quickly get started training and visualizing models. Demos are self-contained Python scripts that are run by calling mavenn.run_demo(). Calling this function without any arguments prints usage instructions and returns a list of demo names:

[1]:
# Import MAVE-NN
import mavenn

# Get list of demos
mavenn.run_demo()
To run a demo, execute

        >>> mavenn.run_demo(name)

where 'name' is one of the following strings:

        1. "gb1_ge_evaluation"
        2. "mpsa_ge_training"
        3. "sortseq_mpa_visualization"

Python code for each demo is located in

        /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/demos/

[1]:
['gb1_ge_evaluation', 'mpsa_ge_training', 'sortseq_mpa_visualization']

To see the Python code for any one of these demos, pass the keyword argument print_code=True to mavenn.run_demo(). Alternatively, navigate to the folder that is printed when executing mavenn.run_demo() on your machine and open the corresponding *.py file.
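As a minimal sketch of the second option, the helper below lists the demo scripts found under a MAVE-NN installation folder. The function name `list_demo_scripts` is hypothetical and not part of MAVE-NN, and the `examples/demos` subfolder is assumed from the path printed above, so it may differ between versions.

```python
import importlib.util
from pathlib import Path


def list_demo_scripts(package_dir):
    """Return the names of demo scripts under package_dir/examples/demos.

    Hypothetical helper, not part of MAVE-NN. The folder layout is an
    assumption taken from the path that mavenn.run_demo() prints.
    """
    demo_dir = Path(package_dir) / "examples" / "demos"
    return sorted(
        path.stem
        for path in demo_dir.glob("*.py")
        if not path.name.startswith("_")  # skip files such as __init__.py
    )


# Locate the installed package without importing it; spec is None when
# MAVE-NN is not installed in the current environment.
spec = importlib.util.find_spec("mavenn")
if spec is not None and spec.submodule_search_locations:
    package_dir = list(spec.submodule_search_locations)[0]
    print(list_demo_scripts(package_dir))
else:
    print("mavenn is not installed in this environment")
```

Each name returned this way should correspond to a *.py file that can be opened in any text editor.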

Evaluating a GE regression model

The 'gb1_ge_evaluation' demo illustrates an additive genotype-phenotype (G-P) map and a global epistasis (GE) measurement process fit to data from a deep mutational scanning (DMS) experiment performed on protein GB1 by Olson et al., 2014.
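To make this model structure concrete, here is a toy sketch of an additive G-P map followed by a GE nonlinearity. It is not MAVE-NN's implementation: the function names, the nucleotide alphabet, the parameter values, and the choice of a single tanh sigmoid for the nonlinearity are all assumptions made for illustration.

```python
import math


def additive_phi(seq, theta_0, theta):
    """Latent phenotype: a constant plus one contribution per position.

    theta[l][c] is the (made-up) effect of character c at position l.
    """
    return theta_0 + sum(theta[l][c] for l, c in enumerate(seq))


def ge_nonlinearity(phi, a=0.0, b=1.0, c=1.0, d=0.0):
    """Map the latent phenotype to an expected measurement.

    A single tanh sigmoid is an assumption chosen for simplicity.
    """
    return a + b * math.tanh(c * phi + d)


# Made-up parameters for a length-3 sequence over a toy alphabet.
theta = [
    {"A": 0.0, "C": 0.5, "G": -0.5, "T": 1.0},
    {"A": 0.2, "C": 0.0, "G": 0.3, "T": -1.0},
    {"A": -0.4, "C": 0.1, "G": 0.0, "T": 0.6},
]

phi = additive_phi("TGC", theta_0=-0.4, theta=theta)
y_hat = ge_nonlinearity(phi)
print(f"phi = {phi:.3f}, predicted measurement = {y_hat:.3f}")
```

The point of the two-stage design is that each position contributes independently to the latent phenotype, while the nonlinearity can still make the measured effect of a mutation depend on the background sequence.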

[2]:
mavenn.run_demo('gb1_ge_evaluation', print_code=False)
Running /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/demos/gb1_ge_evaluation.py...
Using mavenn at: /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn
Model loaded from these files:
        /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/models/gb1_ge_additive.pickle
        /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/models/gb1_ge_additive.weights.h5
[Figure: output of the 'gb1_ge_evaluation' demo]
Done!

Visualizing an MPA regression model

The 'sortseq_mpa_visualization' demo illustrates an additive G-P map, along with an MPA measurement process, fit to data from a sort-seq MPRA performed by Kinney et al., 2010.
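In sort-seq data each measurement is a sorting bin, so a measurement process of this kind assigns a probability to every bin given the latent phenotype. The sketch below illustrates the idea with a softmax over linear scores. The function name, the linear form of the scores, and all parameter values are assumptions for this example and do not reproduce MAVE-NN's actual parameterization.

```python
import math


def bin_probabilities(phi, intercepts, slopes):
    """Toy measurement process: probability of each bin given phi.

    A softmax over linear scores is an assumption for illustration only.
    """
    scores = [a + b * phi for a, b in zip(intercepts, slopes)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up parameters for three sorting bins.
intercepts = [0.0, 0.5, 0.0]
slopes = [-2.0, 0.0, 2.0]

for phi in (-2.0, 0.0, 2.0):
    probs = bin_probabilities(phi, intercepts, slopes)
    print(phi, [round(p, 3) for p in probs])
```

With these made-up values, low latent phenotypes favor the first bin and high latent phenotypes favor the last, which is the qualitative behavior one would expect from a sorting experiment.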

[3]:
mavenn.run_demo('sortseq_mpa_visualization', print_code=False)
Running /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/demos/sortseq_mpa_visualization.py...
Using mavenn at: /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn
Model loaded from these files:
        /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/models/sortseq_mpa_additive.pickle
        /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/models/sortseq_mpa_additive.weights.h5
[Figure: output of the 'sortseq_mpa_visualization' demo]
Done!

Training a GE regression model

The 'mpsa_ge_training' demo uses GE regression to train a pairwise G-P map on data from a massively parallel splicing assay (MPSA) reported by Wong et al., 2018. This training process usually takes under a minute on a standard laptop.
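A pairwise G-P map extends an additive one with a term for every pair of positions. The sketch below is a toy illustration rather than MAVE-NN code: the function names, the sparse dictionary layout, and the parameter values are hypothetical, and the parameter count ignores any gauge constraints a real implementation might impose.

```python
from itertools import combinations


def pairwise_phi(seq, theta_0, theta_add, theta_pair):
    """Latent phenotype with additive and pairwise contributions.

    theta_add maps (position, character) to an effect; theta_pair maps
    (pos1, char1, pos2, char2) with pos1 < pos2 to an interaction effect.
    Missing keys are treated as zero.
    """
    phi = theta_0
    for l, c in enumerate(seq):
        phi += theta_add.get((l, c), 0.0)
    for (l1, c1), (l2, c2) in combinations(enumerate(seq), 2):
        phi += theta_pair.get((l1, c1, l2, c2), 0.0)
    return phi


def count_parameters(length, alphabet_size):
    """Raw parameter count: constant + additive + pairwise terms."""
    n_add = length * alphabet_size
    n_pair = (length * (length - 1) // 2) * alphabet_size ** 2
    return 1 + n_add + n_pair


# Made-up parameters for a length-2 sequence.
theta_add = {(0, "G"): 0.5, (1, "U"): 0.25}
theta_pair = {(0, "G", 1, "U"): -0.125}
print(pairwise_phi("GU", 1.0, theta_add, theta_pair))

# Assuming sequences of length 9 over a 4-character alphabet.
print(count_parameters(9, 4))
```

The number of pairwise terms grows quadratically with sequence length, which is why pairwise models need considerably more data than additive ones.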

[4]:
mavenn.run_demo('mpsa_ge_training', print_code=False)
Running /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn/examples/demos/mpsa_ge_training.py...
/opt/miniconda3/envs/test_mavenn/lib/python3.12
Using mavenn at: /opt/miniconda3/envs/test_mavenn/lib/python3.12/site-packages/mavenn
N = 24,405 observations set as training data.
Using 20.0% for validation.
Data shuffled.
Time to set data: 0.204 sec.

LSMR            Least-squares solution of  Ax = b

The matrix A has 19520 rows and 36 columns
damp = 0.00000000000000e+00

atol = 1.00e-06                 conlim = 1.00e+08

btol = 1.00e-06             maxiter =       36


   itn      x(1)       norm r    norm Ar  compatible   LS      norm A   cond A
     0  0.00000e+00  1.393e+02  4.656e+03   1.0e+00  2.4e-01
     1  2.57379e-03  1.269e+02  1.630e+03   9.1e-01  1.6e-01  8.0e+01  1.0e+00
     2  4.93456e-03  1.263e+02  1.312e+03   9.1e-01  6.9e-02  1.5e+02  1.3e+00
     3  4.87793e-03  1.253e+02  6.137e+01   9.0e-01  1.8e-03  2.8e+02  2.3e+00
     4  4.05838e-03  1.253e+02  5.831e+00   9.0e-01  1.6e-04  2.9e+02  2.8e+00
     5  4.01729e-03  1.253e+02  4.711e-01   9.0e-01  1.3e-05  3.0e+02  3.0e+00
     6  4.01160e-03  1.253e+02  1.792e-02   9.0e-01  4.7e-07  3.0e+02  3.0e+00

LSMR finished
The least-squares solution is good enough, given atol
istop =       2    normr = 1.3e+02
    normA = 3.0e+02    normAr = 1.8e-02
itn   =       6    condA = 3.0e+00
    normx = 8.3e-01
     6  4.01160e-03   1.253e+02  1.792e-02
   9.0e-01  4.7e-07   3.0e+02  3.0e+00
Linear regression time: 0.0046 sec
Epoch 1/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - loss: 56.2718 - val_loss: 40.3023 - I_var: 0.1957 - val_I_var: 0.1895
Epoch 2/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 41.2576 - val_loss: 39.5769 - I_var: 0.2210 - val_I_var: 0.2147
Epoch 3/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 39.7031 - val_loss: 39.7165 - I_var: 0.2049 - val_I_var: 0.2248
Epoch 4/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 40.0464 - val_loss: 39.2197 - I_var: 0.2242 - val_I_var: 0.2291
Epoch 5/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 39.3629 - val_loss: 38.9295 - I_var: 0.2471 - val_I_var: 0.2370
Epoch 6/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 39.4129 - val_loss: 38.6362 - I_var: 0.2572 - val_I_var: 0.2727
Epoch 7/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 38.9599 - val_loss: 38.5292 - I_var: 0.2509 - val_I_var: 0.2489
Epoch 8/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 39.5122 - val_loss: 38.2378 - I_var: 0.2642 - val_I_var: 0.2743
Epoch 9/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 39.1330 - val_loss: 38.1590 - I_var: 0.2497 - val_I_var: 0.2634
Epoch 10/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 38.3886 - val_loss: 38.2362 - I_var: 0.2535 - val_I_var: 0.2852
Epoch 11/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 38.4481 - val_loss: 38.4378 - I_var: 0.2499 - val_I_var: 0.2529
Epoch 12/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 38.7436 - val_loss: 37.7544 - I_var: 0.2837 - val_I_var: 0.2763
Epoch 13/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 38.6459 - val_loss: 37.5600 - I_var: 0.2939 - val_I_var: 0.2908
Epoch 14/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 37.5865 - val_loss: 36.4163 - I_var: 0.3371 - val_I_var: 0.3210
Epoch 15/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 37.3624 - val_loss: 37.8088 - I_var: 0.2952 - val_I_var: 0.2838
Epoch 16/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 38.1212 - val_loss: 35.6450 - I_var: 0.3327 - val_I_var: 0.3371
Epoch 17/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.5077 - val_loss: 35.6745 - I_var: 0.3541 - val_I_var: 0.3652
Epoch 18/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.7335 - val_loss: 36.4800 - I_var: 0.3292 - val_I_var: 0.3116
Epoch 19/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.7361 - val_loss: 35.8878 - I_var: 0.3432 - val_I_var: 0.3403
Epoch 20/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.3855 - val_loss: 36.5340 - I_var: 0.3123 - val_I_var: 0.2978
Epoch 21/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 35.5454 - val_loss: 35.5232 - I_var: 0.3526 - val_I_var: 0.3682
Epoch 22/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.0334 - val_loss: 36.3785 - I_var: 0.3480 - val_I_var: 0.3283
Epoch 23/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.4779 - val_loss: 35.6944 - I_var: 0.3606 - val_I_var: 0.3458
Epoch 24/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 35.6235 - val_loss: 35.5917 - I_var: 0.3597 - val_I_var: 0.3541
Epoch 25/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.3092 - val_loss: 35.5467 - I_var: 0.3431 - val_I_var: 0.3630
Epoch 26/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.4948 - val_loss: 35.6104 - I_var: 0.3516 - val_I_var: 0.3372
Epoch 27/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.7024 - val_loss: 35.5499 - I_var: 0.3532 - val_I_var: 0.3494
Epoch 28/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 35.9750 - val_loss: 35.6912 - I_var: 0.3507 - val_I_var: 0.3447
Epoch 29/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.1810 - val_loss: 35.2952 - I_var: 0.3642 - val_I_var: 0.3612
Epoch 30/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.1622 - val_loss: 35.3430 - I_var: 0.3614 - val_I_var: 0.3634
Training time: 22.7 seconds
I_var_test: 0.327 +- 0.027 bits
I_pred_test: 0.359 +- 0.014 bits
[Figure: output of the 'mpsa_ge_training' demo]
Done!
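If the training log above is captured as text, per-epoch metrics can be extracted for later inspection. The following is a minimal sketch under that assumption: the helper `parse_training_log` is hypothetical, it relies only on the line format shown in the output above, and the excerpt is copied from the last three epochs of that output.

```python
import re

LOG_EXCERPT = """\
Epoch 28/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 35.9750 - val_loss: 35.6912 - I_var: 0.3507 - val_I_var: 0.3447
Epoch 29/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.1810 - val_loss: 35.2952 - I_var: 0.3642 - val_I_var: 0.3612
Epoch 30/30
391/391 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 36.1622 - val_loss: 35.3430 - I_var: 0.3614 - val_I_var: 0.3634
"""

EPOCH_RE = re.compile(r"^Epoch (\d+)/\d+$")
METRIC_RE = re.compile(r"(\w+): ([-+]?\d+(?:\.\d+)?)")


def parse_training_log(text):
    """Return one dict per epoch with the metrics printed on its line."""
    records = []
    epoch = None
    for line in text.splitlines():
        line = line.strip()
        match = EPOCH_RE.match(line)
        if match:
            epoch = int(match.group(1))
            continue
        metrics = {name: float(value) for name, value in METRIC_RE.findall(line)}
        if epoch is not None and metrics:
            records.append({"epoch": epoch, **metrics})
            epoch = None  # wait for the next "Epoch n/N" header
    return records


records = parse_training_log(LOG_EXCERPT)
best_loss = min(records, key=lambda r: r["val_loss"])
best_info = max(records, key=lambda r: r["val_I_var"])
print("Lowest val_loss at epoch", best_loss["epoch"])
print("Highest val_I_var at epoch", best_info["epoch"])
```

In this excerpt the lowest validation loss and the highest validation I_var fall on different epochs, a reminder that the two metrics fluctuate from epoch to epoch and need not agree on a single best epoch.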

References

  1. Kinney J, Murugan A, Callan C, Cox E. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc Natl Acad Sci USA 107:9158–9163 (2010).

  2. Olson CA, Wu NC, Sun R. A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Curr Biol 24:2643–2651 (2014).

  3. Wong M, Kinney J, Krainer A. Quantitative activity profile and context dependence of all human 5′ splice sites. Mol Cell 71:1012–1026.e3 (2018).
