RNA Only Workflow
This is the standard SCRIPro workflow; it requires only an scRNA-seq count matrix as input. To demonstrate that SCRIPro can be applied to different tissue types and can infer the target genes of transcription regulators (TRs), we applied it to a 10x Genomics B-cell lymphoma dataset. The data are available at: https://www.10xgenomics.com/datasets/fresh-frozen-lymph-node-with-b-cell-lymphoma-14-k-sorted-nuclei-1-standard-2-0
Using Shell:
The tf_score matrix can be computed with the following shell command:
scripro enrich -i ./data/rna/rna.h5ad -n 50 -s hs -p rna_workflow -t 32
The gata3_score matrix (the target scores for GATA3) can then be computed with the following shell command, where rna_workflow.pkl is the output of the scripro enrich step:
scripro get_tf_target -i rna_workflow.pkl -t GATA3 -p GATA3_target
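When several transcription factors or datasets have to be processed, the two commands can be chained from Python. The following is a minimal sketch, not part of SCRIPro: the helper names (`enrich_cmd`, `target_cmd`, `run_pipeline`) are hypothetical, the keyword names simply mirror the CLI flags shown above, and it assumes that `scripro enrich` writes `<prefix>.pkl`, as the `rna_workflow.pkl` example suggests.

```python
import shutil
import subprocess
from pathlib import Path


def enrich_cmd(h5ad, prefix, n="50", s="hs", t="32"):
    """Build the `scripro enrich` call; keyword names mirror the flags above."""
    return ["scripro", "enrich", "-i", str(h5ad), "-n", str(n),
            "-s", str(s), "-p", str(prefix), "-t", str(t)]


def target_cmd(pkl, tf, prefix):
    """Build the `scripro get_tf_target` call for a single TF."""
    return ["scripro", "get_tf_target", "-i", str(pkl),
            "-t", str(tf), "-p", str(prefix)]


def run_pipeline(h5ad, prefix, tf):
    """Run enrich first, then get_tf_target on the resulting pickle."""
    if shutil.which("scripro") is None:
        raise RuntimeError("scripro was not found on PATH")
    subprocess.run(enrich_cmd(h5ad, prefix), check=True)
    pkl = Path(f"{prefix}.pkl")  # assumed output name, as in the example above
    if not pkl.exists():
        raise FileNotFoundError(pkl)
    subprocess.run(target_cmd(pkl, tf, f"{tf}_target"), check=True)
```

Calling `run_pipeline("./data/rna/rna.h5ad", "rna_workflow", "GATA3")` would issue the same two commands as above, stopping early if the first one fails.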
Using Python for custom analysis:
import warnings
import anndata
import h5py
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scanpy as sc
import scripro
import seaborn as sns
# Silence library warnings to keep the output readable
warnings.filterwarnings("ignore")
Load and preprocess data
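# Load the scRNA-seq matrix and make gene names unique
rna = sc.read_h5ad('./data/rna/rna.h5ad')
rna.var_names_make_unique()
# Keep the current (untransformed) matrix in .raw before normalizing
rna.raw = rna
# Normalize each cell to 10,000 counts, then log-transform
sc.pp.normalize_total(rna, target_sum=1e4)
sc.pp.log1p(rna)
# Select highly variable genes; copy so later steps do not operate on a view
sc.pp.highly_variable_genes(rna, min_mean=0.0125, max_mean=3, min_disp=0.5)
rna = rna[:, rna.var.highly_variable].copy()
# Scale, reduce dimensionality, build the neighbor graph, embed and cluster
sc.pp.scale(rna, max_value=10)
sc.tl.pca(rna, svd_solver='arpack')
sc.pp.neighbors(rna)
sc.tl.umap(rna)
sc.tl.leiden(rna)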
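To make the normalization step concrete, the sketch below reproduces on a toy matrix what `sc.pp.normalize_total(rna, target_sum=1e4)` followed by `sc.pp.log1p(rna)` does with default settings: each cell is rescaled so its counts sum to 10,000, then `log(1 + x)` is applied. The count values are made up for illustration.

```python
import numpy as np

# Toy count matrix: 3 cells (rows) x 4 genes (columns), made-up values.
counts = np.array([[10, 0, 30, 60],
                   [1, 1, 1, 1],
                   [0, 5, 0, 15]], dtype=float)

target_sum = 1e4
# Rescale every cell so that its counts sum to target_sum.
per_cell_total = counts.sum(axis=1, keepdims=True)
normalized = counts / per_cell_total * target_sum
# log1p computes log(1 + x), so zero counts stay exactly zero.
logged = np.log1p(normalized)

print(normalized.sum(axis=1))
print(logged.round(3))
```

After this step, cells with very different sequencing depths are on a comparable scale, which is what the highly-variable-gene selection and PCA that follow rely on.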
Compute the metacells and the marker genes of each metacell.
test_data = scripro.Ori_Data(rna,Cell_num=50)
ad_all is the integrated count matrix, with one row per metacell and one column per gene.
test_data.ad_all
| | MIR1302-2HG | FAM138A | OR4F5 | AL627309.1 | AL627309.3 | AL627309.2 | AL627309.5 | AL627309.4 | AP006222.2 | AL732372.1 | ... | AC133551.1 | AC136612.1 | AC136616.1 | AC136616.3 | AC136616.2 | AC141272.1 | AC023491.2 | AC007325.1 | AC007325.4 | AC007325.2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20_0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 15_0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 15_1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 15_2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 13_0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9_4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9_5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9_6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9_7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 21_0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
224 rows × 36621 columns
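The truncated preview displays only the first and last few rows and columns, which here happen to be zero, so it says little about the matrix as a whole. A quick summary is more informative. The sketch below assumes `ad_all` is a pandas DataFrame with metacells as rows and genes as columns, as the output above suggests; `summarize_counts` is a hypothetical helper and the toy values are made up.

```python
import pandas as pd


def summarize_counts(df):
    """Return basic sparsity statistics for a metacell x gene matrix."""
    nonzero = df.to_numpy() != 0
    return {
        "n_metacells": df.shape[0],
        "n_genes": df.shape[1],
        # Genes with a non-zero value in at least one metacell.
        "genes_detected": int(nonzero.any(axis=0).sum()),
        # Share of all entries that are non-zero.
        "fraction_nonzero": float(nonzero.mean()),
    }


# Toy stand-in for test_data.ad_all (made-up values).
toy = pd.DataFrame(
    [[0.0, 3.0, 0.0, 1.0],
     [0.0, 0.0, 0.0, 2.0],
     [0.0, 5.0, 0.0, 0.0]],
    index=["0_0", "0_1", "1_0"],
    columns=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
)
stats = summarize_counts(toy)
print(stats)
```

On the real data the same call would be `summarize_counts(test_data.ad_all)`.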
test_data.get_positive_marker_gene_parallel()
rna_seq_data = scripro.SCRIPro_RNA(5,'hg38',test_data,assays=['Direct','DNase','H3K27ac'])
In silico deletion (ISD) computation
rna_seq_data.cal_ISD_cistrome()
The LISA P-value matrix for each metacell is then derived from the ISD results.
Get TF activity Score
rna_seq_data.get_tf_score()
rna_seq_data.P_value_matrix
| row | ADNP | AFF1 | AFF4 | AGO1 | AHR | AIRE | ALX1 | ALX3 | ALX4 | ANHX | ... | ZSCAN22 | ZSCAN23 | ZSCAN29 | ZSCAN30 | ZSCAN31 | ZSCAN4 | ZSCAN5A | ZSCAN5C | ZXDB | ZXDC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0_0 | 1.982159e-05 | 0.114342 | 0.466165 | 3.044442e-03 | 0.065143 | 0.116164 | 0.261117 | 0.090598 | 0.043649 | 0.070920 | ... | 0.001946 | 1.034024e-03 | 0.000837 | 0.023628 | 0.187771 | 0.130556 | 0.000345 | 0.072917 | 9.929228e-07 | 1.078112e-06 |
| 0_1 | 1.078489e-03 | 0.045135 | 0.541748 | 4.741197e-02 | 0.172083 | 0.137448 | 0.120097 | 0.091863 | 0.078125 | 0.097334 | ... | 0.027452 | 6.524492e-02 | 0.119130 | 0.071906 | 0.200513 | 0.117636 | 0.007210 | 0.072906 | 1.114402e-05 | 3.193426e-03 |
| 0_10 | 1.945398e-04 | 0.150389 | 0.350183 | 7.688059e-02 | 0.089623 | 0.316572 | 0.277354 | 0.399970 | 0.437044 | 0.195209 | ... | 0.021498 | 1.736244e-03 | 0.091324 | 0.003618 | 0.320272 | 0.071882 | 0.000904 | 0.098806 | 2.213682e-06 | 1.677967e-02 |
| 0_11 | 9.016532e-02 | 0.124475 | 0.635978 | 2.211520e-02 | 0.178290 | 0.010232 | 0.077026 | 0.126848 | 0.065793 | 0.001066 | ... | 0.211864 | 4.717477e-02 | 0.126473 | 0.111667 | 0.130438 | 0.169036 | 0.055158 | 0.244485 | 4.748398e-04 | 1.358551e-02 |
| 0_12 | 1.508612e-01 | 0.220131 | 0.714978 | 1.149924e-01 | 0.166783 | 0.000201 | 0.019816 | 0.003010 | 0.003320 | 0.003520 | ... | 0.349635 | 1.420289e-01 | 0.171647 | 0.123673 | 0.080900 | 0.042576 | 0.047124 | 0.017884 | 1.611482e-01 | 2.017362e-01 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9_3 | 1.481955e-05 | 0.161472 | 0.000004 | 6.475927e-07 | 0.004738 | 0.093825 | 0.145126 | 0.158836 | 0.204868 | 0.006100 | ... | 0.000030 | 6.431066e-08 | 0.041991 | 0.001208 | 0.000560 | 0.012364 | 0.000022 | 0.036678 | 5.952748e-08 | 2.198499e-08 |
| 9_4 | 1.624109e-07 | 0.304159 | 0.185860 | 1.608332e-02 | 0.018612 | 0.205191 | 0.173053 | 0.138393 | 0.167866 | 0.051846 | ... | 0.006800 | 1.012524e-04 | 0.031388 | 0.001566 | 0.097648 | 0.044065 | 0.000073 | 0.019923 | 1.451613e-03 | 7.308369e-03 |
| 9_5 | 1.541161e-06 | 0.252129 | 0.000368 | 4.775720e-04 | 0.036822 | 0.136602 | 0.147106 | 0.204738 | 0.165820 | 0.031218 | ... | 0.015975 | 1.854799e-03 | 0.069004 | 0.008719 | 0.092146 | 0.088071 | 0.000901 | 0.005200 | 1.631952e-04 | 3.722424e-05 |
| 9_6 | 6.143819e-05 | 0.349253 | 0.150809 | 3.164199e-02 | 0.089277 | 0.122468 | 0.182552 | 0.158537 | 0.181882 | 0.090961 | ... | 0.012562 | 5.747627e-03 | 0.085607 | 0.011577 | 0.090943 | 0.081455 | 0.004634 | 0.016923 | 3.773492e-03 | 5.942802e-02 |
| 9_7 | 6.450485e-04 | 0.390047 | 0.199128 | 1.675784e-02 | 0.132506 | 0.096528 | 0.102888 | 0.107414 | 0.135996 | 0.100875 | ... | 0.016645 | 9.027264e-03 | 0.067132 | 0.021804 | 0.122074 | 0.053077 | 0.000223 | 0.008073 | 8.117502e-03 | 7.536773e-03 |
224 rows × 1226 columns
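A common next step with a P-value matrix of this shape is to list, for each metacell, the TFs with the smallest P-values. The sketch below assumes `P_value_matrix` is a pandas DataFrame (metacells as rows, TFs as columns), as displayed above; `top_tfs` is a hypothetical helper and the toy values are made up.

```python
import numpy as np
import pandas as pd


def top_tfs(p_values, k=2):
    """For every metacell (row), return the k TFs with the smallest P-values."""
    return {
        metacell: row.nsmallest(k).index.tolist()
        for metacell, row in p_values.iterrows()
    }


# Toy stand-in for rna_seq_data.P_value_matrix (made-up values).
toy_p = pd.DataFrame(
    [[1e-5, 0.11, 0.47, 3e-3],
     [0.15, 0.22, 2e-4, 0.11]],
    index=["0_0", "0_12"],
    columns=["TF_A", "TF_B", "TF_C", "TF_D"],
)
ranked = top_tfs(toy_p, k=2)
# -log10 turns small P-values into large values, which is easier to plot.
neg_log_p = -np.log10(toy_p)
print(ranked)
```

On the real data the call would be `top_tfs(rna_seq_data.P_value_matrix, k=10)`.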
The P-values are then weighted by the corresponding regulatory potential (RP) score and expression value to give the final TF activity score.
rna_seq_data.tf_score
| row | ADNP | AFF1 | AFF4 | AGO1 | AHR | AIRE | ALX1 | ALX3 | ALX4 | ANHX | ... | ZSCAN22 | ZSCAN23 | ZSCAN29 | ZSCAN30 | ZSCAN31 | ZSCAN4 | ZSCAN5A | ZSCAN5C | ZXDB | ZXDC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0_0 | 1.181346e-05 | 0.060435 | 0.307493 | 1.462677e-04 | 0.026594 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000038 | 0.001076 | 0.0 | 0.0 | 8.489554e-06 | 0.0 | 4.385504e-08 | 6.365249e-07 |
| 0_1 | 8.153228e-04 | 0.028895 | 0.455855 | 2.507957e-03 | 0.008484 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.006283 | 0.005021 | 0.0 | 0.0 | 3.823604e-03 | 0.0 | 5.890852e-07 | 1.917075e-03 |
| 0_10 | 1.138860e-04 | 0.095834 | 0.293383 | 3.976904e-02 | 0.037968 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.001849 | 0.000498 | 0.0 | 0.0 | 6.564053e-05 | 0.0 | 1.429377e-07 | 9.996831e-03 |
| 0_11 | 6.903511e-02 | 0.076661 | 0.422427 | 1.190686e-03 | 0.011600 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.006117 | 0.008547 | 0.0 | 0.0 | 4.512259e-03 | 0.0 | 3.173963e-05 | 8.298006e-03 |
| 0_12 | 8.898146e-02 | 0.136908 | 0.467959 | 5.825133e-02 | 0.009677 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.008715 | 0.010824 | 0.0 | 0.0 | 1.991192e-03 | 0.0 | 9.687363e-03 | 1.161664e-01 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9_3 | 8.815053e-06 | 0.060870 | 0.000002 | 4.206756e-08 | 0.001850 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.000759 | 0.000037 | 0.0 | 0.0 | 6.775830e-07 | 0.0 | 2.239746e-09 | 1.277747e-08 |
| 9_4 | 1.216592e-07 | 0.160054 | 0.121961 | 9.350271e-04 | 0.007390 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.030882 | 0.000199 | 0.0 | 0.0 | 2.069416e-06 | 0.0 | 6.039253e-05 | 9.517900e-05 |
| 9_5 | 1.182939e-06 | 0.095240 | 0.000301 | 2.557181e-05 | 0.002016 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.002766 | 0.000738 | 0.0 | 0.0 | 4.742670e-04 | 0.0 | 6.393928e-06 | 2.261003e-05 |
| 9_6 | 3.662379e-05 | 0.186604 | 0.140389 | 2.558394e-03 | 0.038996 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.083027 | 0.000672 | 0.0 | 0.0 | 3.029751e-04 | 0.0 | 1.696993e-04 | 3.468848e-02 |
| 9_7 | 4.759191e-04 | 0.254694 | 0.129037 | 9.390380e-04 | 0.009117 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.063992 | 0.002329 | 0.0 | 0.0 | 1.695396e-05 | 0.0 | 8.699077e-04 | 4.172868e-03 |
224 rows × 1226 columns
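Several TF columns in the preview above are zero in every displayed metacell. Before downstream analysis it can help to drop TFs that are zero everywhere and to see which of the remaining TFs vary most across metacells. The sketch below makes no assumption about whether high or low scores indicate activity; it assumes `tf_score` is a pandas DataFrame, `variable_tfs` is a hypothetical helper, and the toy values are made up.

```python
import pandas as pd


def variable_tfs(tf_score, k=2):
    """Drop TFs that are zero in every metacell, then rank by variance."""
    keep = (tf_score != 0).any(axis=0)
    nonzero = tf_score.loc[:, keep]
    # Sample variance of each TF across metacells, largest first.
    return nonzero.var(axis=0).sort_values(ascending=False).head(k)


# Toy stand-in for rna_seq_data.tf_score (made-up values).
toy_score = pd.DataFrame(
    [[0.0, 0.30, 0.01],
     [0.0, 0.10, 0.02],
     [0.0, 0.50, 0.03]],
    index=["0_0", "0_1", "9_3"],
    columns=["TF_A", "TF_B", "TF_C"],
)
result = variable_tfs(toy_score)
print(result)
```

TFs that rank highly here are candidates for plotting across metacells, for example as a heatmap with seaborn.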
Calculate the downstream target genes of each TF in each metacell
gata3_score = rna_seq_data.get_tf_target('GATA3')
gata3_score
| | SOS1 | ZNF487 | PPP1CA | CFLAR | WDR37 | CTLA4 | STK10 | NFKBIL1 | INO80B | PPP2R5C | ... | BCL2 | RPL18 | PRSS55 | UBL4B | FAM13A | WDR20 | SYTL3 | ASH1L | APOC3 | CPNE8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3_10 | 0.012644 | 0.000000 | 0.000000 | 0.000000 | 0.096325 | 0.000000 | 0.026573 | 0.000000 | 0.067059 | 0.021823 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4_1 | 0.239298 | 0.025236 | 0.000000 | 0.111141 | 0.000000 | 0.133851 | 0.000000 | 0.000000 | 0.000000 | 0.077034 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1_0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 12_4 | 0.000000 | 0.000000 | 0.120566 | 0.209552 | 0.093906 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10_4 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 22_0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1_17 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0_24 | 0.095861 | 0.000000 | 0.000000 | 0.051342 | 0.000000 | 0.000000 | 0.012070 | 0.000000 | 0.000000 | 0.000000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0_3 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.027181 | 0.220751 | 0.000000 | 0.030307 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3_2 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.009417 | 0.045787 | 0.115108 | 0.000000 | 0.000000 | 0.000000 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
224 rows × 3084 columns
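The target matrix is sparse, so it is usually read one metacell at a time or summarized per gene. The sketch below shows both, assuming `gata3_score` is a pandas DataFrame (metacells as rows, genes as columns) and that a larger non-zero score means a stronger predicted target, which is an assumption rather than something stated above. The helper names are hypothetical and the toy values are made up.

```python
import pandas as pd


def targets_in_metacell(target_score, metacell):
    """Genes with a non-zero score in one metacell, largest score first."""
    row = target_score.loc[metacell]
    return row[row > 0].sort_values(ascending=False)


def target_frequency(target_score):
    """Number of metacells in which each gene has a non-zero score."""
    return (target_score != 0).sum(axis=0).sort_values(ascending=False)


# Toy stand-in for gata3_score (made-up values).
toy_targets = pd.DataFrame(
    [[0.24, 0.03, 0.00, 0.11],
     [0.00, 0.00, 0.00, 0.00],
     [0.10, 0.00, 0.00, 0.05]],
    index=["4_1", "1_0", "0_24"],
    columns=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
)
hits = targets_in_metacell(toy_targets, "4_1")
freq = target_frequency(toy_targets)
print(hits)
print(freq)
```

On the real data, `targets_in_metacell(gata3_score, '4_1')` would list the predicted GATA3 targets in metacell 4_1, and `target_frequency(gata3_score)` would show which genes recur across metacells.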