RNA-Only Workflow

This is the standard SCRIPro workflow, and it requires only a scRNA-seq count matrix as input. To demonstrate SCRIPro's ability to handle different tissue types and to infer the target genes of transcriptional regulators (TRs), we applied it to 10x Genomics lymphoma sequencing data. The data are available at https://www.10xgenomics.com/datasets/fresh-frozen-lymph-node-with-b-cell-lymphoma-14-k-sorted-nuclei-1-standard-2-0

Using the shell:

The tf_score matrix can be obtained with the following shell command:

scripro enrich -i ./data/rna/rna.h5ad -n 50 -s hs -p rna_workflow -t 32

The gata3_score matrix can be obtained with the following shell command, where rna_workflow.pkl is the output of scripro enrich:

scripro get_tf_target -i rna_workflow.pkl -t GATA3 -p GATA3_target

Using Python for custom analysis:

import warnings
import anndata
import h5py
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scanpy as sc
import scripro
import seaborn as sns
warnings.filterwarnings("ignore")

Load and preprocess data

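# Read the scRNA-seq AnnData object and make gene names unique
rna = sc.read_h5ad('./data/rna/rna.h5ad')
rna.var_names_make_unique()
# Keep the unprocessed data in .raw before normalization
rna.raw = rna
# Library-size normalization to 10,000 counts per cell, then log(1 + x)
sc.pp.normalize_total(rna, target_sum=1e4)
sc.pp.log1p(rna)
# Keep highly variable genes, scale them, and run PCA
sc.pp.highly_variable_genes(rna, min_mean=0.0125, max_mean=3, min_disp=0.5)
rna = rna[:, rna.var.highly_variable]
sc.pp.scale(rna, max_value=10)
sc.tl.pca(rna, svd_solver='arpack')
# Build the neighbor graph, embed with UMAP, and cluster with Leiden
sc.pp.neighbors(rna)
sc.tl.umap(rna)
sc.tl.leiden(rna)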
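For reference, the two normalization calls above rescale each cell so that its counts sum to 10,000 (target_sum=1e4) and then apply log(1 + x) element-wise. A minimal NumPy sketch of those two steps on a made-up toy matrix:

```python
import numpy as np

# Toy count matrix: 3 cells (rows) x 4 genes (columns); values are illustrative only.
counts = np.array([
    [10.0, 0.0, 5.0, 5.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 20.0, 0.0, 0.0],
])

# Equivalent of normalize_total(target_sum=1e4): scale each cell to sum to 1e4.
target_sum = 1e4
per_cell_total = counts.sum(axis=1, keepdims=True)
normalized = counts / per_cell_total * target_sum

# Equivalent of log1p: natural log of (1 + x), applied element-wise.
log_norm = np.log1p(normalized)

print(normalized.sum(axis=1))  # every cell now sums to 10000
print(log_norm.round(3))
```

Note that a cell with zero total counts would cause a division by zero in this sketch; the toy matrix has none.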

Calculate the metacells and the marker genes of each metacell.

test_data = scripro.Ori_Data(rna,Cell_num=50)

test_data.ad_all is the integrated, metacell-level count matrix.

test_data.ad_all
MIR1302-2HG FAM138A OR4F5 AL627309.1 AL627309.3 AL627309.2 AL627309.5 AL627309.4 AP006222.2 AL732372.1 ... AC133551.1 AC136612.1 AC136616.1 AC136616.3 AC136616.2 AC141272.1 AC023491.2 AC007325.1 AC007325.4 AC007325.2
20_0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
15_0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
15_1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
15_2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
13_0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9_4 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
9_5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
9_6 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
9_7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
21_0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

224 rows × 36621 columns
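Conceptually, a metacell pools the counts of a group of similar cells, which is why ad_all has only 224 rows. The sketch below illustrates that pooling idea with pandas only: it splits each cluster into consecutive chunks of cells_per_metacell cells and sums their counts. The chunking rule, the `<cluster>_<index>` row names (mimicking the names printed above) and the helper pool_into_metacells are hypothetical; this is not how scripro.Ori_Data actually selects cells for a metacell.

```python
import numpy as np
import pandas as pd


def pool_into_metacells(counts: pd.DataFrame, clusters: pd.Series,
                        cells_per_metacell: int = 50) -> pd.DataFrame:
    """Hypothetical illustration: sum counts over fixed-size chunks of cells
    within each cluster. Not SCRIPro's metacell algorithm."""
    # Position of each cell within its cluster, divided into chunks.
    chunk = clusters.groupby(clusters).cumcount() // cells_per_metacell
    # Metacell label such as "0_0", "0_1", "1_0", ...
    labels = clusters.astype(str) + "_" + chunk.astype(str)
    return counts.groupby(labels).sum()


# Usage on random toy data: 7 cells, 3 genes, 2 clusters.
rng = np.random.default_rng(0)
cells = [f"cell{i}" for i in range(7)]
counts = pd.DataFrame(rng.poisson(2, size=(7, 3)), index=cells,
                      columns=["GATA3", "CD3E", "MS4A1"])
clusters = pd.Series(["0", "0", "0", "1", "1", "1", "1"], index=cells)
meta = pool_into_metacells(counts, clusters, cells_per_metacell=2)
print(meta)
```

Summing (rather than averaging) keeps total counts unchanged, so the pooled matrix is still a count matrix.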

test_data.get_positive_marker_gene_parallel()
rna_seq_data = scripro.SCRIPro_RNA(5,'hg38',test_data,assays=['Direct','DNase','H3K27ac'])

Run the in silico deletion (ISD) computation:

rna_seq_data.cal_ISD_cistrome()
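The idea behind in silico deletion, as introduced by LISA, is to compute a gene's regulatory potential (RP) from nearby chromatin signal, delete the regions bound by a TF, and measure how much the RP drops. The toy sketch below shows only that idea. The exponential decay with a 10 kb half-decay distance and the helper names regulatory_potential and isd_delta are assumptions for illustration, not SCRIPro's implementation of cal_ISD_cistrome.

```python
import numpy as np


def regulatory_potential(peak_pos, peak_signal, tss, half_decay=10_000):
    """Hypothetical RP: chromatin signal summed over peaks, each weighted by a
    decay that halves every `half_decay` bp away from the TSS (assumed form)."""
    distance = np.abs(np.asarray(peak_pos, dtype=float) - tss)
    weights = 0.5 ** (distance / half_decay)
    return float(np.sum(np.asarray(peak_signal, dtype=float) * weights))


def isd_delta(peak_pos, peak_signal, tss, tf_bound, half_decay=10_000):
    """Drop in RP when the peaks bound by the TF (boolean mask) are deleted."""
    peak_pos = np.asarray(peak_pos, dtype=float)
    peak_signal = np.asarray(peak_signal, dtype=float)
    keep = ~np.asarray(tf_bound, dtype=bool)
    full = regulatory_potential(peak_pos, peak_signal, tss, half_decay)
    deleted = regulatory_potential(peak_pos[keep], peak_signal[keep], tss, half_decay)
    return full - deleted


# Toy gene: TSS at 0 with three peaks; the TF binds only the middle peak.
peaks = [0, 5_000, 20_000]
signal = [1.0, 1.0, 1.0]
bound = [False, True, False]
delta = isd_delta(peaks, signal, tss=0, tf_bound=bound)
print(round(delta, 4))  # → 0.7071
```

A large drop means the TF's binding sites account for much of the gene's regulatory potential.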

From these results, a LISA P-value is obtained for every TF in each metacell, giving the P-value matrix.

Get TF activity score

rna_seq_data.get_tf_score()
rna_seq_data.P_value_matrix
ADNP AFF1 AFF4 AGO1 AHR AIRE ALX1 ALX3 ALX4 ANHX ... ZSCAN22 ZSCAN23 ZSCAN29 ZSCAN30 ZSCAN31 ZSCAN4 ZSCAN5A ZSCAN5C ZXDB ZXDC
row
0_0 1.982159e-05 0.114342 0.466165 3.044442e-03 0.065143 0.116164 0.261117 0.090598 0.043649 0.070920 ... 0.001946 1.034024e-03 0.000837 0.023628 0.187771 0.130556 0.000345 0.072917 9.929228e-07 1.078112e-06
0_1 1.078489e-03 0.045135 0.541748 4.741197e-02 0.172083 0.137448 0.120097 0.091863 0.078125 0.097334 ... 0.027452 6.524492e-02 0.119130 0.071906 0.200513 0.117636 0.007210 0.072906 1.114402e-05 3.193426e-03
0_10 1.945398e-04 0.150389 0.350183 7.688059e-02 0.089623 0.316572 0.277354 0.399970 0.437044 0.195209 ... 0.021498 1.736244e-03 0.091324 0.003618 0.320272 0.071882 0.000904 0.098806 2.213682e-06 1.677967e-02
0_11 9.016532e-02 0.124475 0.635978 2.211520e-02 0.178290 0.010232 0.077026 0.126848 0.065793 0.001066 ... 0.211864 4.717477e-02 0.126473 0.111667 0.130438 0.169036 0.055158 0.244485 4.748398e-04 1.358551e-02
0_12 1.508612e-01 0.220131 0.714978 1.149924e-01 0.166783 0.000201 0.019816 0.003010 0.003320 0.003520 ... 0.349635 1.420289e-01 0.171647 0.123673 0.080900 0.042576 0.047124 0.017884 1.611482e-01 2.017362e-01
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9_3 1.481955e-05 0.161472 0.000004 6.475927e-07 0.004738 0.093825 0.145126 0.158836 0.204868 0.006100 ... 0.000030 6.431066e-08 0.041991 0.001208 0.000560 0.012364 0.000022 0.036678 5.952748e-08 2.198499e-08
9_4 1.624109e-07 0.304159 0.185860 1.608332e-02 0.018612 0.205191 0.173053 0.138393 0.167866 0.051846 ... 0.006800 1.012524e-04 0.031388 0.001566 0.097648 0.044065 0.000073 0.019923 1.451613e-03 7.308369e-03
9_5 1.541161e-06 0.252129 0.000368 4.775720e-04 0.036822 0.136602 0.147106 0.204738 0.165820 0.031218 ... 0.015975 1.854799e-03 0.069004 0.008719 0.092146 0.088071 0.000901 0.005200 1.631952e-04 3.722424e-05
9_6 6.143819e-05 0.349253 0.150809 3.164199e-02 0.089277 0.122468 0.182552 0.158537 0.181882 0.090961 ... 0.012562 5.747627e-03 0.085607 0.011577 0.090943 0.081455 0.004634 0.016923 3.773492e-03 5.942802e-02
9_7 6.450485e-04 0.390047 0.199128 1.675784e-02 0.132506 0.096528 0.102888 0.107414 0.135996 0.100875 ... 0.016645 9.027264e-03 0.067132 0.021804 0.122074 0.053077 0.000223 0.008073 8.117502e-03 7.536773e-03

224 rows × 1226 columns
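One convenient way to inspect the P-value matrix is to convert it to -log10(P), so that more significant TFs get larger values, and then look up the most significant TF in each metacell. The sketch below uses a three-row excerpt of the values printed above; on the real object you would pass rna_seq_data.P_value_matrix instead, assuming it is a pandas DataFrame as its printed form suggests.

```python
import numpy as np
import pandas as pd

# Three-row excerpt of the P-value matrix shown above (metacells x TFs).
p_values = pd.DataFrame(
    [[1.982159e-05, 0.114342, 0.466165],
     [1.078489e-03, 0.045135, 0.541748],
     [1.481955e-05, 0.161472, 0.000004]],
    index=["0_0", "0_1", "9_3"],
    columns=["ADNP", "AFF1", "AFF4"],
)

# -log10 transform: smaller P-values become larger, easier-to-plot scores.
neg_log_p = -np.log10(p_values)

# Most significant TF per metacell (largest -log10 P).
most_significant = neg_log_p.idxmax(axis=1)
print(most_significant)
```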

The P-values are then weighted by the corresponding regulatory potential (RP) score and expression value to give the final TF activity score.

rna_seq_data.tf_score
ADNP AFF1 AFF4 AGO1 AHR AIRE ALX1 ALX3 ALX4 ANHX ... ZSCAN22 ZSCAN23 ZSCAN29 ZSCAN30 ZSCAN31 ZSCAN4 ZSCAN5A ZSCAN5C ZXDB ZXDC
row
0_0 1.181346e-05 0.060435 0.307493 1.462677e-04 0.026594 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.000038 0.001076 0.0 0.0 8.489554e-06 0.0 4.385504e-08 6.365249e-07
0_1 8.153228e-04 0.028895 0.455855 2.507957e-03 0.008484 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.006283 0.005021 0.0 0.0 3.823604e-03 0.0 5.890852e-07 1.917075e-03
0_10 1.138860e-04 0.095834 0.293383 3.976904e-02 0.037968 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.001849 0.000498 0.0 0.0 6.564053e-05 0.0 1.429377e-07 9.996831e-03
0_11 6.903511e-02 0.076661 0.422427 1.190686e-03 0.011600 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.006117 0.008547 0.0 0.0 4.512259e-03 0.0 3.173963e-05 8.298006e-03
0_12 8.898146e-02 0.136908 0.467959 5.825133e-02 0.009677 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.008715 0.010824 0.0 0.0 1.991192e-03 0.0 9.687363e-03 1.161664e-01
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9_3 8.815053e-06 0.060870 0.000002 4.206756e-08 0.001850 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.000759 0.000037 0.0 0.0 6.775830e-07 0.0 2.239746e-09 1.277747e-08
9_4 1.216592e-07 0.160054 0.121961 9.350271e-04 0.007390 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.030882 0.000199 0.0 0.0 2.069416e-06 0.0 6.039253e-05 9.517900e-05
9_5 1.182939e-06 0.095240 0.000301 2.557181e-05 0.002016 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.002766 0.000738 0.0 0.0 4.742670e-04 0.0 6.393928e-06 2.261003e-05
9_6 3.662379e-05 0.186604 0.140389 2.558394e-03 0.038996 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.083027 0.000672 0.0 0.0 3.029751e-04 0.0 1.696993e-04 3.468848e-02
9_7 4.759191e-04 0.254694 0.129037 9.390380e-04 0.009117 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.063992 0.002329 0.0 0.0 1.695396e-05 0.0 8.699077e-04 4.172868e-03

224 rows × 1226 columns
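To summarize TF activity at the cluster level, the metacell scores can be averaged per cluster. This sketch assumes that the part of each metacell name before the underscore (for example 0 in 0_12) is the Leiden cluster it came from; that convention is inferred from the printed output, not documented here. It uses a four-row excerpt of the tf_score values shown above.

```python
import pandas as pd

# Four-row excerpt of rna_seq_data.tf_score (values copied from the output above).
tf_score = pd.DataFrame(
    {"ADNP": [1.181346e-05, 8.153228e-04, 8.815053e-06, 1.216592e-07],
     "AFF1": [0.060435, 0.028895, 0.060870, 0.160054],
     "AFF4": [0.307493, 0.455855, 0.000002, 0.121961]},
    index=["0_0", "0_1", "9_3", "9_4"],
)

# Assumption: the prefix before "_" in a metacell name is its Leiden cluster.
cluster = tf_score.index.str.split("_").str[0]

# Mean activity of each TF per cluster, then the most active TF per cluster.
cluster_mean = tf_score.groupby(cluster).mean()
top_tf_per_cluster = cluster_mean.idxmax(axis=1)
print(cluster_mean)
print(top_tf_per_cluster)
```

The resulting clusters-by-TFs table is small enough to plot directly, for example as a heatmap with seaborn.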

Calculate the downstream target genes of a TF (here GATA3) in each metacell

gata3_score = rna_seq_data.get_tf_target('GATA3')
gata3_score
SOS1 ZNF487 PPP1CA CFLAR WDR37 CTLA4 STK10 NFKBIL1 INO80B PPP2R5C ... BCL2 RPL18 PRSS55 UBL4B FAM13A WDR20 SYTL3 ASH1L APOC3 CPNE8
3_10 0.012644 0.000000 0.000000 0.000000 0.096325 0.000000 0.026573 0.000000 0.067059 0.021823 ... 0 0 0 0 0 0 0 0 0 0
4_1 0.239298 0.025236 0.000000 0.111141 0.000000 0.133851 0.000000 0.000000 0.000000 0.077034 ... 0 0 0 0 0 0 0 0 0 0
1_0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0 0 0 0 0 0 0 0 0 0
12_4 0.000000 0.000000 0.120566 0.209552 0.093906 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0 0 0 0 0 0 0 0 0 0
10_4 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
22_0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0 0 0 0 0 0 0 0 0 0
1_17 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0 0 0 0 0 0 0 0 0 0
0_24 0.095861 0.000000 0.000000 0.051342 0.000000 0.000000 0.012070 0.000000 0.000000 0.000000 ... 0 0 0 0 0 0 0 0 0 0
0_3 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.027181 0.220751 0.000000 0.030307 ... 0 0 0 0 0 0 0 0 0 0
3_2 0.000000 0.000000 0.000000 0.000000 0.009417 0.045787 0.115108 0.000000 0.000000 0.000000 ... 0 0 0 0 0 0 0 0 0 0

224 rows × 3084 columns
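The target-score matrix is mostly zeros, so one simple way to rank GATA3's candidate targets is by how many metacells give a gene a non-zero score, breaking ties by the mean score. The excerpt below copies a few values from the output above; ranking by these two statistics is an illustrative choice, not a SCRIPro recommendation.

```python
import pandas as pd

# Excerpt of gata3_score (metacells x candidate target genes), copied from above.
gata3_score = pd.DataFrame(
    {"SOS1": [0.012644, 0.239298, 0.000000, 0.095861],
     "CFLAR": [0.000000, 0.111141, 0.209552, 0.051342],
     "STK10": [0.026573, 0.000000, 0.000000, 0.012070],
     "NFKBIL1": [0.000000, 0.000000, 0.000000, 0.000000]},
    index=["3_10", "4_1", "12_4", "0_24"],
)

# Number of metacells in which each gene is scored as a target, plus mean score.
summary = pd.DataFrame({
    "n_metacells": (gata3_score > 0).sum(axis=0),
    "mean_score": gata3_score.mean(axis=0),
}).sort_values(["n_metacells", "mean_score"], ascending=False)

# Drop genes that are never scored as targets in this excerpt.
summary = summary[summary["n_metacells"] > 0]
print(summary)
```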