Persad et al & Littman et al Comparison¶
To test SCRIPro’s ability to explore the intensity of transcription factor regulation, we applied it to a CRISPRa activated T cell single-cell sequencing data set to calculate AUROC and AUPRC using the activated Label per cell as the gold standard,and this data can be downloaded from http://www.perturbase.cn/download (PRJNA787633).
In this notebook, we use the cell clustering results obtained by Persad et al to replace the Littman et al clustering results and test Persad et al ability to extract heterogeneous cells.
Using Shell:¶
Transcription factor enrichment scores can be obtained by SCRIPro using the following shell statement:
SCRIPro enrich -i ./data/raw.h5ad -n 50 -s hs -p rna_workflow -t 32
Using Python for custom analysis:¶
import numpy as np
import pandas as pd
import scanpy as sc
import h5py
import scripro
import anndata
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
import scanpy as sc
import warnings
warnings.filterwarnings("ignore")
from sklearn.metrics import roc_curve, auc
Load and process data¶
Load data and raw data. Since the processed data downloaded does not contain raw data, we download raw data for re-processing
rna = sc.read_h5ad('./data/perturb_data.h5ad')
rna_raw = sc.read_h5ad('./data/raw.h5ad')
rna_raw=rna_raw[rna.obs.index]
sc.pp.normalize_total(rna_raw, target_sum=1e4)
sc.pp.log1p(rna_raw)
rna.raw = rna_raw
test_data = scripro.Ori_Data(rna,Cell_num=50)
rna.obs
| gene | n_genes | n_genes_by_counts | total_counts | total_counts_mt | pct_counts_mt | leiden | mixscape_class_p_ko | mixscape_class | mixscape_class_global | pertclass | hdbscan | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cell_barcodes | ||||||||||||
| TAACCAGAGTAGAATC-8 | TRIM21 | 3467 | 3467 | 10422.0 | 755.0 | 7.244291 | 26 | 1.0 | TRIM21 KO | KO | strong | 9 |
| CATAGACCAACACGAG-8 | CBY1 | 2003 | 2003 | 4621.0 | 392.0 | 8.483012 | 24 | 1.0 | CBY1 KO | KO | strong | 10 |
| CTGTGAATCCGGTAAT-2 | LAT2 | 4344 | 4344 | 16784.0 | 1412.0 | 8.412774 | 9 | 1.0 | LAT2 KO | KO | strong | 1 |
| GAGCTGCAGGTAGATT-8 | RELA | 2361 | 2360 | 6086.0 | 380.0 | 6.243838 | 16 | 1.0 | RELA KO | KO | strong | 4 |
| AAGTACCCAACTTCTT-3 | WT1 | 2198 | 2198 | 5469.0 | 545.0 | 9.965259 | 12 | 1.0 | WT1 KO | KO | strong | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| GGCTGTGAGGGCTAAC-5 | APOL2 | 2503 | 2503 | 6126.0 | 553.0 | 9.027098 | 30 | 1.0 | APOL2 KO | KO | strong | 20 |
| ATATCCTCATCATTTC-8 | TNFRSF1B | 4380 | 4379 | 14271.0 | 1052.0 | 7.371593 | 11 | 1.0 | TNFRSF1B KO | KO | strong | 7 |
| CTAGGTAGTTGAGGAC-1 | CD27 | 2385 | 2385 | 6731.0 | 336.0 | 4.991829 | 19 | 1.0 | CD27 KO | KO | strong | 17 |
| TGGGAAGGTGAGTTTC-6 | CTRL | 2988 | 2988 | 7500.0 | 541.0 | 7.213333 | 3 | 0.0 | CTRL | CTRL | CTRL | 30 |
| ATGATCGGTATCGTTG-7 | RELA | 2739 | 2739 | 7606.0 | 311.0 | 4.088877 | 16 | 1.0 | RELA KO | KO | strong | 4 |
16707 rows × 12 columns
replace Littman et al data with Persad et al data¶
Load the Persad et al data calculated by Persad et al, then replace Littman et al data with Persad et al data (new_leiden column)
rawmeta = pd.read_csv('./rawmeta.csv')
rawmeta
| Cell_barcodes | gene | excluded_umis | metacell | dissolved | metacell_level | cells_rare_gene_module | rare_cell | metacell_name | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | TAACCAGAGTAGAATC-8 | TRIM21 | 849.0 | 110 | False | 1 | -1 | False | M110.11 |
| 1 | CATAGACCAACACGAG-8 | CBY1 | 459.0 | 91 | False | 1 | -1 | False | M91.30 |
| 2 | CTGTGAATCCGGTAAT-2 | LAT2 | 1571.0 | 484 | False | 2 | -1 | False | M484.07 |
| 3 | GAGCTGCAGGTAGATT-8 | RELA | 408.0 | 131 | False | 1 | -1 | False | M131.58 |
| 4 | AAGTACCCAACTTCTT-3 | WT1 | 779.0 | 412 | False | 1 | -1 | False | M412.96 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16702 | GGCTGTGAGGGCTAAC-5 | APOL2 | 712.0 | 219 | False | 1 | -1 | False | M219.04 |
| 16703 | ATATCCTCATCATTTC-8 | TNFRSF1B | 1107.0 | 5 | False | 1 | -1 | False | M5.31 |
| 16704 | CTAGGTAGTTGAGGAC-1 | CD27 | 445.0 | 235 | False | 1 | -1 | False | M235.05 |
| 16705 | TGGGAAGGTGAGTTTC-6 | CTRL | 650.0 | 100 | False | 1 | -1 | False | M100.73 |
| 16706 | ATGATCGGTATCGTTG-7 | RELA | 332.0 | 470 | False | 2 | -1 | False | M470.82 |
16707 rows × 9 columns
test_data.adata.obs.new_leiden=list(rawmeta.metacell)
test_data.adata.obs
| gene | n_genes | n_genes_by_counts | total_counts | total_counts_mt | pct_counts_mt | leiden | mixscape_class_p_ko | mixscape_class | mixscape_class_global | pertclass | hdbscan | new_leiden | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cell_barcodes | |||||||||||||
| TAACCAGAGTAGAATC-8 | TRIM21 | 3467 | 3467 | 10422.0 | 755.0 | 7.244291 | 26 | 1.0 | TRIM21 KO | KO | strong | 9 | 110 |
| CATAGACCAACACGAG-8 | CBY1 | 2003 | 2003 | 4621.0 | 392.0 | 8.483012 | 24 | 1.0 | CBY1 KO | KO | strong | 10 | 91 |
| CTGTGAATCCGGTAAT-2 | LAT2 | 4344 | 4344 | 16784.0 | 1412.0 | 8.412774 | 9 | 1.0 | LAT2 KO | KO | strong | 1 | 484 |
| GAGCTGCAGGTAGATT-8 | RELA | 2361 | 2360 | 6086.0 | 380.0 | 6.243838 | 16 | 1.0 | RELA KO | KO | strong | 4 | 131 |
| AAGTACCCAACTTCTT-3 | WT1 | 2198 | 2198 | 5469.0 | 545.0 | 9.965259 | 12 | 1.0 | WT1 KO | KO | strong | 0 | 412 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| GGCTGTGAGGGCTAAC-5 | APOL2 | 2503 | 2503 | 6126.0 | 553.0 | 9.027098 | 30 | 1.0 | APOL2 KO | KO | strong | 20 | 219 |
| ATATCCTCATCATTTC-8 | TNFRSF1B | 4380 | 4379 | 14271.0 | 1052.0 | 7.371593 | 11 | 1.0 | TNFRSF1B KO | KO | strong | 7 | 5 |
| CTAGGTAGTTGAGGAC-1 | CD27 | 2385 | 2385 | 6731.0 | 336.0 | 4.991829 | 19 | 1.0 | CD27 KO | KO | strong | 17 | 235 |
| TGGGAAGGTGAGTTTC-6 | CTRL | 2988 | 2988 | 7500.0 | 541.0 | 7.213333 | 3 | 0.0 | CTRL | CTRL | CTRL | 30 | 100 |
| ATGATCGGTATCGTTG-7 | RELA | 2739 | 2739 | 7606.0 | 311.0 | 4.088877 | 16 | 1.0 | RELA KO | KO | strong | 4 | 470 |
16707 rows × 13 columns
test_data.adata.obs['new_leiden'] = test_data.adata.obs['new_leiden'].astype(str)
test_data.get_positive_marker_gene_parallel()
rna_seq_data = scripro.SCRIPro_RNA(12,'hg38',test_data,assays=['Direct','DNase','H3K27ac'])
Calculating ISD¶
rna_seq_data.cal_ISD_cistrome()
rna_seq_data.P_value_matrix
| factor | NELFA | SUPT5H | POLR2A | TAF1 | E2F1 | MYC | JMJD6 | TFDP1 | PHF8 | BRD4 | ... | ESCO2 | SOX8 | WWTR1 | ELF5 | ZIC3 | SOX6 | HOXA1 | TOP1 | FOXE3 | ETV2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 54 | 1.000000 | 0.879108 | 0.790975 | 0.787378 | 0.787086 | 0.759230 | 0.759144 | 0.744869 | 0.735694 | 0.734767 | ... | 3.128389e-11 | 2.321808e-11 | 1.636090e-11 | 9.684857e-12 | 6.359623e-12 | 4.229628e-12 | 3.758939e-12 | 9.764622e-13 | 7.729002e-13 | 0.000000e+00 |
| 568 | 0.596106 | 0.658273 | 0.878588 | 0.637187 | 0.542015 | 0.772881 | 0.482539 | 0.328786 | 0.485381 | 0.840747 | ... | 5.529090e-02 | 2.201403e-02 | 6.034951e-02 | 1.250646e-01 | 9.094418e-02 | 2.176096e-05 | 1.033788e-01 | 1.759315e-02 | 8.293918e-02 | 1.189246e-01 |
| 171 | 0.956295 | 0.948642 | 0.807200 | 0.779516 | 0.645101 | 1.000000 | 0.747362 | 0.417036 | 0.685183 | 0.772588 | ... | 0.000000e+00 | 9.707026e-07 | 4.524251e-08 | 1.419998e-10 | 2.228434e-07 | 2.616552e-10 | 5.366166e-07 | 0.000000e+00 | 1.553591e-07 | 1.790557e-10 |
| 106 | 1.000000 | 0.938330 | 0.849668 | 0.820421 | 0.634343 | 0.915763 | 0.775252 | 0.401171 | 0.737218 | 0.841678 | ... | 0.000000e+00 | 1.369131e-06 | 2.459007e-08 | 6.081526e-07 | 2.266960e-05 | 6.717899e-10 | 2.948945e-04 | 1.449065e-13 | 1.059525e-07 | 4.508395e-09 |
| 79 | 0.037224 | 0.413154 | 1.000000 | 0.553018 | 0.133872 | 0.551101 | 0.076116 | 0.042632 | 0.125038 | 0.670272 | ... | 6.291131e-04 | 1.136219e-01 | 7.055724e-02 | 1.315009e-01 | 5.778941e-02 | 7.625599e-02 | 1.515579e-01 | 1.035683e-02 | 1.189053e-01 | 2.294130e-01 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 337 | 0.628819 | 0.884519 | 0.881672 | 1.000000 | 1.000000 | 0.738973 | 0.544195 | 0.935860 | 0.466437 | 0.784920 | ... | 2.697677e-06 | 5.354145e-03 | 1.452867e-02 | 1.063817e-03 | 2.141516e-05 | 5.709512e-02 | 2.659353e-02 | 3.680871e-05 | 5.829138e-02 | 4.284606e-02 |
| 222 | 0.912849 | 0.800940 | 0.908895 | 0.974336 | 0.590447 | 0.717522 | 0.606849 | 0.280193 | 0.566149 | 0.913928 | ... | 2.674771e-03 | 1.304035e-02 | 1.189110e-01 | 6.780617e-02 | 3.949164e-03 | 5.605106e-02 | 3.067985e-02 | 5.394981e-02 | 1.673135e-01 | 3.413787e-02 |
| 304 | 0.839100 | 0.804796 | 0.861108 | 0.937081 | 1.000000 | 0.718710 | 0.672186 | 0.722735 | 0.624386 | 0.913212 | ... | 2.928011e-02 | 1.334770e-03 | 2.150550e-02 | 1.245458e-02 | 1.675397e-04 | 7.373146e-03 | 6.442231e-03 | 1.534435e-02 | 5.141218e-02 | 1.081799e-02 |
| 375 | 0.145227 | 0.599807 | 1.000000 | 0.570559 | 0.201262 | 0.653092 | 0.182292 | 0.047675 | 0.259976 | 0.752675 | ... | 5.605888e-04 | 3.816154e-02 | 3.498265e-02 | 1.408408e-01 | 2.847721e-02 | 7.229822e-02 | 9.771098e-02 | 4.832158e-03 | 5.258527e-02 | 2.155378e-01 |
| 220 | 0.991618 | 0.938332 | 0.838142 | 0.710450 | 0.641620 | 1.000000 | 0.719913 | 0.371703 | 0.608496 | 0.759212 | ... | 1.202406e-14 | 3.684495e-06 | 4.245327e-07 | 5.762894e-07 | 4.323174e-08 | 3.048350e-11 | 7.881787e-09 | 6.787328e-10 | 6.295932e-06 | 1.545003e-10 |
592 rows × 1252 columns
rna_seq_data.get_tf_score()
tem_exp = rna_raw.to_df().merge(test_data.adata.obs.loc[:,'new_leiden'],left_index=True,right_index=True)
grouped = tem_exp.groupby('new_leiden').mean()
grouped
| MIR1302-2HG | FAM138A | OR4F5 | AL627309.1 | AL627309.3 | AL627309.2 | AL627309.5 | AL627309.4 | AP006222.2 | AL732372.1 | ... | TNFRSF9-1 | TNFRSF9-2 | TRAF3IP2-1 | TRAF3IP2-2 | TRIM21-1 | TRIM21-2 | VAV1-1 | VAV1-2 | WT1-1 | WT1-2 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| new_leiden | |||||||||||||||||||||
| 0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | ... | 0.022986 | 0.020883 | 0.141023 | 0.019764 | 0.070329 | 0.045294 | 0.000000 | 0.000000 | 5.971172 | 0.025028 |
| 1 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.028059 | 0.000000 | 0.0 | 0.0 | ... | 0.307047 | 0.000000 | 0.175104 | 0.078949 | 0.000000 | 0.000000 | 0.000000 | 0.031357 | 3.475705 | 0.000000 |
| 2 | 0.0 | 0.0 | 0.0 | 0.029139 | 0.000000 | 0.0 | 0.061611 | 0.029139 | 0.0 | 0.0 | ... | 0.000000 | 0.000000 | 0.145290 | 0.000000 | 0.025556 | 0.000000 | 0.000000 | 0.000000 | 0.082464 | 0.000000 |
| 3 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.030975 | 0.000000 | 0.0 | 0.0 | ... | 0.025993 | 0.000000 | 0.171809 | 0.099170 | 0.072312 | 0.000000 | 0.039627 | 0.000000 | 0.352380 | 0.000000 |
| 4 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.035995 | 0.000000 | 0.0 | 0.0 | ... | 0.032857 | 0.000000 | 0.288488 | 0.000000 | 0.043905 | 0.044607 | 0.000000 | 0.035995 | 0.072219 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 587 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.037087 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | ... | 0.052179 | 0.000000 | 0.828584 | 0.035307 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.094569 | 0.000000 |
| 588 | 0.0 | 0.0 | 0.0 | 0.035962 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | ... | 0.039119 | 0.000000 | 1.642812 | 0.000000 | 0.000000 | 0.074946 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 589 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | ... | 0.141428 | 0.029187 | 0.287361 | 0.202327 | 0.059597 | 0.000000 | 0.000000 | 0.024091 | 0.094148 | 0.000000 |
| 590 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | ... | 0.031027 | 0.000000 | 0.155326 | 0.051923 | 0.000000 | 0.105978 | 0.000000 | 0.000000 | 0.032125 | 0.000000 |
| -1 | 0.0 | 0.0 | 0.0 | 0.003845 | 0.000000 | 0.0 | 0.009864 | 0.000000 | 0.0 | 0.0 | ... | 0.070795 | 0.030611 | 0.236199 | 0.057267 | 0.053324 | 0.301895 | 0.023147 | 0.022444 | 0.200147 | 0.008683 |
592 rows × 36755 columns
rna_seq_data.Ori_Data.ad_all = grouped
rna_seq_data.Ori_Data.super_gene_exp = grouped
super_gene_exp = rna_seq_data.Ori_Data.super_gene_exp
super_gene_mean = rna_seq_data.Ori_Data.super_gene_mean
super_gene_std = rna_seq_data.Ori_Data.super_gene_std
rna_seq_data.Ori_Data.super_gene_mean = rna_seq_data.Ori_Data.super_gene_exp.mean()
rna_seq_data.Ori_Data.super_gene_std = rna_seq_data.Ori_Data.super_gene_exp.std()
rna_seq_data.P_value_matrix
| ADNP | AFF1 | AFF4 | AGO1 | AHR | AIRE | ALX1 | ALX3 | ALX4 | ANHX | ... | ZSCAN22 | ZSCAN23 | ZSCAN29 | ZSCAN30 | ZSCAN31 | ZSCAN4 | ZSCAN5A | ZSCAN5C | ZXDB | ZXDC | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| row | |||||||||||||||||||||
| -1 | 8.563566e-04 | 0.360750 | 0.559371 | 0.182508 | 0.018380 | 0.000006 | 0.000029 | 1.898788e-05 | 0.000016 | 0.000330 | ... | 0.305053 | 0.001561 | 0.114077 | 0.000631 | 1.361497e-03 | 3.585991e-06 | 0.006103 | 2.655798e-05 | 0.261396 | 1.215443e-01 |
| 0 | 3.446237e-12 | 0.497215 | 0.500529 | 0.163227 | 0.000100 | 0.000003 | 0.000129 | 1.646497e-07 | 0.000002 | 0.000044 | ... | 0.349755 | 0.002214 | 0.028432 | 0.000275 | 1.026750e-07 | 3.986286e-06 | 0.034384 | 1.379630e-05 | 0.219834 | 1.525143e-12 |
| 1 | 7.096883e-02 | 0.480607 | 0.591523 | 0.193952 | 0.028921 | 0.051201 | 0.000002 | 4.352112e-04 | 0.000222 | 0.000107 | ... | 0.309516 | 0.000656 | 0.068821 | 0.001493 | 5.858211e-06 | 8.907871e-07 | 0.008626 | 1.288200e-08 | 0.212142 | 2.165690e-01 |
| 10 | 5.333758e-02 | 0.544322 | 0.600351 | 0.262902 | 0.029763 | 0.057495 | 0.056367 | 7.002899e-04 | 0.000222 | 0.029509 | ... | 0.309205 | 0.004910 | 0.059540 | 0.001900 | 4.214787e-02 | 3.509326e-07 | 0.021020 | 1.045370e-04 | 0.176866 | 2.342755e-01 |
| 100 | 7.107864e-02 | 0.399857 | 0.764365 | 0.176755 | 0.121364 | 0.063864 | 0.032153 | 3.981877e-02 | 0.026263 | 0.010673 | ... | 0.298497 | 0.011864 | 0.116632 | 0.019865 | 7.758212e-02 | 9.608021e-03 | 0.008497 | 1.111107e-04 | 0.092073 | 1.953459e-01 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 1.291925e-01 | 0.487603 | 0.791613 | 0.203145 | 0.080655 | 0.063231 | 0.036989 | 3.843538e-02 | 0.045159 | 0.022616 | ... | 0.353437 | 0.004247 | 0.114041 | 0.020601 | 2.575347e-02 | 5.994760e-05 | 0.023438 | 1.904101e-02 | 0.140363 | 9.592408e-02 |
| 96 | 1.031648e-01 | 0.318737 | 0.647392 | 0.119693 | 0.108568 | 0.019088 | 0.034342 | 2.822314e-02 | 0.049918 | 0.056430 | ... | 0.194852 | 0.011646 | 0.117273 | 0.023761 | 2.218386e-02 | 3.685638e-03 | 0.013456 | 6.609630e-03 | 0.204090 | 1.425852e-01 |
| 97 | 1.062549e-01 | 0.340830 | 0.789539 | 0.114368 | 0.136252 | 0.097897 | 0.059248 | 8.784248e-02 | 0.156568 | 0.042775 | ... | 0.191674 | 0.002716 | 0.095484 | 0.016832 | 5.122700e-02 | 8.143724e-02 | 0.000837 | 2.209640e-02 | 0.089183 | 1.023146e-01 |
| 98 | 6.693196e-02 | 0.558301 | 0.756792 | 0.281248 | 0.047730 | 0.104862 | 0.089017 | 1.252955e-01 | 0.083650 | 0.035471 | ... | 0.302286 | 0.002517 | 0.147804 | 0.007596 | 2.793163e-02 | 4.735611e-03 | 0.019666 | 2.356419e-02 | 0.289258 | 1.428439e-01 |
| 99 | 6.862545e-02 | 0.496669 | 0.600869 | 0.187158 | 0.096891 | 0.030484 | 0.006576 | 2.917082e-03 | 0.001573 | 0.015501 | ... | 0.390762 | 0.018814 | 0.083292 | 0.019330 | 3.054598e-02 | 4.523016e-06 | 0.027676 | 5.116583e-04 | 0.212204 | 2.528735e-01 |
592 rows × 1226 columns
rna_seq_data.tf_score
| ADNP | AFF1 | AFF4 | AGO1 | AHR | AIRE | ALX1 | ALX3 | ALX4 | ANHX | ... | ZSCAN22 | ZSCAN23 | ZSCAN29 | ZSCAN30 | ZSCAN31 | ZSCAN4 | ZSCAN5A | ZSCAN5C | ZXDB | ZXDC | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| row | |||||||||||||||||||||
| -1 | 3.672389e-04 | 0.178091 | 0.239622 | 0.070631 | 0.007116 | 1.357711e-06 | 0.0 | 2.845156e-06 | 2.377675e-06 | 0.0 | ... | 0.063907 | 0.000153 | 0.028755 | 0.000139 | 3.560237e-04 | 0.0 | 0.001316 | 0.0 | 0.052378 | 4.111837e-02 |
| 0 | 2.414882e-12 | 0.261679 | 0.205148 | 0.078429 | 0.000027 | 5.387451e-07 | 0.0 | 2.461573e-08 | 2.656785e-07 | 0.0 | ... | 0.126825 | 0.000271 | 0.014292 | 0.000077 | 1.620604e-08 | 0.0 | 0.003788 | 0.0 | 0.063796 | 1.009363e-12 |
| 1 | 5.021123e-02 | 0.086393 | 0.228981 | 0.160712 | 0.007152 | 8.023564e-03 | 0.0 | 7.339054e-05 | 3.607554e-05 | 0.0 | ... | 0.040119 | 0.000085 | 0.005075 | 0.000094 | 9.613452e-07 | 0.0 | 0.000946 | 0.0 | 0.085753 | 6.809414e-02 |
| 10 | 3.808361e-02 | 0.175952 | 0.306938 | 0.168117 | 0.009945 | 1.062668e-02 | 0.0 | 1.275641e-04 | 4.347075e-05 | 0.0 | ... | 0.037240 | 0.004910 | 0.033682 | 0.000281 | 6.428773e-03 | 0.0 | 0.002387 | 0.0 | 0.081476 | 8.746815e-02 |
| 100 | 3.513059e-02 | 0.293045 | 0.449179 | 0.129605 | 0.061245 | 8.614222e-03 | 0.0 | 5.562580e-03 | 3.426428e-03 | 0.0 | ... | 0.019991 | 0.001214 | 0.053916 | 0.005434 | 9.358260e-03 | 0.0 | 0.000642 | 0.0 | 0.008968 | 1.053210e-01 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 1.177092e-01 | 0.209708 | 0.538176 | 0.134788 | 0.020214 | 1.010271e-02 | 0.0 | 6.525869e-03 | 7.493207e-03 | 0.0 | ... | 0.028318 | 0.000345 | 0.100466 | 0.007886 | 3.624425e-03 | 0.0 | 0.001972 | 0.0 | 0.081238 | 4.317646e-02 |
| 96 | 2.874359e-02 | 0.148875 | 0.243910 | 0.046832 | 0.029400 | 2.040194e-03 | 0.0 | 2.741001e-03 | 4.072542e-03 | 0.0 | ... | 0.058285 | 0.000712 | 0.037024 | 0.010316 | 2.618379e-03 | 0.0 | 0.000991 | 0.0 | 0.020238 | 5.766247e-02 |
| 97 | 1.906883e-02 | 0.174282 | 0.399717 | 0.026488 | 0.035957 | 1.174490e-02 | 0.0 | 1.196944e-02 | 1.987612e-02 | 0.0 | ... | 0.010206 | 0.000062 | 0.016045 | 0.006168 | 6.485161e-03 | 0.0 | 0.000341 | 0.0 | 0.004761 | 2.665132e-03 |
| 98 | 1.754907e-02 | 0.471482 | 0.416241 | 0.115289 | 0.022126 | 1.411043e-02 | 0.0 | 1.252955e-01 | 1.405428e-02 | 0.0 | ... | 0.021562 | 0.000198 | 0.069066 | 0.005385 | 4.197360e-03 | 0.0 | 0.010347 | 0.0 | 0.024765 | 8.091507e-02 |
| 99 | 2.922981e-02 | 0.146265 | 0.232052 | 0.074721 | 0.056472 | 3.580166e-03 | 0.0 | 4.096966e-04 | 2.107131e-04 | 0.0 | ... | 0.029510 | 0.001427 | 0.026408 | 0.005165 | 3.396306e-03 | 0.0 | 0.001820 | 0.0 | 0.108117 | 1.003711e-01 |
592 rows × 1226 columns
Calculate the AUPRC and AUROC¶
scripro_score = test_data.adata.obs.merge(rna_seq_data.tf_score,left_on='new_leiden',right_index=True).iloc[:,13:]
commontf = set(test_data.adata.obs['gene']).intersection(set(scripro_score.columns))
scripro_auroc_dic = {}
for k in commontf:
y_true = []
for i in scripro_score.index:
if test_data.adata.obs.loc[i,'gene'] == k:
y_true.append(1)
else:
y_true.append(0)
y_scores = list(scripro_score.loc[:,k])
fpr, tpr, thresholds = roc_curve(y_true, y_scores)
roc_auc = auc(fpr, tpr)
scripro_auroc_dic[k]=roc_auc
scripro_auroc_score = pd.DataFrame([scripro_auroc_dic]).T.sort_values(ascending = False,by = 0)
scripro_auroc_score.columns = ['auroc']
scripro_auroc_score
| auroc | |
|---|---|
| EOMES | 0.993938 |
| GATA3 | 0.951226 |
| RELA | 0.941175 |
| FOXD2 | 0.916245 |
| PRDM1 | 0.915120 |
| TBX21 | 0.863753 |
| LHX4 | 0.818389 |
| FOXQ1 | 0.743893 |
| LHX6 | 0.735591 |
| WT1 | 0.729839 |
| JMJD1C | 0.691111 |
| ALX4 | 0.662308 |
| NOTCH1 | 0.583613 |
| IKZF3 | 0.564741 |
| FOSB | 0.413174 |
import pandas as pd
from sklearn.metrics import precision_recall_curve, auc
scripro_auprc_dic = {}
for k in commontf:
y_true = []
for i in scripro_score.index:
if test_data.adata.obs.loc[i, 'gene'] == k:
y_true.append(1)
else:
y_true.append(0)
y_scores = list(scripro_score.loc[:,k])
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)
auprc = auc(recall, precision)
scripro_auprc_dic[k] = auprc
scripro_auprc_score = pd.DataFrame([scripro_auprc_dic]).T.sort_values(ascending=False, by=0)
scripro_auprc_score.columns = ['auprc']
scripro_auprc_score
| auprc | |
|---|---|
| EOMES | 0.871598 |
| RELA | 0.760414 |
| GATA3 | 0.731483 |
| WT1 | 0.598684 |
| LHX6 | 0.495087 |
| FOXD2 | 0.280925 |
| TBX21 | 0.097302 |
| ALX4 | 0.061300 |
| FOXQ1 | 0.003539 |
| JMJD1C | 0.000962 |
| PRDM1 | 0.000350 |
| FOSB | 0.000324 |
| NOTCH1 | 0.000202 |
| LHX4 | 0.000163 |
| IKZF3 | 0.000128 |