Trajectory & ATAC Only Workflow

To test the ability of SCRIPro to apply to continuously differentiated datasets, we applied it to SHARE-seq sequenced mouse hair follicle development data (which can be downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE140203), where transcriptome and epigenome data were used to infer transcription factor enrichment scores, respectively, and examine the spatio-temporal heterogeneity of transcription factor function.

Using Shell:

Transcription factor enrichment scores can be obtained by SCRIPro using the following shell statement:

scripro enrich_multiome -i ./data/rna/trajectory.h5ad -n 50 -s mm10 -a matrix -b 1 -f ./data/atac/trajectory.h5ad -g ./gencode.vM25.annotation.gtf.gz -p  Trajectory -t 12

Transcription factor enrichment scores can be obtained by SCRIP using the following shell statement:

scripro enrich_atac enrich -i ./data/atac/trajectory.h5ad -s mm -p Trajectory  -t 12

Using Python for custom analysis:

import scripro
import scglue
import anndata
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
import scanpy as sc
import warnings
warnings.filterwarnings("ignore")

Load data

Load the scRNA-seq data and scATAC-seq data, as well as the SCRIP calculated TF score

miradata = anndata.read_h5ad('./data/rna/trajectory.h5ad')
scrip = pd.read_csv('./enrichment/SCRIP_enrichment.txt',sep='\t',index_col=0)
atac = sc.read_h5ad('./data/atac/trajectory.h5ad')
rna = sc.read_h5ad("./data/rna/trajectory.h5ad")
select_cell = list(set(miradata.to_df().index).intersection(scrip.index).intersection(atac.to_df().index).intersection(rna.to_df().index))
miradata = miradata[select_cell]
atac =  atac[select_cell]
atac_df=atac.to_df()
rna =rna[select_cell]
rna.var_names_make_unique()
rna.var_names = [x.capitalize() for x in rna.var_names]

Calculate metacell and markergene

test_data = scripro.Ori_Data(rna,Cell_num=50)
test_data.get_positive_marker_gene_parallel(cores=4)
cellgroup = test_data.adata.obs.loc[:,['new_leiden']]
sc.get.rank_genes_groups_df(test_data.adata, group='0_0', log2fc_min=0.5, pval_cutoff=0.1)
names scores logfoldchanges pvals pvals_adj
0 Robo1 33.262318 3.100753 2.210608e-70 8.281822e-67
1 Sox5 31.031414 4.105570 9.752101e-62 2.283454e-58
2 Cux1 30.107073 2.690581 3.416571e-65 9.142745e-62
3 Eda 21.409723 3.492702 2.632037e-43 2.900195e-40
4 Nfib 18.497385 2.344622 3.836717e-38 2.994558e-35
... ... ... ... ... ...
1902 Ddx27 2.153615 0.978330 3.330264e-02 9.880031e-02
1903 Tmc7 2.153065 2.035018 3.337731e-02 9.899046e-02
1904 9930021j03rik 2.152729 0.765872 3.336331e-02 9.896460e-02
1905 Psmb1 2.151586 0.917678 3.346546e-02 9.920478e-02
1906 Mta1 2.149554 1.004966 3.362917e-02 9.961127e-02

1907 rows × 5 columns

test_data.adata.obs
n_genes celltype true_cell leiden new_leiden
barcode
R1.04.R2.48.R3.50.P1.55 672 Medulla Medulla 4 4_1
R1.36.R2.51.R3.11.P1.55 582 TAC-1 Cortex 1 1_0
R1.56.R2.29.R3.61.P1.53 838 Mix Matrix 0 0_17
R1.03.R2.55.R3.02.P1.54 609 TAC-1 Cortex 1 1_2
R1.72.R2.16.R3.44.P1.55 751 Hair Shaft-cuticle.cortex Cortex 1 1_7
... ... ... ... ... ...
R1.59.R2.20.R3.84.P1.56 959 IRS IRS 2 2_9
R1.45.R2.04.R3.35.P1.55 535 TAC-2 Inner Matrix 6 6_0
R1.02.R2.75.R3.29.P1.55 451 TAC-1 Medulla 1 1_6
R1.72.R2.70.R3.69.P1.55 623 TAC-1 Matrix 0 0_9
R1.59.R2.21.R3.81.P1.56 1554 Hair Shaft-cuticle.cortex Cortex 1 1_5

6243 rows × 5 columns

Calculate the landscape of metacell

scripro.dataframe_to_sparse_tsv(atac_df, 'test.tsv')
scripro.get_metacell_fragment(cellgroup,'.','./test.tsv',chunksize = 10000000)
scripro.process_tsv('./metacell_fragment/', 'mm10')
share_seq_data = scripro.SCRIPro_Multiome(8,'mm10',test_data)

Calculate the TF activity score

share_seq_data.cal_ISD_parallel('./bigwig/')
share_seq_data.get_tf_score()
sns.clustermap(share_seq_data.tf_score)
../_images/Trajectory_workflow_27_1.png

Calculate the TF activity score corresponding to pesudotime

trajectory_data = sc.read_h5ad('/fs/home/xuyunfan/project/SCRIPro/package/trajectory.h5ad')
all_pro_score = pd.merge(test_data.adata.obs,share_seq_data.tf_score,left_on='new_leiden',right_index=True)
all_pro_score=all_pro_score.iloc[:,5:]
trajectory_data = trajectory_data[select_cell2]
all_pro_score =all_pro_score.loc[trajectory_data.obs.index,:]
all_anndata= sc.AnnData(all_pro_score)
all_anndata.obsm = trajectory_data.obsm
sc.pl.umap(all_anndata,color = 'Prdm1')
../_images/Trajectory_workflow_36_0.png

Calculate the difference between the SCRIPro and SCRIP scores corresponding to ORS-Medulla

select_cell2 = list(set(select_cell).intersection(all_pro_score.index))
scrip = (scrip - scrip.min())/(scrip.max()-scrip.min())
scrip = scrip.loc[select_cell2,:]
tra = trajectory_data.obs
Medulla_tra = tra[tra['Medulla_prob'] >0.25].sort_values(by = 'Medulla_prob').index
Medulla_score =all_pro_score.loc[Medulla_tra,:]
Medulla_score.index = Medulla_score['true_cell']
Medulla_score =Medulla_score.iloc[:,5:]
Medulla_score = (Medulla_score - Medulla_score.min())/(Medulla_score.max() - Medulla_score.min())
sns.clustermap(Medulla_score.loc[:,Medulla_score.std().sort_values(ascending = False)[0:100].index].rolling(window=100).mean().iloc[100:,:],row_cluster=False)
../_images/Trajectory_workflow_47_1.png
plt.figure(figsize=(10, 5))
TF = 'Hoxc13'
data_series1=pd.Series(list(Medulla_score.loc[:,TF]))
smooth_data1 = data_series1.rolling(window=200).mean()
smooth_data1 = (smooth_data1 - smooth_data1.min())/(smooth_data1.max() - smooth_data1.min())
plt.plot(smooth_data1[200:].reset_index(drop = True), label='SCRIPro')

data_series2=pd.Series(list(scrip.loc[Medulla_tra,TF]))
smooth_data2 = data_series2.rolling(window=200).mean()
smooth_data2 = (smooth_data2 - smooth_data2.min())/(smooth_data2.max() - smooth_data2.min())
plt.plot(smooth_data2[200:].reset_index(drop = True), label='SCRIP')

# set xticks every 200 steps
xticks_locs = np.arange(0, len(Medulla_score.index), 400)
plt.xticks(xticks_locs, Medulla_score.index[xticks_locs])
plt.title(TF)
plt.legend(loc='right')


plt.show()
../_images/Trajectory_workflow_48_0.png