Trajectory & ATAC Only Workflow¶

To test the ability of SCRIPro to apply to continuously differentiated datasets, we applied it to SHARE-seq sequenced mouse hair follicle development data (which can be downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE140203), where transcriptome and epigenome data were used to infer transcription factor enrichment scores, respectively, and examine the spatio-temporal heterogeneity of transcription factor function.

Using Shell:¶

Transcription factor enrichment scores can be obtained by SCRIPro using the following shell statement:

scripro enrich_multiome -i ./data/rna/trajectory.h5ad -n 50 -s mm10 -a matrix -b 1 -f ./data/atac/trajectory.h5ad -g ./gencode.vM25.annotation.gtf.gz -p  Trajectory -t 12

Transcription factor enrichment scores can be obtained by SCRIP using the following shell statement:

scripro enrich_atac enrich -i ./data/atac/trajectory.h5ad -s mm -p Trajectory  -t 12

Using Python for custom analysis:¶

import scripro
import scglue
import anndata
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
import scanpy as sc
import warnings
warnings.filterwarnings("ignore")

Load data¶

Load the scRNA-seq data and scATAC-seq data, as well as the SCRIP calculated TF score

miradata = anndata.read_h5ad('./data/rna/trajectory.h5ad')
scrip = pd.read_csv('./enrichment/SCRIP_enrichment.txt',sep='\t',index_col=0)
atac = sc.read_h5ad('./data/atac/trajectory.h5ad')
rna = sc.read_h5ad("./data/rna/trajectory.h5ad")
select_cell = list(set(miradata.to_df().index).intersection(scrip.index).intersection(atac.to_df().index).intersection(rna.to_df().index))
miradata = miradata[select_cell]
atac =  atac[select_cell]
atac_df=atac.to_df()
rna =rna[select_cell]
rna.var_names_make_unique()
rna.var_names = [x.capitalize() for x in rna.var_names]

Calculate metacell and markergene¶

test_data = scripro.Ori_Data(rna,Cell_num=50)
test_data.get_positive_marker_gene_parallel(cores=4)
cellgroup = test_data.adata.obs.loc[:,['new_leiden']]
sc.get.rank_genes_groups_df(test_data.adata, group='0_0', log2fc_min=0.5, pval_cutoff=0.1)

	names	scores	logfoldchanges	pvals	pvals_adj
0	Robo1	33.262318	3.100753	2.210608e-70	8.281822e-67
1	Sox5	31.031414	4.105570	9.752101e-62	2.283454e-58
2	Cux1	30.107073	2.690581	3.416571e-65	9.142745e-62
3	Eda	21.409723	3.492702	2.632037e-43	2.900195e-40
4	Nfib	18.497385	2.344622	3.836717e-38	2.994558e-35
...	...	...	...	...	...
1902	Ddx27	2.153615	0.978330	3.330264e-02	9.880031e-02
1903	Tmc7	2.153065	2.035018	3.337731e-02	9.899046e-02
1904	9930021j03rik	2.152729	0.765872	3.336331e-02	9.896460e-02
1905	Psmb1	2.151586	0.917678	3.346546e-02	9.920478e-02
1906	Mta1	2.149554	1.004966	3.362917e-02	9.961127e-02

1907 rows × 5 columns

test_data.adata.obs

	n_genes	celltype	true_cell	leiden	new_leiden
barcode
R1.04.R2.48.R3.50.P1.55	672	Medulla	Medulla	4	4_1
R1.36.R2.51.R3.11.P1.55	582	TAC-1	Cortex	1	1_0
R1.56.R2.29.R3.61.P1.53	838	Mix	Matrix	0	0_17
R1.03.R2.55.R3.02.P1.54	609	TAC-1	Cortex	1	1_2
R1.72.R2.16.R3.44.P1.55	751	Hair Shaft-cuticle.cortex	Cortex	1	1_7
...	...	...	...	...	...
R1.59.R2.20.R3.84.P1.56	959	IRS	IRS	2	2_9
R1.45.R2.04.R3.35.P1.55	535	TAC-2	Inner Matrix	6	6_0
R1.02.R2.75.R3.29.P1.55	451	TAC-1	Medulla	1	1_6
R1.72.R2.70.R3.69.P1.55	623	TAC-1	Matrix	0	0_9
R1.59.R2.21.R3.81.P1.56	1554	Hair Shaft-cuticle.cortex	Cortex	1	1_5

6243 rows × 5 columns

Calculate the landscape of metacell¶

scripro.dataframe_to_sparse_tsv(atac_df, 'test.tsv')
scripro.get_metacell_fragment(cellgroup,'.','./test.tsv',chunksize = 10000000)
scripro.process_tsv('./metacell_fragment/', 'mm10')
share_seq_data = scripro.SCRIPro_Multiome(8,'mm10',test_data)

Calculate the TF activity score¶

share_seq_data.cal_ISD_parallel('./bigwig/')
share_seq_data.get_tf_score()
sns.clustermap(share_seq_data.tf_score)

Calculate the TF activity score corresponding to pesudotime¶

trajectory_data = sc.read_h5ad('/fs/home/xuyunfan/project/SCRIPro/package/trajectory.h5ad')
all_pro_score = pd.merge(test_data.adata.obs,share_seq_data.tf_score,left_on='new_leiden',right_index=True)
all_pro_score=all_pro_score.iloc[:,5:]
trajectory_data = trajectory_data[select_cell2]
all_pro_score =all_pro_score.loc[trajectory_data.obs.index,:]
all_anndata= sc.AnnData(all_pro_score)
all_anndata.obsm = trajectory_data.obsm
sc.pl.umap(all_anndata,color = 'Prdm1')

Calculate the difference between the SCRIPro and SCRIP scores corresponding to ORS-Medulla¶

select_cell2 = list(set(select_cell).intersection(all_pro_score.index))
scrip = (scrip - scrip.min())/(scrip.max()-scrip.min())
scrip = scrip.loc[select_cell2,:]
tra = trajectory_data.obs
Medulla_tra = tra[tra['Medulla_prob'] >0.25].sort_values(by = 'Medulla_prob').index
Medulla_score =all_pro_score.loc[Medulla_tra,:]
Medulla_score.index = Medulla_score['true_cell']
Medulla_score =Medulla_score.iloc[:,5:]
Medulla_score = (Medulla_score - Medulla_score.min())/(Medulla_score.max() - Medulla_score.min())
sns.clustermap(Medulla_score.loc[:,Medulla_score.std().sort_values(ascending = False)[0:100].index].rolling(window=100).mean().iloc[100:,:],row_cluster=False)

plt.figure(figsize=(10, 5))
TF = 'Hoxc13'
data_series1=pd.Series(list(Medulla_score.loc[:,TF]))
smooth_data1 = data_series1.rolling(window=200).mean()
smooth_data1 = (smooth_data1 - smooth_data1.min())/(smooth_data1.max() - smooth_data1.min())
plt.plot(smooth_data1[200:].reset_index(drop = True), label='SCRIPro')

data_series2=pd.Series(list(scrip.loc[Medulla_tra,TF]))
smooth_data2 = data_series2.rolling(window=200).mean()
smooth_data2 = (smooth_data2 - smooth_data2.min())/(smooth_data2.max() - smooth_data2.min())
plt.plot(smooth_data2[200:].reset_index(drop = True), label='SCRIP')

# set xticks every 200 steps
xticks_locs = np.arange(0, len(Medulla_score.index), 400)
plt.xticks(xticks_locs, Medulla_score.index[xticks_locs])
plt.title(TF)
plt.legend(loc='right')


plt.show()