Usage¶
SCRIPro is designed for single-cell or spatial RNA-seq datasets and multiome datasets (chromatin accessibility and transcriptome) to explore gene regulatory relationships. The package includes five main functions.
usage: scripro [-h] [--version] {enrich_rna,enrich_multiome,get_tf_target,enrich_atac,install_reference} ...
scripro
positional arguments:
{enrich_rna,enrich_multiome,get_tf_target,enrich_atac,install_reference}
enrich_rna Calculate TF activation use scRNA-seq data
enrich_multiome Calculate TF activation use scRNA-seq data and scATAC-seq data
get_tf_target Calculate TF and target gene score for scRNA-seq data and sc multiome data
enrich_atac Invoke SCRIP
install_reference Install RP reference
optional arguments:
-h, --help Show this help message and exit
--version Show program's version number and exit
For command line options of each command, type: scripro COMMAND -h
Detailed usage for each command is listed below:
scripro enrich_rna¶
For enrichment of TF activity for single-cell or spatial RNA-seq data, you can use scripro enrich_rna.
- In this function, you can input the feature count matrix in H5 or MTX format.
- The -n parameter controls how many cells merge into a metacell. This parameter affects the resolution of the results.
- This function outputs a pkl file containing the p-value matrix and the TF activity score matrix.
scripro enrich_rna [-h] -i FEATURE_MATRIX -n CELL_NUM -s {hg38,mm10} -p PROJECT [-t N_CORES]
optional arguments:
-h, --help Show this help message and exit
Input files arguments:
-i FEATURE_MATRIX, --input_feature_matrix FEATURE_MATRIX
scRNA-seq data matrix . REQUIRED.
-n CELL_NUM, --cell_number CELL_NUM
Metacell Cell Number . REQUIRED.
-s {hg38,mm10}, --species {hg38,mm10}
Species. "hg38"(human) or "mm10"(mouse). REQUIRED.
Output arguments:
-p PROJECT, --project PROJECT
Project name, which will be used to generate output files.
Other options:
-t N_CORES, --thread N_CORES
Number of cores use to run SCRIPro. DEFAULT: 8.
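As a minimal sketch of a typical call, the following assembles an enrich_rna invocation from the flags listed above. The input file name, project name, and metacell size of 50 are hypothetical placeholders rather than recommended values, and the command only runs when scripro is installed and the input file exists.

```shell
# Hypothetical placeholders; replace with your own data and settings.
INPUT="pbmc_rna.h5"   # feature count matrix in H5 or MTX format
PROJECT="pbmc_rna"    # project name used to generate the output files
N_CELLS=50            # cells merged into one metacell (illustrative value)

# Assemble the documented arguments: -i, -n, -s, -p and the optional -t.
set -- enrich_rna -i "$INPUT" -n "$N_CELLS" -s hg38 -p "$PROJECT" -t 8
echo "scripro $*"

# Run only if scripro is installed and the input file is present.
if command -v scripro >/dev/null 2>&1 && [ -f "$INPUT" ]; then
    scripro "$@"
else
    echo "skipped: scripro or $INPUT not found"
fi
```

Building the argument list with `set --` keeps each value quoted, so paths containing spaces are passed to scripro as single arguments.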
scripro enrich_multiome¶
For enrichment of TF activity using both RNA-seq and ATAC-seq data from single-cell or spatial experiments, you can use scripro enrich_multiome.
- In this function, you can input a transcriptome dataset and a chromatin accessibility dataset.
- For transcriptome data, a feature count matrix is required. For chromatin accessibility data, you can input either a fragment file or a feature count matrix.
- The -n parameter controls how many cells merge into a metacell. This parameter affects the resolution of the results.
- For barcode-matched multiome datasets, such as SHARE-seq or 10X multiome datasets, -b should be set to 0. Otherwise, it should be set to 1.
- If -b is set to 1, a GTF annotation file must be provided with -g.
- This function outputs a pkl file containing the p-value matrix and the TF activity score matrix. A directory named 'bigwig' is also generated, containing the chromatin landscape file for each metacell.
usage: scripro enrich_multiome [-h] -i FEATURE_MATRIX -n CELL_NUM -s {hg38,mm10} -a {fragment,matrix} -b {0,1} -f ATAC_PATH [-g GLUE_ANNOTATION] -p PROJECT [-t N_CORES]
optional arguments:
-h, --help Show this help message and exit
Input files arguments:
-i FEATURE_MATRIX, --input_feature_matrix FEATURE_MATRIX
A cell by peak matrix . REQUIRED.
-n CELL_NUM, --cell_number CELL_NUM
Metacell Cell Number . REQUIRED.
-s {hg38,mm10}, --species {hg38,mm10}
Species. "hg38"(human) or "mm10"(mouse). REQUIRED.
-a {fragment,matrix}, --atac_file_type {fragment,matrix}
atac_file_type,"fragment" or "matrix(h5,h5ad,mtx)"
-b {0,1}, --barcode_corresponds {0,1}
Whether the scRNA-seq barcode matches the scATAC-seq barcode. "0"(Match) or "1"(Not match). REQUIRED.
-f ATAC_PATH, --atac_file ATAC_PATH
ATAC file path.REQUIRED.
-g GLUE_ANNOTATION, --glue_annotation GLUE_ANNOTATION
If the scRNA-seq barcodes do not match the scATAC-seq barcodes, the glue_annotation file that will be used, like 'gencode.v43.chr_patch_hapl_scaff.annotation.gtf.gz'.
Output arguments:
-p PROJECT, --project PROJECT
Project name, which will be used to generate output files.
Other options:
-t N_CORES, --thread N_CORES
Number of cores use to run SCRIPro. DEFAULT: 8.
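The interaction between -b and -g can be sketched as follows: the GTF annotation is appended only when the barcodes do not match. All file names, the project name, and the metacell size are hypothetical placeholders (the GTF name is the example from the -g help text), and the command only runs when scripro and both input files are available.

```shell
# Hypothetical placeholders; replace with your own data and settings.
RNA="multiome_rna.h5"           # scRNA-seq feature count matrix
ATAC="atac_fragments.tsv.gz"    # fragment file, so -a is set to "fragment"
GTF="gencode.v43.chr_patch_hapl_scaff.annotation.gtf.gz"
PROJECT="multiome_demo"
BARCODE_FLAG=1                  # -b value: 0 = barcodes match, 1 = they do not

set -- enrich_multiome -i "$RNA" -n 50 -s hg38 -a fragment \
    -b "$BARCODE_FLAG" -f "$ATAC" -p "$PROJECT"

# The GTF annotation is only required for unmatched barcodes (-b 1).
if [ "$BARCODE_FLAG" = 1 ]; then
    set -- "$@" -g "$GTF"
fi
echo "scripro $*"

# Run only if scripro is installed and both inputs are present.
if command -v scripro >/dev/null 2>&1 && [ -f "$RNA" ] && [ -f "$ATAC" ]; then
    scripro "$@"
else
    echo "skipped: scripro or input files not found"
fi
```

For a barcode-matched dataset, setting `BARCODE_FLAG=0` leaves -g out of the call entirely.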
scripro enrich_atac¶
For enrichment of TF activity for single-cell ATAC-seq data, you can use the scripro enrich_atac command.
scripro enrich_atac behaves the same as SCRIP; it includes the subcommands enrich, impute, target, config, and index.
The reference files for SCRIP are different from those for SCRIPro; you can download them from Zenodo and configure them with SCRIP config.
In this function, you can input a peak count matrix in H5 or MTX format, along with basic quality-control parameters.
This function will output a folder including these files:
beds: bed files of all cells
ChIP_result: txt files of Giggle search results
peaks_length.txt: peak total length of each cell
SCRIP_enrichment.txt: the result of the SCRIP score
dataset_overlap_df.pk: the raw number of overlaps of each cell to each dataset
dataset_cell_norm_df.pk: normalized scores
dataset_score_source_df.pk: matched reference datasets
tf_cell_score_df.pk: the same table as SCRIP_enrichment.txt, but untransposed and in pickle format
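A call that would produce the output folder described above might look like the following sketch. The input file and project name are hypothetical placeholders, the SCRIP reference is assumed to be configured already, and the command only runs when scripro and the input file are available. Note that this subcommand takes hs or mm as species codes, not hg38 or mm10.

```shell
# Hypothetical placeholders; replace with your own data and settings.
INPUT="atac_peak_count.h5"   # cell by peak count matrix in H5 or MTX format
PROJECT="atac_demo"          # name of the output folder

# enrich_atac forwards to the SCRIP subcommands; "enrich" computes the scores.
# -t 16 and -m max restate the documented defaults explicitly.
set -- enrich_atac enrich -i "$INPUT" -s hs -p "$PROJECT" -t 16 -m max
echo "scripro $*"

# Run only if scripro is installed and the input file is present.
if command -v scripro >/dev/null 2>&1 && [ -f "$INPUT" ]; then
    scripro "$@"
else
    echo "skipped: scripro or $INPUT not found"
fi
```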
For detailed usage, see the SCRIP documentation.
Usage example:
scripro enrich_atac enrich [-h] -i FEATURE_MATRIX -s {hs,mm} [-p PROJECT] [--min_cells MIN_CELLS] [--min_peaks MIN_PEAKS] [--max_peaks MAX_PEAKS] [-t N_CORES] [-m {max,mean}] [-y] [--clean]
optional arguments:
-h, --help Show this help message and exit
Input files arguments:
-i FEATURE_MATRIX, --input_feature_matrix FEATURE_MATRIX
A cell by peak matrix . REQUIRED.
-s {hs,mm}, --species {hs,mm}
Species. "hs"(human) or "mm"(mouse). REQUIRED.
Output arguments:
-p PROJECT, --project PROJECT
Project name, which will be used to generate output files folder. DEFAULT: Random generate.
Preprocessing parameter arguments:
--min_cells MIN_CELLS
Minimal cell cutoff for features. Auto will take 0.05% of total cell number.DEFAULT: "auto".
--min_peaks MIN_PEAKS
Minimal peak cutoff for cells. Auto will take the mean-3*std of all feature number (if less than 500 is 500). DEFAULT: "auto".
--max_peaks MAX_PEAKS
Max peak cutoff for cells. This will help you to remove the doublet cells. Auto will take the mean+5*std of all feature number. DEFAULT: "auto".
Other options:
-t N_CORES, --thread N_CORES
Number of cores use to run SCRIP. DEFAULT: 16.
-m {max,mean}, --mode {max,mean}
Deduplicate strategy. DEFAULT: max.
-y, --yes Whether ask for confirmation. DEFAULT: False.
--clean Whether delete tmp files(including bed and search results) generated by SCRIP. DEFAULT: False.
scripro get_tf_target¶
To get the target genes of a specific TF, you can use scripro get_tf_target.
- In this function, you input the result of enrich_rna or enrich_multiome together with a TF name, and the target genes of that TF are returned.
- This function outputs a CSV file containing the regulatory activity of the TF's downstream target genes within each metacell.
scripro get_tf_target [-h] -i SCRIPRO_RESULT -t TF_NAME -p PROJECT
optional arguments:
-h, --help Show this help message and exit
Input files arguments:
-i SCRIPRO_RESULT, --input_scripro_result SCRIPRO_RESULT
SCRIPro result pickle file. REQUIRED.
-t TF_NAME, --tf_name TF_NAME
TF name to calculate the target . REQUIRED.
Output arguments:
-p PROJECT, --project PROJECT
Project name, which will be used to generate output file.
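As a minimal sketch, a get_tf_target call on the result of an earlier enrichment run could be assembled as follows. The pickle file name, the TF name, and the project name are hypothetical placeholders; in particular, the actual name of the pickle written by enrich_rna or enrich_multiome is not specified here and is an assumption.

```shell
# Hypothetical placeholders; replace with your own result file and TF.
RESULT="pbmc_rna.pkl"   # pickle from enrich_rna or enrich_multiome (assumed name)
TF="GATA1"              # TF whose target genes are scored (illustrative)
PROJECT="pbmc_gata1"    # project name used to generate the output file

# In this command -t is the TF name, not the thread count used elsewhere.
set -- get_tf_target -i "$RESULT" -t "$TF" -p "$PROJECT"
echo "scripro $*"

# Run only if scripro is installed and the result file is present.
if command -v scripro >/dev/null 2>&1 && [ -f "$RESULT" ]; then
    scripro "$@"
else
    echo "skipped: scripro or $RESULT not found"
fi
```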