Prerank Algorithm
Purpose and Usage
Scenario 1: Perform prerank analysis on all differentially expressed genes obtained from SDAS DEG analysis
SDAS geneSetEnrichment prerank \
    -i de_t-test.anno_rctd.SmoothMuscle-vs-Endo.all.csv -o ./ \
    --species human
Scenario 2: Analyze only with databases of interest
SDAS geneSetEnrichment prerank \
    -i de_t-test.anno_rctd.SmoothMuscle-vs-Endo.all.csv -o ./ \
    --gmt sdas_deg_enrichment/lib/GSEADB/h.all.v2024.1.Hs.symbols.gmt,sdas_deg_enrichment/lib/GSEADB/KEGG_2021_Human.gmt
Scenario 3: Plot only pathways of interest. Write the full names of the pathways of interest into a txt file, one per line, and pass this txt file to the analysis process via the --pathways parameter. Note that the specified pathways must be included in the database used.
SDAS geneSetEnrichment prerank \
    -i de_t-test.anno_rctd.SmoothMuscle-vs-Endo.all.csv -o ./ \
    --gmt sdas_deg_enrichment/lib/GSEADB/h.all.v2024.1.Hs.symbols.gmt,sdas_deg_enrichment/lib/GSEADB/KEGG_2021_Human.gmt \
    --pathways ./term.txt
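Because every name passed through --pathways must exist in the database used, it can help to check the names before running the analysis. The sketch below is a minimal illustration, not part of SDAS: the helper names read_gmt_terms and write_pathway_file are hypothetical, and it assumes the standard GMT layout in which the first tab-separated field of each line is the gene set name.

```python
from pathlib import Path


def read_gmt_terms(gmt_path):
    """Return the gene set names of a GMT file (first tab-separated field per line)."""
    terms = set()
    with open(gmt_path, encoding="utf-8") as handle:
        for line in handle:
            name = line.rstrip("\n").split("\t")[0]
            if name:
                terms.add(name)
    return terms


def write_pathway_file(pathways, gmt_paths, out_path):
    """Write one pathway name per line after checking each name against the GMT files."""
    known = set()
    for gmt_path in gmt_paths:
        known |= read_gmt_terms(gmt_path)
    # Fail early instead of letting an unknown name reach the analysis step.
    missing = [name for name in pathways if name not in known]
    if missing:
        raise ValueError(f"Pathways not found in the supplied GMT files: {missing}")
    Path(out_path).write_text("\n".join(pathways) + "\n", encoding="utf-8")
    return out_path
```

For example, calling write_pathway_file(["HALLMARK_MYC_TARGETS_V1"], ["sdas_deg_enrichment/lib/GSEADB/h.all.v2024.1.Hs.symbols.gmt"], "./term.txt") would produce a file suitable for --pathways, provided that name is present in the hallmark database.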
Input Parameter Description
| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| -i / --input | Yes | | DEG result file in CSV format. |
| -o / --output | Yes | | The GSEApy output directory. Default: the current working directory. |
| --species | No | human | Use the built-in GMT database: human or mouse. Default: human. More databases are available at https://amp.pharm.mssm.edu/modEnrichr. |
| --gmt | No | | Custom gene set database in GMT format. One or more databases, separated by ",". By default, the built-in database selected by --species is used. |
| --graph | No | 5 | Number of top pathways to plot. Default: 5. |
| --pathways | No | | Text file listing the names of the pathways to draw GSEA plots for. If omitted, the top pathways selected by --graph are plotted. |
| --min_size | No | | Minimum number of input genes that must be present in a gene set. Default: 15. |
| --max_size | No | | Maximum number of input genes that may be present in a gene set. Default: 20000. |
| --label | No | | Phenotype labels; two values are required. Default: ('Pos','Neg'). |
| -v / --verbose | No | | Increase output verbosity and print the progress of the job. Default: False. |
| --permu_num | No | 1000 | Number of random permutations used to calculate the null enrichment scores (esnulls). Default: 1000. |
| --weight | No | 1 | Weight applied to the rank metric when scoring input genes. Choose from {0, 1, 1.5, 2}. Default: 1. |
| --ascending | No | | Sort order of the rank metric. Passing the --ascending flag sets ascending to True. Default: False. |
| --seed | No | 123 | Random seed. Default: 123. |
| --threads | No | 1 | Number of threads to use. Default: 1. |
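As a rough illustration of how --weight, --permu_num and --seed enter the calculation, the sketch below implements the textbook preranked GSEA statistic in plain Python. It is not SDAS or GSEApy source code: the function names and arguments are hypothetical, the gene list is assumed to be already sorted by the rank metric, and the null distribution is built by drawing random gene sets of the same size.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Signed maximum deviation from zero of the weighted running sum."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = hits.count(False)
    # Each hit contributes |score| ** weight; weight=0 gives the unweighted statistic.
    hit_weights = [abs(s) ** weight if h else 0.0 for s, h in zip(scores, hits)]
    total = sum(hit_weights)
    if total == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for w, h in zip(hit_weights, hits):
        running += w / total if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_test(ranked_genes, scores, gene_set, weight=1.0, permu_num=1000, seed=123):
    """Return (ES, NES, nominal p-value) from random gene sets of equal size."""
    rng = random.Random(seed)  # a fixed seed makes the null distribution reproducible
    observed = enrichment_score(ranked_genes, scores, gene_set, weight)
    size = sum(gene in gene_set for gene in ranked_genes)
    nulls = [
        enrichment_score(ranked_genes, scores, set(rng.sample(ranked_genes, size)), weight)
        for _ in range(permu_num)
    ]
    # Compare the observed ES only with null scores of the same sign.
    if observed >= 0:
        same_sign = [es for es in nulls if es >= 0]
    else:
        same_sign = [es for es in nulls if es < 0]
    if not same_sign:
        return observed, float("nan"), float("nan")
    mean_abs = sum(abs(es) for es in same_sign) / len(same_sign)
    nes = observed / mean_abs if mean_abs else float("nan")
    p_value = sum(abs(es) >= abs(observed) for es in same_sign) / len(same_sign)
    return observed, nes, p_value
```

Under these assumptions, a larger weight lets genes with extreme rank metrics dominate the score, more permutations give a finer-grained p-value at the cost of run time, and repeating a run with the same seed reproduces the same null scores.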
Output Results Display
| File | Description |
| --- | --- |
| prerank_{database}.csv | Result file in CSV format |
| prerank_{database}:top10.pdf/png | Top 10 pathway plots in PDF and PNG formats |
CSV file format: prerank_{database}.csv is similar to the GSEA result and contains the columns Name, Term, ES, NES, NOM p-val, FDR q-val, FWER p-val, Tag %, Gene % and Lead_genes.
Term is the pathway name. ES is the Enrichment Score, which reflects how strongly the gene set members are enriched in the ranked gene list (e.g., ranked by log2FC). A positive ES means the gene set is enriched at the top of the list (positively correlated with the phenotype); a negative ES means it is enriched at the bottom (negatively correlated). NES is the Normalized Enrichment Score. NOM p-val is the nominal p-value. FDR q-val is the false discovery rate adjusted p-value. FWER p-val is the family-wise error rate adjusted p-value. Tag % is the share of gene set members that fall in the core enrichment region, that is, before the peak of the running enrichment score for a positive ES or after it for a negative ES, reported as a fraction such as 160/195. Gene % is the percentage of genes in the ranked list that lie before (positive ES) or after (negative ES) that peak. Lead_genes are the core genes contributing most to the ES.
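| Name | Term | ES | NES | NOM p-val | FDR q-val | FWER p-val | Tag % | Gene % | Lead_genes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| prerank | HALLMARK_MYC_TARGETS_V1 | 0.7472938191195556 | 2.39333105644001 | 0.0 | 0.0 | 0.0 | 160/195 | 18.89% | RPL14;HNRNPA2B1;... |
| prerank | HALLMARK_OXIDATIVE_PHOSPHORYLATION | 0.7431758291176868 | 2.376055485647371 | 0.0 | 0.0 | 0.0 | 168/200 | 20.44% | MDH2;COX8A;... |
| prerank | HALLMARK_ALLOGRAFT_REJECTION | 0.744882727767552 | 2.3688992213810462 | 0.0 | 0.0 | 0.0 | 118/194 | 14.03% | ITGB2;HLA-DRA;... |
| prerank | ... | ... | ... | ... | ... | ... | ... | ... | ... |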
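A result file with these columns can be post-processed with the Python standard library. The sketch below is only an illustration: it assumes the CSV header uses exactly the column names listed above, reuses the three example rows shown in this section (with their truncated Lead_genes kept verbatim), and the helper names and the 0.25 FDR cutoff are hypothetical choices to adjust as needed.

```python
import csv
import io

NUMERIC_COLUMNS = ("ES", "NES", "NOM p-val", "FDR q-val", "FWER p-val")

# Example rows copied from this section; the Lead_genes values are truncated in the source.
SAMPLE = """Name,Term,ES,NES,NOM p-val,FDR q-val,FWER p-val,Tag %,Gene %,Lead_genes
prerank,HALLMARK_MYC_TARGETS_V1,0.7472938191195556,2.39333105644001,0.0,0.0,0.0,160/195,18.89%,RPL14;HNRNPA2B1;...
prerank,HALLMARK_OXIDATIVE_PHOSPHORYLATION,0.7431758291176868,2.376055485647371,0.0,0.0,0.0,168/200,20.44%,MDH2;COX8A;...
prerank,HALLMARK_ALLOGRAFT_REJECTION,0.744882727767552,2.3688992213810462,0.0,0.0,0.0,118/194,14.03%,ITGB2;HLA-DRA;...
"""


def load_results(handle):
    """Read result rows and convert the score and p-value columns to floats."""
    rows = []
    for row in csv.DictReader(handle):
        for column in NUMERIC_COLUMNS:
            row[column] = float(row[column])
        rows.append(row)
    return rows


def split_by_direction(rows, fdr_cutoff=0.25):
    """Keep terms passing the FDR cutoff and split them by the sign of NES."""
    kept = [row for row in rows if row["FDR q-val"] <= fdr_cutoff]
    up = sorted((r for r in kept if r["NES"] > 0), key=lambda r: r["NES"], reverse=True)
    down = sorted((r for r in kept if r["NES"] < 0), key=lambda r: r["NES"])
    return up, down


rows = load_results(io.StringIO(SAMPLE))
up, down = split_by_direction(rows)
print([row["Term"] for row in up])    # terms enriched at the top of the ranked list
print([row["Term"] for row in down])  # terms enriched at the bottom of the ranked list
```

With a real output file, the io.StringIO(SAMPLE) argument would be replaced by a handle opened on prerank_{database}.csv with newline="".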
Top Terms Enrichment Curve Plot: prerank_{database}:top10.pdf/png (see example below). In the plot, the sign of the Enrichment Score (ES) directly reflects the distribution pattern of the gene set in the gene list ranked by log2FC: positive ES means the gene set is concentrated at the top of the list and positively correlated with the phenotype; negative ES means the gene set is concentrated at the bottom and negatively correlated.
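The relationship between the peak of the enrichment curve and the Tag %, Gene % and Lead_genes columns can be sketched as follows. This is a minimal illustration under the usual GSEA conventions, not the code SDAS or GSEApy runs; the function names are hypothetical and the ranked list is assumed to be sorted by the rank metric already.

```python
def running_curve(ranked_genes, scores, gene_set, weight=1.0):
    """Running enrichment score after each position of the ranked list."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = hits.count(False)
    total = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if total == 0 or n_miss == 0:
        raise ValueError("gene set needs weighted hits and at least one miss in the list")
    curve, running = [], 0.0
    for s, h in zip(scores, hits):
        # Step up on a gene set member, step down on any other gene.
        running += abs(s) ** weight / total if h else -1.0 / n_miss
        curve.append(running)
    return curve


def leading_edge(ranked_genes, scores, gene_set, weight=1.0):
    """Derive ES, Tag %, Gene % and Lead_genes from the peak of the curve."""
    curve = running_curve(ranked_genes, scores, gene_set, weight)
    peak = max(range(len(curve)), key=lambda i: abs(curve[i]))
    es = curve[peak]
    # Positive ES: genes up to and including the peak; negative ES: genes after the trough.
    edge = ranked_genes[: peak + 1] if es >= 0 else ranked_genes[peak + 1:]
    lead_genes = [gene for gene in edge if gene in gene_set]
    n_in_list = sum(gene in gene_set for gene in ranked_genes)
    return {
        "ES": es,
        "Tag %": f"{len(lead_genes)}/{n_in_list}",
        "Gene %": f"{100 * len(edge) / len(ranked_genes):.2f}%",
        "Lead_genes": ";".join(lead_genes),
    }
```

A curve that peaks early with a positive value yields a leading edge drawn from the top of the list, while a curve whose largest deviation is a late trough yields a leading edge drawn from the bottom, which is the pattern the sign of ES summarizes.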
