Prerank Algorithm

Purpose and Usage

  • Scenario 1: Perform prerank analysis on all differentially expressed genes obtained from SDAS DEG analysis

    SDAS geneSetEnrichment prerank \
    -i de_t-test.anno_rctd.SmoothMuscle-vs-Endo.all.csv -o ./ \
    --species human
  • Scenario 2: Analyze only with databases of interest

    SDAS geneSetEnrichment prerank \
    -i de_t-test.anno_rctd.SmoothMuscle-vs-Endo.all.csv -o ./ \
    --gmt sdas_deg_enrichment/lib/GSEADB/h.all.v2024.1.Hs.symbols.gmt,sdas_deg_enrichment/lib/GSEADB/KEGG_2021_Human.gmt
  • Scenario 3: Plot only pathways of interest. Write the full names of the pathways of interest into a txt file, one per line, and pass this txt file to the analysis process via the --pathways parameter. Note that the specified pathways must be included in the database used.

    SDAS geneSetEnrichment prerank \
    -i de_t-test.anno_rctd.SmoothMuscle-vs-Endo.all.csv -o ./ \
    --gmt sdas_deg_enrichment/lib/GSEADB/h.all.v2024.1.Hs.symbols.gmt,sdas_deg_enrichment/lib/GSEADB/KEGG_2021_Human.gmt \
    --pathways ./term.txt
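Because the pathways listed in the --pathways file must exist in the database used, it can help to check the names before starting the analysis. The following is a minimal Python sketch of such a check, assuming the standard GMT layout (tab-separated: set name, description, then genes). The helper names read_gmt_terms and missing_terms and the miniature sample data are hypothetical and are not part of SDAS.

```python
import io

def read_gmt_terms(handle):
    """Return the set of gene-set names found in a GMT stream.

    Assumes the standard GMT layout: name<TAB>description<TAB>gene1<TAB>gene2...
    """
    terms = set()
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        terms.add(line.split("\t")[0])
    return terms

def missing_terms(wanted, available):
    """Return the wanted pathway names not present in the database, in input order."""
    return [t for t in wanted if t not in available]

# Hypothetical miniature GMT content standing in for a real database file.
gmt_text = (
    "HALLMARK_MYC_TARGETS_V1\tdesc\tRPL14\tHNRNPA2B1\n"
    "HALLMARK_OXIDATIVE_PHOSPHORYLATION\tdesc\tMDH2\tCOX8A\n"
)
available = read_gmt_terms(io.StringIO(gmt_text))

# Contents intended for term.txt: one full pathway name per line.
term_lines = "HALLMARK_MYC_TARGETS_V1\nHALLMARK_NOT_A_REAL_SET\n"
wanted = [t.strip() for t in term_lines.splitlines() if t.strip()]

not_found = missing_terms(wanted, available)
print(not_found)  # names that would need fixing before running prerank
```

With real files, the two in-memory strings would be replaced by open() calls on the GMT database and on term.txt.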

Input Parameter Description

| Prerank Parameter | Required | Default Value | Description |
| --- | --- | --- | --- |
| -i / --input | Yes | | DEG result file in CSV format. |
| -o / --output | Yes | | The GSEApy output directory. Default: the current working directory |
| --species | No | human | Use the built-in GMT database: human or mouse. Default: human. For more databases, see https://amp.pharm.mssm.edu/modEnrichr. |
| --gmt | No | | Custom gene set database in GMT format. Separate multiple databases with ",". By default, the built-in database selected by --species is used. |
| --graph | No | 5 | Number of top pathway plots to produce. Default: 5 |
| --pathways | No | | A txt file listing the names of the pathways to plot, one per line. If omitted, the top pathways are plotted according to --graph. |
| --min_size | No | 15 | Minimum number of input genes that must be present in a gene set. Default: 15 |
| --max_size | No | 20000 | Maximum number of input genes that may be present in a gene set. Default: 20000 |
| --label | No | ('Pos','Neg') | Phenotype labels; two values are required. Default: ('Pos','Neg') |
| -v / --verbose | No | False | Increase output verbosity and print the progress of the job. Default: False |
| --permu_num | No | 1000 | Number of random permutations used to calculate the null enrichment scores (esnulls). Default: 1000 |
| --weight | No | 1 | Weight (weighted_score) applied to the rank metric of the input genes. Choose from {0, 1, 1.5, 2}. Default: 1 |
| --ascending | No | False | Sort order of the rank metric. Passing the --ascending flag sets ascending to True. Default: False |
| --seed | No | 123 | Random seed. Default: 123 |
| --threads | No | 1 | Number of threads to use. Default: 1 |
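To make the roles of --weight, --permu_num and --seed more concrete, the sketch below shows a simplified version of the classic GSEA running-sum statistic with a permutation-based nominal p-value. It is an illustration under stated assumptions, not the code SDAS or GSEApy actually runs: the function names are hypothetical, the null distribution is built by shuffling gene labels, and the toy ranked list is made up.

```python
import random

def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score over a ranked gene list.

    ranked_genes: gene names already sorted by the rank metric.
    scores: rank metric of each gene, in the same order.
    weight: exponent applied to |score| for hits (0 gives an unweighted walk).
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_weights = [abs(s) ** weight if h else 0.0 for s, h in zip(scores, hits)]
    total_hit = sum(hit_weights)
    if total_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for w, h in zip(hit_weights, hits):
        # Step up on a hit (scaled by its weight), step down on a miss.
        running += w / total_hit if h else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best

def permutation_pvalue(ranked_genes, scores, gene_set,
                       weight=1.0, permu_num=1000, seed=123):
    """Return (observed ES, nominal p-value) using shuffled gene labels as the null."""
    rng = random.Random(seed)  # a fixed seed makes the permutations reproducible
    observed = enrichment_score(ranked_genes, scores, gene_set, weight)
    shuffled = list(ranked_genes)
    same_sign = extreme = 0
    for _ in range(permu_num):
        rng.shuffle(shuffled)
        null_es = enrichment_score(shuffled, scores, gene_set, weight)
        if (null_es >= 0) == (observed >= 0):
            same_sign += 1
            if abs(null_es) >= abs(observed):
                extreme += 1
    pval = extreme / same_sign if same_sign else 1.0
    return observed, pval

# Toy ranked list (made-up values), sorted in descending order of the metric.
genes = ["A", "B", "C", "D", "E", "F"]
scores = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es, pval = permutation_pvalue(genes, scores, {"A", "B"}, permu_num=200, seed=123)
print(es, pval)
```

In this sketch a larger weight lets genes with extreme rank metrics dominate the score, more permutations give a finer-grained p-value at higher cost, and reusing the same seed reproduces the same null distribution.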

Output Results Display

| Prerank Result File | Description |
| --- | --- |
| prerank_{database}.csv | Result file in CSV format |
| prerank_{database}:top10.pdf/png | Top 10 pathway plots in PDF and PNG formats |

  • CSV file format: prerank_{database}.csv is similar to the GSEA result and contains the columns Name, Term, ES, NES, NOM p-val, FDR q-val, FWER p-val, Tag %, Gene %, and Lead_genes:
      • Term: the pathway name.
      • ES: the Enrichment Score, which reflects how strongly the members of the gene set are enriched in the ranked gene list (e.g., ranked by log2FC). A positive ES means the gene set is enriched at the top of the list (positively correlated with the phenotype); a negative ES means it is enriched at the bottom (negatively correlated).
      • NES: the Normalized Enrichment Score.
      • NOM p-val: the nominal p-value.
      • FDR q-val: the p-value adjusted for the false discovery rate.
      • FWER p-val: the p-value adjusted for the family-wise error rate.
      • Tag %: the share of the gene set's members that fall in the core enrichment region, up to the enrichment peak, reported as a count (for example 160/195).
      • Gene %: the percentage of the ranked gene list that lies up to the enrichment peak.
      • Lead_genes: the core genes that contribute most to the ES.

| Name | Term | ES | NES | NOM p-val | FDR q-val | FWER p-val | Tag % | Gene % | Lead_genes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| prerank | HALLMARK_MYC_TARGETS_V1 | 0.7472938191195556 | 2.39333105644001 | 0.0 | 0.0 | 0.0 | 160/195 | 18.89% | RPL14;HNRNPA2B1;... |
| prerank | HALLMARK_OXIDATIVE_PHOSPHORYLATION | 0.7431758291176868 | 2.376055485647371 | 0.0 | 0.0 | 0.0 | 168/200 | 20.44% | MDH2;COX8A;... |
| prerank | HALLMARK_ALLOGRAFT_REJECTION | 0.744882727767552 | 2.3688992213810462 | 0.0 | 0.0 | 0.0 | 118/194 | 14.03% | ITGB2;HLA-DRA;... |
| prerank | ... | ... | ... | ... | ... | ... | ... | ... | ... |

  • Top Terms Enrichment Curve Plot: prerank_{database}:top10.pdf/png (see example below). In the plot, the sign of the Enrichment Score (ES) directly reflects the distribution pattern of the gene set in the gene list ranked by log2FC: positive ES means the gene set is concentrated at the top of the list and positively correlated with the phenotype; negative ES means the gene set is concentrated at the bottom and negatively correlated.
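The result columns described above can also be inspected programmatically. The following Python sketch reads rows in the prerank_{database}.csv layout, derives the enrichment direction from the sign of ES, and converts the Tag % count into a fraction. The sample rows are copied from the example table (with the truncated Lead_genes values left as-is); the summarize helper and the 0.05 FDR cutoff are illustrative assumptions, not SDAS defaults.

```python
import csv
import io

# Rows copied from the example table; Lead_genes is truncated in the source.
sample_csv = """Name,Term,ES,NES,NOM p-val,FDR q-val,FWER p-val,Tag %,Gene %,Lead_genes
prerank,HALLMARK_MYC_TARGETS_V1,0.7472938191195556,2.39333105644001,0.0,0.0,0.0,160/195,18.89%,RPL14;HNRNPA2B1;...
prerank,HALLMARK_OXIDATIVE_PHOSPHORYLATION,0.7431758291176868,2.376055485647371,0.0,0.0,0.0,168/200,20.44%,MDH2;COX8A;...
prerank,HALLMARK_ALLOGRAFT_REJECTION,0.744882727767552,2.3688992213810462,0.0,0.0,0.0,118/194,14.03%,ITGB2;HLA-DRA;...
"""

def summarize(row, fdr_cutoff=0.05):
    """Turn one result row into a small summary dict (cutoff is an assumption)."""
    es = float(row["ES"])
    hits, size = (int(x) for x in row["Tag %"].split("/"))
    return {
        "term": row["Term"],
        # Positive ES: enriched at the top of the ranked list; negative: at the bottom.
        "direction": "top of ranked list" if es > 0 else "bottom of ranked list",
        "nes": float(row["NES"]),
        "significant": float(row["FDR q-val"]) < fdr_cutoff,
        "tag_fraction": hits / size,
    }

rows = [summarize(r) for r in csv.DictReader(io.StringIO(sample_csv))]
rows.sort(key=lambda r: abs(r["nes"]), reverse=True)  # strongest enrichment first
for r in rows:
    print(r["term"], r["direction"], round(r["tag_fraction"], 3))
```

For a real run, the in-memory string would be replaced by the prerank_{database}.csv file written to the output directory.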
