GSVA Algorithm

Purpose and Usage

  • Scenario 1: Perform GSVA analysis on specific objects, e.g., specify clusters 1, 2, and 3 in leiden clustering for GSVA analysis

    SDAS geneSetEnrichment gsva -i st.h5ad -o outdir \
    --group_key leiden --idents 1,2,3 --species human
  • Scenario 2: Subset a column in obs before GSVA analysis, e.g., perform GSVA analysis only on different samples of a certain cell type

    SDAS geneSetEnrichment gsva -i st.h5ad -o outdir \
    --group_key leiden --idents 1,2,3 --species human \
    --subset_key cell_type --subset_values B
  • Scenario 3: Analyze only with databases of interest

    SDAS geneSetEnrichment gsva -i st.h5ad -o outdir \
    --group_key leiden --idents 1,2,3 \
    --gmt sdas_deg_enrichment/lib/GSEADB/h.all.v2024.1.Hs.symbols.gmt,sdas_deg_enrichment/lib/GSEADB/KEGG_2021_Human.gmt
  • Scenario 4: Perform GSVA analysis on all elements of a column in obs together by setting --idents to the special value all.

    SDAS geneSetEnrichment gsva -i st.h5ad -o outdir \
    --group_key leiden --idents all --species human
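
When the same analysis has to be repeated for several subsets (as in Scenario 2), it can help to assemble the command lines programmatically. The following is a minimal sketch, not part of SDAS: the helper `build_gsva_command` and the list of cell types are hypothetical, and only flags that appear in the scenarios above are used. The commands are printed rather than executed.

```python
import shlex


def build_gsva_command(input_h5ad, outdir, group_key, idents="all",
                       species=None, subset_key=None, subset_values=None,
                       gmt=None):
    """Assemble an SDAS GSVA command line (hypothetical helper)."""
    # --idents accepts a comma-separated list or the special value "all".
    if not isinstance(idents, str):
        idents = ",".join(str(i) for i in idents)
    cmd = ["SDAS", "geneSetEnrichment", "gsva",
           "-i", input_h5ad, "-o", outdir,
           "--group_key", group_key, "--idents", idents]
    if species is not None:
        cmd += ["--species", species]
    if subset_key is not None and subset_values is not None:
        cmd += ["--subset_key", subset_key,
                "--subset_values", ",".join(subset_values)]
    if gmt is not None:
        # Several GMT files are passed as one comma-separated value.
        cmd += ["--gmt", ",".join(gmt)]
    return cmd


# Hypothetical cell types; replace with values present in obs["cell_type"].
for cell_type in ["B", "T"]:
    cmd = build_gsva_command("st.h5ad", f"outdir_{cell_type}", "leiden",
                             idents=[1, 2, 3], species="human",
                             subset_key="cell_type",
                             subset_values=[cell_type])
    print(shlex.join(cmd))
```

Each printed line can be pasted into a shell, or the list returned by the helper can be passed to `subprocess.run` once SDAS is on the PATH.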

Input Parameter Description

| GSVA Parameter | Required | Default Value | Description |
| --- | --- | --- | --- |
| -i / --input | Yes | | Input spatial h5ad file. |
| --group_key | Yes | | Name of the grouping column in h5ad obs; it must contain the identities passed to --idents (e.g., ident1 and ident2). |
| --idents | Yes | | Identity classes from --group_key to analyze, comma-separated, e.g., sample1,sample2. Default: all |
| -o / --output | Yes | | Output directory for the GSEApy results. Default: the current working directory |
| --subset_key | No | | Key in obs used for subsetting (optional), e.g., cell_type |
| --subset_values | No | | Values of --subset_key to keep (optional), comma-separated, e.g., cell1,cell2 |
| --layer | No | | Layer holding the raw gene expression. If None, adata.raw.X or adata.X is used. Default: None |
| --gene_symbol_key | No | real_gene_name | Key for the gene names. Default: real_gene_name |
| --species | No | human | Built-in GMT database to use: human or mouse. Default: human. More databases are listed at https://amp.pharm.mssm.edu/modEnrichr. |
| --sample_size | No | 0 | Number of cells to sample at random; 0 disables sampling. Default: 0 |
| --gmt | No | | Custom gene set database in GMT format. Separate multiple databases with ",". By default, the built-in database selected by --species is used. |
| --kernel_cdf | No | Gaussian | Gaussian is suitable when the input expression values are continuous. For integer counts, set this argument to 'Poisson'. |
| --mx_diff | No | | When set, ES is calculated as the maximum distance of the random walk from 0. Default: False |
| --abs_ranking | No | | Flag used only when --mx_diff is not set. When set, the original Kuiper statistic is used. |
| --min_size | No | 15 | Minimum number of input genes that must be present in a gene set. Default: 15 |
| --max_size | No | 20000 | Maximum number of input genes that may be present in a gene set. Default: 20000 |
| --weight | No | 1 | Exponent tau used in the GSVA random walk. Default: 1 |
| --seed | No | 123 | Random seed. Default: 123 |
| --threads | No | 1 | Number of processes to use. Default: 1 |
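
To make the --gmt, --min_size and --max_size parameters more concrete, here is a minimal sketch of how a GMT file can be parsed and how a size filter of this kind typically behaves. It is an illustration, not the SDAS or GSEApy implementation. It assumes the standard GMT layout (one gene set per line, tab-separated: set name, description, then genes) and assumes that the size limits are inclusive and count only genes that are also present in the input data. The functions `read_gmt` and `filter_by_size` and the toy gene names are hypothetical.

```python
import io


def read_gmt(handle):
    """Parse GMT text: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        gene_sets[fields[0]] = [g for g in fields[2:] if g]
    return gene_sets


def filter_by_size(gene_sets, expressed_genes, min_size=15, max_size=20000):
    """Keep gene sets whose overlap with the input genes is within bounds.

    Assumption: bounds are inclusive and apply to the overlap, not to the
    full gene set as written in the GMT file.
    """
    expressed = set(expressed_genes)
    kept = {}
    for name, genes in gene_sets.items():
        overlap = [g for g in genes if g in expressed]
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept


# Toy data for illustration only.
toy_gmt = io.StringIO("SET_A\tdesc\tG1\tG2\tG3\nSET_B\tdesc\tG4\tG9\n")
gene_sets = read_gmt(toy_gmt)
kept = filter_by_size(gene_sets, ["G1", "G2", "G4"], min_size=2)
print(kept)
```

With the default --min_size of 15, small custom gene sets may be dropped before scoring, so it is worth checking the overlap with the genes in the h5ad file when a term is missing from the results.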

Output Results Display

| GSVA Result File | Description |
| --- | --- |
| GSVA.{database}.csv | Result file in CSV format |
| GSVA.{database}.pdf/png | Result heatmap in PDF and PNG formats |

  • GSVA csv file format: GSVA.{database}.csv is the GSVA analysis result file. The first column is Term (function name), and each subsequent column represents a sample. Positive values indicate higher activity for that function in the sample, negative values indicate lower activity.

| Term | ident1 | ident2 | ... |
| --- | --- | --- | --- |
| HALLMARK_ADIPOGENESIS | -0.32809425650271146 | -0.306805475112318 | ... |
| HALLMARK_ALLOGRAFT_REJECTION | -0.3052190348950549 | 0.22055475913564931 | ... |
| HALLMARK_ANDROGEN_RESPONSE | -0.39290236695613107 | -0.3080397441881526 | ... |
| ... | ... | ... | ... |
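
The result file can be inspected with a few lines of Python. The sketch below uses only the standard library and the three example rows shown above. It assumes the file is comma-separated and that the first header cell is literally `Term`; the helpers `load_gsva_scores` and `top_terms` are hypothetical names. For a real run, pass an open handle to the GSVA.{database}.csv file instead of the inline sample.

```python
import csv
import io

# The three example rows from the table above, as CSV text.
SAMPLE_CSV = """Term,ident1,ident2
HALLMARK_ADIPOGENESIS,-0.32809425650271146,-0.306805475112318
HALLMARK_ALLOGRAFT_REJECTION,-0.3052190348950549,0.22055475913564931
HALLMARK_ANDROGEN_RESPONSE,-0.39290236695613107,-0.3080397441881526
"""


def load_gsva_scores(handle):
    """Return {term: {sample: score}} from a GSVA result CSV handle."""
    scores = {}
    for row in csv.DictReader(handle):
        term = row.pop("Term")
        scores[term] = {sample: float(value) for sample, value in row.items()}
    return scores


def top_terms(scores, sample, n=1):
    """Return the n terms with the highest GSVA score in one sample."""
    ranked = sorted(scores, key=lambda term: scores[term][sample], reverse=True)
    return ranked[:n]


scores = load_gsva_scores(io.StringIO(SAMPLE_CSV))
for sample in ("ident1", "ident2"):
    print(sample, top_terms(scores, sample))
```

In this small sample, HALLMARK_ALLOGRAFT_REJECTION has the highest score in both columns, and it is the only term with a positive score (in ident2).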

  • GSVA result heatmap: GSVA.{database}.pdf/png. The vertical axis is the function/pathway name, the horizontal axis is the sample name, and the legend shows the GSVA score.
