GSVA Algorithm

Purpose and Usage

Scenario 1: Perform GSVA analysis on specific objects, e.g., specify clusters 1, 2, and 3 in leiden clustering for GSVA analysis
```
SDAS geneSetEnrichment gsva -i st.h5ad -o outdir \
--group_key leiden --idents 1,2,3 --species human
```

Scenario 2: Subset a column in obs before GSVA analysis, e.g., perform GSVA analysis only on different samples of a certain cell type

```
SDAS geneSetEnrichment gsva -i st.h5ad -o outdir \
--group_key leiden --idents 1,2,3 --species human \
--subset_key cell_type --subset_values B
```

Scenario 3: Analyze only with databases of interest

```
SDAS geneSetEnrichment gsva -i st.h5ad -o outdir \
--group_key leiden --idents 1,2,3 \
--gmt sdas_deg_enrichment/lib/GSEADB/h.all.v2024.1.Hs.symbols.gmt,sdas_deg_enrichment/lib/GSEADB/KEGG_2021_Human.gmt
```
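
Before passing a custom database to --gmt, it can help to check how many of its gene sets would survive the --min_size / --max_size filter. The following is a minimal Python sketch, not part of SDAS: the helper names `read_gmt` and `filter_by_overlap` are hypothetical, the toy gene sets are made up, and treating both size limits as inclusive is an assumption about how the filter behaves.

```python
import io

def read_gmt(handle):
    """Parse GMT text: one gene set per line, tab-separated as
    name, description, gene1, gene2, ..."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        gene_sets[fields[0]] = [g for g in fields[2:] if g]
    return gene_sets

def filter_by_overlap(gene_sets, expressed_genes, min_size=15, max_size=20000):
    """Keep gene sets whose overlap with the expressed genes has a size
    in [min_size, max_size] (inclusive bounds are an assumption)."""
    expressed = set(expressed_genes)
    kept = {}
    for name, genes in gene_sets.items():
        overlap = [g for g in genes if g in expressed]
        if min_size <= len(overlap) <= max_size:
            kept[name] = overlap
    return kept

# Toy, made-up GMT content; a real file would be opened with open(path).
toy_gmt = io.StringIO(
    "SET_A\tdescription\tG1\tG2\tG3\n"
    "SET_B\tdescription\tG4\tG9\n"
)
toy_sets = read_gmt(toy_gmt)

# Small limits are used here only so the toy data shows both outcomes.
toy_kept = filter_by_overlap(toy_sets, ["G1", "G2", "G3", "G4"],
                             min_size=2, max_size=10)
print(sorted(toy_kept))  # ['SET_A']
```

In this toy run SET_B is dropped because only one of its genes (G4) appears in the expression data, which is below the minimum of 2.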

Scenario 4: Perform GSVA analysis on all elements of a column in obs together. To analyze all elements in a column together, set the --idents parameter to the special value all.
```
SDAS geneSetEnrichment gsva -i st.h5ad -o outdir \
--group_key leiden --idents all --species human
```

Input Parameter Description

| GSVA Parameter | Required | Default Value | Description |
| --- | --- | --- | --- |
| -i / --input | Yes | | Input spatial h5ad file. |
| --group_key | Yes | | Name of the obs column in the h5ad file used for grouping, e.g. leiden. The column must contain the values passed to --idents. |
| --idents | Yes | all | Identity classes (values of the --group_key column) into which cells are regrouped before GSVA, e.g. sample1,sample2. Use all to analyze every value in the column. |
| -o / --output | Yes | current working directory | The GSEApy output directory. |
| --subset_key | | | Key for subsetting (optional), e.g. cell_type. |
| --subset_values | | | Values used for subsetting (optional), e.g. cell1,cell2. |
| --layer | | None | Layer holding the raw gene expression. If None, adata.raw.X or adata.X is used. |
| --gene_symbol_key | | real_gene_name | Column that holds the gene names. |
| --species | | human | Built-in GMT database to use: human or mouse. More databases are available at https://amp.pharm.mssm.edu/modEnrichr. |
| --sample_size | | 0 | Number of cells to sample at random; 0 disables sampling. |
| --gmt | | | Custom gene set database in GMT format. Separate multiple databases with ",". By default, the built-in database selected by --species is used. |
| --kernel_cdf | | Gaussian | Gaussian is suitable when the input expression values are continuous. If the input is integer counts, set this argument to Poisson. |
| --mx_diff | | False | When set, ES is calculated as the maximum distance of the random walk from 0. |
| --abs_ranking | | | Flag used only when --mx_diff is not set. When set, the original Kuiper statistic is used. |
| --min_size | | 15 | Minimum number of input genes that must be present in a gene set. |
| --max_size | | 20000 | Maximum number of input genes that may be present in a gene set. |
| --weight | | 1 | tau in the random walk performed by GSVA. |
| --seed | | 123 | Random seed. |
| --threads | | 1 | Number of processes to use. |
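
The --kernel_cdf guidance above (Gaussian for continuous values, Poisson for integer counts) can be checked against a handful of expression values before running the command. The sketch below is illustrative only: `suggest_kernel_cdf` is a hypothetical helper, not an SDAS or GSEApy function, and the example values are made up. In practice the values would come from the matrix selected by --layer.

```python
def suggest_kernel_cdf(values, tol=1e-9):
    """Return 'Poisson' if every value is a non-negative integer
    (within tol), otherwise 'Gaussian'."""
    for v in values:
        if v < 0 or abs(v - round(v)) > tol:
            return "Gaussian"
    return "Poisson"

raw_counts = [0, 3, 12, 1, 0]             # made-up integer counts
log_normalized = [0.0, 1.39, 2.56, 0.69]  # made-up continuous values

print(suggest_kernel_cdf(raw_counts))      # Poisson
print(suggest_kernel_cdf(log_normalized))  # Gaussian
```

The check only looks at the values it is given, so a small sample of a large matrix may miss non-integer entries; treat the result as a hint rather than a guarantee.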

Output Results Display

| GSVA Result File | Description |
| --- | --- |
| GSVA.{database}.csv | Result file in csv format |
| GSVA.{database}.pdf/png | Result heatmap in pdf and png formats |

GSVA csv file format: GSVA.{database}.csv is the GSVA analysis result file. The first column is Term (function name), and each subsequent column represents a sample. Positive values indicate higher activity for that function in the sample, negative values indicate lower activity.

| Term | ident1 | ident2 | ... |
| --- | --- | --- | --- |
| HALLMARK_ADIPOGENESIS | -0.32809425650271146 | -0.306805475112318 | ... |
| HALLMARK_ALLOGRAFT_REJECTION | -0.3052190348950549 | 0.22055475913564931 | ... |
| HALLMARK_ANDROGEN_RESPONSE | -0.39290236695613107 | -0.3080397441881526 | ... |
| ... | | | |
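
As a rough illustration of how this file can be consumed downstream, the Python sketch below parses a GSVA-style CSV and reports the highest-scoring term for each column. It uses only the three example rows and two ident columns shown above, embedded as a string. The helper names `load_scores` and `top_term` are hypothetical, and the sketch assumes a comma-delimited file whose first column header is Term.

```python
import csv
import io

# The three example rows from the table, restricted to ident1 and ident2.
sample_csv = """Term,ident1,ident2
HALLMARK_ADIPOGENESIS,-0.32809425650271146,-0.306805475112318
HALLMARK_ALLOGRAFT_REJECTION,-0.3052190348950549,0.22055475913564931
HALLMARK_ANDROGEN_RESPONSE,-0.39290236695613107,-0.3080397441881526
"""

def load_scores(handle):
    """Return {ident: {term: score}} from a GSVA-style CSV."""
    reader = csv.DictReader(handle)
    idents = [c for c in reader.fieldnames if c != "Term"]
    scores = {ident: {} for ident in idents}
    for row in reader:
        for ident in idents:
            scores[ident][row["Term"]] = float(row[ident])
    return scores

def top_term(scores, ident):
    """Return the term with the highest GSVA score for one ident."""
    terms = scores[ident]
    return max(terms, key=terms.get)

# For a real result file, pass open("GSVA.{database}.csv", newline="") instead.
scores = load_scores(io.StringIO(sample_csv))
for ident in scores:
    print(ident, top_term(scores, ident))
# ident1 HALLMARK_ALLOGRAFT_REJECTION
# ident2 HALLMARK_ALLOGRAFT_REJECTION
```

Note that for ident1 the top term still has a negative score (about -0.305), so the highest-ranked term in a column is not necessarily one with elevated activity.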

GSVA result heatmap: GSVA.{database}.pdf/png. The vertical axis is the function/pathway name, the horizontal axis is the sample name, and the legend shows the GSVA score.
