GSVA Algorithm
Purpose and Usage
Scenario 1: Perform GSVA analysis on specific groups, e.g., clusters 1, 2, and 3 of the leiden clustering
SDAS geneSetEnrichment gsva -i st.h5ad -o outdir \
    --group_key leiden --idents 1,2,3 --species human
Scenario 2: Subset the data by a column in obs before GSVA analysis, e.g., run GSVA only on the cells of a certain cell type
SDAS geneSetEnrichment gsva -i st.h5ad -o outdir \
    --group_key leiden --idents 1,2,3 --species human \
    --subset_key cell_type --subset_values B
Scenario 3: Analyze only with databases of interest
SDAS geneSetEnrichment gsva -i st.h5ad -o outdir \
    --group_key leiden --idents 1,2,3 \
    --gmt sdas_deg_enrichment/lib/GSEADB/h.all.v2024.1.Hs.symbols.gmt,sdas_deg_enrichment/lib/GSEADB/KEGG_2021_Human.gmt
Scenario 4: Perform GSVA analysis on all elements of a column in obs together by setting the --idents parameter to the special value all.
SDAS geneSetEnrichment gsva -i st.h5ad -o outdir \
    --group_key leiden --idents all --species human
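When the same analysis has to be repeated for several subsets, the command line can be assembled programmatically. The following is a minimal Python sketch and not part of SDAS: the helper `build_gsva_command` and the cell type "T" are hypothetical, and only flags that appear in the scenarios above are used. It prints the commands instead of running them.

```python
import shlex


def build_gsva_command(input_h5ad, outdir, group_key, idents="all",
                       species="human", subset_key=None, subset_values=None):
    """Assemble an `SDAS geneSetEnrichment gsva` call as an argument list.

    Hypothetical helper: it only strings together flags shown in the
    scenarios and does not validate them against the h5ad file.
    """
    # --idents takes either the special value "all" or a comma-separated list.
    idents_arg = idents if isinstance(idents, str) else ",".join(str(i) for i in idents)
    cmd = [
        "SDAS", "geneSetEnrichment", "gsva",
        "-i", input_h5ad,
        "-o", outdir,
        "--group_key", group_key,
        "--idents", idents_arg,
        "--species", species,
    ]
    if subset_key is not None:
        cmd += ["--subset_key", subset_key,
                "--subset_values", ",".join(subset_values)]
    return cmd


# "B" comes from Scenario 2; "T" is a made-up second cell type.
for cell_type in ["B", "T"]:
    cmd = build_gsva_command(
        "st.h5ad", f"outdir_{cell_type}", "leiden",
        idents=[1, 2, 3],
        subset_key="cell_type", subset_values=[cell_type],
    )
    # Dry run: print the shell command rather than executing it.
    print(shlex.join(cmd))
```

To actually launch a command, the argument list could be passed to `subprocess.run(cmd, check=True)`, assuming the SDAS executable is on the PATH.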
Input Parameter Description
| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| -i / --input | Yes | | Input spatial h5ad file. |
| --group_key | Yes | | Name of the grouping column in the h5ad obs; it must contain the groups given in --idents. |
| --idents | Yes | all | Groups (identity classes) of --group_key to include in the GSVA analysis, comma-separated, e.g. sample1,sample2. |
| -o / --output | Yes | current working directory | The GSEApy output directory. |
| --subset_key | No | | Key in obs used for subsetting (optional), e.g. cell_type. |
| --subset_values | No | | Values used for subsetting (optional), comma-separated, e.g. cell1,cell2. |
| --layer | No | None | Layer holding the raw gene expression. If None, adata.raw.X or adata.X is used. |
| --gene_symbol_key | No | real_gene_name | Key that holds the gene names. |
| --species | No | human | Use the built-in GMT database: human or mouse. For more databases, see https://amp.pharm.mssm.edu/modEnrichr. |
| --sample_size | No | 0 | Number of cells to sample at random; 0 disables sampling. |
| --gmt | No | | Custom gene set database in GMT format. One or more databases, separated by ",". By default, the built-in database selected by --species is used. |
| --kernel_cdf | No | Gaussian | Gaussian is suitable when the input expression values are continuous. If the input is integer counts, set this argument to 'Poisson'. |
| --mx_diff | No | False | When set, ES is calculated as the maximum distance of the random walk from 0. |
| --abs_ranking | No | | Flag used only when --mx_diff is not set. When set, the original Kuiper statistic is used. |
| --min_size | No | 15 | Minimum number of input genes present in a gene set. |
| --max_size | No | 20000 | Maximum number of input genes present in a gene set. |
| --weight | No | 1 | tau in the random walk performed by GSVA. |
| --seed | No | 123 | Random seed. |
| --threads | No | 1 | Number of processes to use. |
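The interaction between --gmt, --min_size, and --max_size can be illustrated with a small Python sketch. It assumes the standard GMT layout (term name, description, then genes, all tab-separated) and assumes that the size limits are inclusive and apply to the genes of a set that are also present in the input data. The functions `parse_gmt` and `filter_by_size` and the toy gene names are hypothetical; this is not the SDAS or GSEApy implementation.

```python
def parse_gmt(lines):
    """Parse GMT lines into {term: [genes]}.

    Each line is: term name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        term = fields[0]
        gene_sets[term] = [gene for gene in fields[2:] if gene]
    return gene_sets


def filter_by_size(gene_sets, expressed_genes, min_size=15, max_size=20000):
    """Keep gene sets whose overlap with the input genes is within the limits.

    Assumption: limits are inclusive and counted on the overlap only.
    """
    expressed = set(expressed_genes)
    kept = {}
    for term, genes in gene_sets.items():
        overlap = [gene for gene in genes if gene in expressed]
        if min_size <= len(overlap) <= max_size:
            kept[term] = overlap
    return kept


# Toy data, made up for illustration.
gmt_lines = [
    "SET_A\tdescription\tG1\tG2\tG3\n",
    "SET_B\tdescription\tG1\tG9\n",
]
sets = parse_gmt(gmt_lines)
kept = filter_by_size(sets, expressed_genes=["G1", "G2", "G3", "G4"], min_size=2)
# SET_A overlaps the input on 3 genes and is kept; SET_B overlaps on 1 and is dropped.
print(sorted(kept))
```

With the default --min_size of 15, small custom gene sets may be dropped entirely, so lowering the limit may be worth considering when a short, hand-curated GMT file is supplied.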
Output Results Display
| File | Description |
| --- | --- |
| GSVA.{database}.csv | Result file in CSV format |
| GSVA.{database}.pdf/png | Result heatmap in PDF and PNG formats |
GSVA csv file format:
GSVA.{database}.csv is the GSVA analysis result file. The first column is Term (function name), and each subsequent column represents a sample. Positive values indicate higher activity for that function in the sample; negative values indicate lower activity.
| Term | (sample) | (sample) | .... |
| --- | --- | --- | --- |
| HALLMARK_ADIPOGENESIS | -0.32809425650271146 | -0.306805475112318 | .... |
| HALLMARK_ALLOGRAFT_REJECTION | -0.3052190348950549 | 0.22055475913564931 | .... |
| HALLMARK_ANDROGEN_RESPONSE | -0.39290236695613107 | -0.3080397441881526 | .... |
| ... | ... | ... | ... |
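A result file with this layout can be inspected with a few lines of Python. The sketch below uses only the standard library and ranks terms by GSVA score within each sample. The header names `Term`, `sample1`, and `sample2` are assumptions, since the real column names depend on the groups that were analyzed, and `top_terms_per_sample` is a hypothetical helper. The three rows are the example values shown above.

```python
import csv
import io


def top_terms_per_sample(csv_file, n=1):
    """Return {sample: [(term, score), ...]} with the n highest-scoring terms."""
    reader = csv.DictReader(csv_file)
    term_col = reader.fieldnames[0]   # first column holds the term names
    samples = reader.fieldnames[1:]   # every other column is one sample
    rows = list(reader)
    top = {}
    for sample in samples:
        ranked = sorted(rows, key=lambda row: float(row[sample]), reverse=True)
        top[sample] = [(row[term_col], float(row[sample])) for row in ranked[:n]]
    return top


# In-memory stand-in for GSVA.{database}.csv; header names are assumed.
example = io.StringIO(
    "Term,sample1,sample2\n"
    "HALLMARK_ADIPOGENESIS,-0.32809425650271146,-0.306805475112318\n"
    "HALLMARK_ALLOGRAFT_REJECTION,-0.3052190348950549,0.22055475913564931\n"
    "HALLMARK_ANDROGEN_RESPONSE,-0.39290236695613107,-0.3080397441881526\n"
)
top = top_terms_per_sample(example)
for sample, terms in top.items():
    print(sample, terms)
```

For a real result file, the `io.StringIO` object would be replaced by an opened file, e.g. `open(path, newline="")`, where `path` points to the CSV in the output directory.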
GSVA result heatmap: GSVA.{database}.pdf/png. The vertical axis is the function/pathway name, the horizontal axis is the sample name, and the legend shows the GSVA score.
