NeST Algorithm
Purpose
Use the NeST algorithm to identify spatially co-expressed gene sets (modules).
Usage
SDAS coexpress nest -i st.h5ad -o outdir --bin_size 100 \
--layer raw_counts \
--selected_genes top5000 \
--moran_path ./moran.csv \
--n_cpus 8 \
--seed 42 \
--hotspot_min_size 30 \
--hotspot_min_samples 4 \
--min_cells 100
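The same command can also be built from a script, for example when running NeST over several samples or bin sizes. The sketch below is a hypothetical Python helper. `build_nest_command` is not part of SDAS. It accepts only the flags documented on this page and returns an argument list that you could pass to `subprocess.run`. It assumes the `SDAS` executable is on your PATH when you actually run the command.

```python
import shlex

# Optional flags documented in the parameter table (hypothetical helper, not an SDAS API).
DOCUMENTED_OPTIONS = {
    "layer", "selected_genes", "moran_path", "n_cpus", "seed",
    "hotspot_min_size", "hotspot_min_samples", "min_cells",
}


def build_nest_command(input_h5ad, outdir, bin_size, **options):
    """Return an argv list for `SDAS coexpress nest`.

    Options set to None are omitted so SDAS falls back to its own defaults.
    """
    cmd = ["SDAS", "coexpress", "nest",
           "-i", str(input_h5ad), "-o", str(outdir),
           "--bin_size", str(bin_size)]
    for name, value in options.items():
        if name not in DOCUMENTED_OPTIONS:
            raise ValueError(f"undocumented option: --{name}")
        if value is not None:
            cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_nest_command("st.h5ad", "outdir", 100, layer="raw_counts",
                         selected_genes="top5000", n_cpus=8, seed=42,
                         moran_path=None)
print(shlex.join(cmd))  # shell-quoted form, for logging or copy-paste
# import subprocess; subprocess.run(cmd, check=True)  # needs SDAS installed
```

Returning a list rather than a single string avoids shell-quoting problems when paths contain spaces.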
Input Parameter Description
| Parameter | Required | Default | Description |
|---|---|---|---|
| -i / --input | Yes | | Stereo-seq h5ad file; must contain the raw expression matrix |
| -o / --output | Yes | | Output directory |
| --bin_size | Yes | 50 | Bin size of the input: 20, 50, 100, 200, or cellbin. Must match the input h5ad; used for both calculation and plotting |
| --layer | No | | Layer of the h5ad that holds the raw expression matrix (e.g. layers['raw_counts']) |
| --selected_genes | No | top5000 | Gene selection mode: full (all genes) or topn (the top n genes ranked by Moran's I, e.g. top5000) |
| --moran_path | No | | Path to a precomputed Moran's I CSV file |
| --n_cpus | No | 8 | Number of parallel jobs, to speed up computation on multi-core machines |
| --seed | No | 42 | Random seed |
| --hotspot_min_size | No | 30 | single_hotspot step: minimum number of spots/cells required to form a single-gene hotspot |
| --hotspot_min_samples | No | 4 | single_hotspot step: minimum number of neighboring spots/cells required by DBSCAN |
| --min_cells | No | 100 / 30 | coexpress_hotspot step: minimum number of spots/cells required to form a module during module QC. Default: 100 for cellbin/bin20/bin50; 30 for bin100/bin200 |
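The --min_cells default depends on --bin_size. The sketch below encodes that rule from the table. It also includes a hypothetical module QC check that keeps a module only if it covers at least min_cells spots/cells. The `>=` comparison is an assumption based on the wording "minimum number of spots/cells to form a module". Neither function is part of SDAS.

```python
def default_min_cells(bin_size):
    """Documented default of --min_cells for a given --bin_size.

    100 for cellbin/bin20/bin50; 30 for bin100/bin200.
    Accepts 20, "20", "bin20", or "cellbin".
    """
    key = str(bin_size).strip().lower()
    if key.startswith("bin"):
        key = key[3:]  # "bin50" -> "50"
    if key in {"cellbin", "20", "50"}:
        return 100
    if key in {"100", "200"}:
        return 30
    raise ValueError(f"unsupported bin size: {bin_size!r}")


def passes_module_qc(n_spots, bin_size, min_cells=None):
    """Hypothetical QC check: does a module span enough spots/cells?

    An explicit min_cells overrides the bin-size default, mirroring how
    passing --min_cells on the command line overrides the default.
    """
    threshold = default_min_cells(bin_size) if min_cells is None else min_cells
    return n_spots >= threshold


print(default_min_cells("cellbin"), default_min_cells(200))
print(passes_module_qc(40, 100), passes_module_qc(40, 50))
```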
Output Results
| File | Description |
|---|---|
| <input_name>_nest.module.csv | Spatially highly variable genes (gene ID and gene symbol) and the co-expression gene set (module) each one belongs to |
| <input_name>_nest.h5ad | h5ad file with the co-expression gene set scores stored in adata.obsm['module_score_nest'] |
| <input_name>_nest_module_score_nest.png/pdf | Spatial heatmap of co-expression gene set scores |
| <input_name>_nest.all_coex_hotspots.png/pdf, <input_name>_nest.all_coex_structure.png/pdf | Spatial locations and hierarchical structure of the co-expression gene sets |
| <input_name>_nest.separate_coex_hotspots.png/pdf | Spatial location and gene count of each co-expression gene set |
| <input_name>_nest.moran.csv | Written when topn gene selection is used: Moran's I and p-value for every gene |
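With topn selection, genes are ranked by Moran's I, a measure of spatial autocorrelation. Values near +1 mean similar expression clusters in space. Values near -1 mean neighboring spots alternate. The sketch below computes global Moran's I,

I = (N / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)²,

and keeps the top n genes. Several choices here are assumptions that may differ from SDAS: the binary distance-threshold weights (wᵢⱼ = 1 when 0 < dist ≤ radius), the `radius` value, and the function names. It also computes no p-values, unlike the moran.csv output.

```python
import numpy as np


def morans_i(values, coords, radius=1.0):
    """Global Moran's I of one gene, using hypothetical binary distance weights."""
    x = np.asarray(values, dtype=float)
    pts = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between spot centers (O(N^2) memory: toy data only).
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    w = ((dist > 0) & (dist <= radius)).astype(float)
    z = x - x.mean()
    denom = float(z @ z)
    total_w = w.sum()
    if denom == 0.0 or total_w == 0.0:
        return float("nan")  # constant gene or no neighbors: I is undefined
    return float(len(x) / total_w * (z @ w @ z) / denom)


def top_n_genes(expr, gene_names, coords, n, radius=1.0):
    """Rank genes (columns of expr) by Moran's I; return top n names and all scores."""
    scores = np.array([morans_i(expr[:, g], coords, radius)
                       for g in range(expr.shape[1])])
    ranked = np.where(np.isnan(scores), -np.inf, scores)  # undefined scores sort last
    order = np.argsort(-ranked, kind="stable")
    return [gene_names[i] for i in order[:n]], scores


# Toy 4x4 grid of bins with unit spacing.
coords = [(r, c) for r in range(4) for c in range(4)]
gradient = [c for r, c in coords]           # smooth left-to-right trend
checker = [(r + c) % 2 for r, c in coords]  # alternating pattern
flat = [1.0] * len(coords)                  # no spatial variation
expr = np.column_stack([gradient, checker, flat])

top, scores = top_n_genes(expr, ["gradient", "checker", "flat"], coords, n=2)
print(top)  # ['gradient', 'checker']
```

On this grid the gradient scores about 0.67, the checkerboard scores -1, and the constant gene gets no defined score. This is why spatially structured genes rise to the top of a topn selection.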
Result CSV of co-expression gene sets: <input_name>_nest.module.csv
A comma-separated file listing each spatially highly variable gene identified by NeST (gene ID and gene symbol) and the co-expression gene set (module) it belongs to. Example rows:
Module0,ENSG00000130649,EPAS1
Module0,ENSG00000102882,CHCHD3
Module0,ENSG00000179144,MDGA2
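To work with the module file downstream, you can group its genes by module. The sketch below makes several assumptions. It assumes the column order shown in the sample rows (module, gene ID, gene symbol). It skips any row whose first field is not of the form ModuleN, which covers a header line if one exists and genes that were not assigned to a module. The `read_modules` name is hypothetical.

```python
import csv
import io
import re

MODULE_RE = re.compile(r"Module(\d+)")


def read_modules(handle):
    """Group (gene_id, gene_symbol) pairs by module, ordered Module0, Module1, ..."""
    modules = {}
    for row in csv.reader(handle):
        if len(row) < 3:
            continue  # blank or malformed line
        module, gene_id, symbol = (field.strip() for field in row[:3])
        if MODULE_RE.fullmatch(module) is None:
            continue  # header or a gene without a module
        modules.setdefault(module, []).append((gene_id, symbol))
    # Sort numerically so Module10 follows Module9 rather than Module1.
    return dict(sorted(modules.items(),
                       key=lambda kv: int(MODULE_RE.fullmatch(kv[0]).group(1))))


sample = io.StringIO(
    "Module0,ENSG00000130649,EPAS1\n"
    "Module0,ENSG00000102882,CHCHD3\n"
    "Module0,ENSG00000179144,MDGA2\n"
)
modules = read_modules(sample)
for name, genes in modules.items():
    print(name, len(genes), [symbol for _, symbol in genes])
# Module0 3 ['EPAS1', 'CHCHD3', 'MDGA2']
```

For a real file, open it with `open(path, newline="")` and pass the handle to `read_modules`.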
Spatial heatmap of gene set scoring for co-expression gene sets
<input_name>_nest_module_score_nest.png: visualizes the spatial distribution of every co-expression gene set (module). Color intensity indicates the expression level of the co-expression gene set at each location.

Spatial location and hierarchical structure of co-expression gene sets
<input_name>_nest.all_coex_hotspots.png/pdf and <input_name>_nest.all_coex_structure.png/pdf: show the hierarchical relationships between the co-expression gene sets (modules). Colors mark the spatial region in which each co-expression gene set is located.

Spatial location and gene count of co-expression gene sets
<input_name>_nest.separate_coex_hotspots.png/pdf: shows, for each co-expression gene set (module), the spatial region it occupies and the number of genes it contains.

Result Interpretation
Co-expression gene sets are numbered starting from Module0. Genes marked "No Module" did not meet the clustering requirements for a co-expression gene set and are not assigned to any module.
Parameter Tuning Suggestions
If the number of genes in a bin20/bin50 sample is below 200, or the sample is otherwise unusual, and only a few spatial co-expression gene sets are identified, lower --hotspot_min_size to 10.
If only a few spatial co-expression gene sets are identified, lower --min_cells to 10.
If the identified patterns are too fine-grained and a NumPy error such as "Unable to allocate X GiB array" occurs, increase --hotspot_min_size and --hotspot_min_samples.
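The two hotspot parameters interact in a simple way. The parameter table describes --hotspot_min_samples as a DBSCAN neighbor count and --hotspot_min_size as the minimum hotspot size. The sketch below implements plain DBSCAN from scratch with NumPy and then drops clusters smaller than the minimum size, which shows why lowering the minimum size recovers smaller patterns. Several parts are assumptions rather than NeST's documented internals. The input is taken to be the coordinates of spots where a single gene is expressed. The neighborhood radius `eps` is a hypothetical value because no radius is documented on this page. `dbscan_labels` and `single_gene_hotspots` are hypothetical names, not SDAS functions.

```python
import numpy as np


def dbscan_labels(coords, eps, min_samples):
    """Plain DBSCAN: one cluster label per point, -1 for noise.

    A point is a core point if at least `min_samples` points (itself
    included) lie within distance `eps`.
    """
    pts = np.asarray(coords, dtype=float)
    n = len(pts)
    if n == 0:
        return np.empty(0, dtype=int)
    # Full N x N distance matrix: memory grows quadratically with N.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


def single_gene_hotspots(coords, eps, hotspot_min_samples, hotspot_min_size):
    """Return index arrays of DBSCAN clusters with at least hotspot_min_size points."""
    labels = dbscan_labels(coords, eps, hotspot_min_samples)
    hotspots = []
    for lab in np.unique(labels[labels >= 0]):
        members = np.flatnonzero(labels == lab)
        if len(members) >= hotspot_min_size:
            hotspots.append(members)
    return hotspots


# Toy spots: a 5x5 block, a separate 3x3 block, and two isolated spots.
big = [(x, y) for x in range(5) for y in range(5)]
small = [(20 + x, 20 + y) for x in range(3) for y in range(3)]
coords = big + small + [(50, 0), (0, 50)]

for min_size in (30, 10, 5):
    spots = single_gene_hotspots(coords, eps=1.5, hotspot_min_samples=4,
                                 hotspot_min_size=min_size)
    print(min_size, [len(h) for h in spots])
# 30 []
# 10 [25]
# 5 [25, 9]
```

At the default-like size of 30, neither block qualifies. Lowering the minimum size to 10 admits the 25-spot block, and lowering it to 5 also admits the 9-spot block. Raising the neighbor count has the opposite effect: with `hotspot_min_samples=10` and `eps=1.5`, no spot has enough neighbors, so no hotspot forms.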