hdWGCNA Algorithm

Purpose

Identify spatial co-expression gene sets (modules) with the hdWGCNA algorithm.

Usage

SDAS coexpress hdwgcna -i st.h5ad -o outdir --bin_size 100 \
  --input_layer raw_counts \
  --selected_genes top5000 \
  --moran_path ./moran.csv \
  --n_cpus 8 \
  --seed 42 \
  --knn_neighbors 50 \
  --max_shared_cells 15 \
  --soft_power 8

Input Parameter Description

| Parameter | Required | Default | Description |
|---|---|---|---|
| -i / --input | Yes | | Stereo-seq h5ad file; must contain the raw expression matrix |
| -o / --output | Yes | | Output directory |
| --bin_size | Yes | 50 | Bin size (resolution): 20, 50, 100, 200 or cellbin; must match the input h5ad |
| --layer | No | | Layer of the h5ad that holds the raw expression matrix (e.g. layers['raw_counts']) |
| --selected_genes | No | top5000 | Gene selection mode: full (all genes) or topN (the top N genes ranked by Moran's I, e.g. top5000) |
| --moran_path | No | | Path to a precomputed Moran's I CSV file |
| --n_cpus | No | 8 | Number of parallel jobs, to speed up computation on multi-core machines |
| --seed | No | 42 | Random seed |
| --knn_neighbors | No | 50 | Metacell construction: number of nearest neighbors used by the KNN algorithm |
| --max_shared_cells | No | 15 | Metacell construction: maximum number of cells shared between any two metacells |
| --soft_power | No | | Soft-thresholding power used in network construction. If not set, the lowest soft_power whose scale-free topology model fit reaches 0.8 is selected automatically |
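The topN mode of --selected_genes ranks genes by Moran's I, a measure of spatial autocorrelation. The sketch below illustrates that ranking under stated assumptions: it uses binary k-nearest-neighbor spatial weights, and the helper names (knn_weights, morans_i, select_top_genes) are hypothetical, not part of SDAS. SDAS's actual weighting scheme and any significance filtering are not described here and may differ.

```python
import numpy as np

def knn_weights(coords: np.ndarray, k: int) -> np.ndarray:
    """Binary spatial weights (assumed scheme): w[i, j] = 1 if bin j is one of bin i's k nearest bins."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a bin is not its own neighbor
    w = np.zeros_like(d)
    nearest = np.argsort(d, axis=1)[:, :k]
    w[np.arange(len(coords))[:, None], nearest] = 1.0
    return w

def morans_i(x: np.ndarray, w: np.ndarray) -> float:
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)."""
    z = x - x.mean()
    denom = z @ z
    if denom == 0:
        return float("nan")  # undefined for a constant gene
    return float(len(x) / w.sum() * (z @ w @ z) / denom)

def select_top_genes(expr, gene_names, coords, n_top, k=6):
    """Hypothetical 'topN' selection: rank genes (columns of expr) by Moran's I, keep the n_top highest."""
    w = knn_weights(coords, k)
    scores = np.array([morans_i(expr[:, g], w) for g in range(expr.shape[1])])
    ranked = np.argsort(np.where(np.isnan(scores), -np.inf, scores))[::-1]  # NaN genes rank last
    return [gene_names[i] for i in ranked[:n_top]], scores

# Toy data (hypothetical): a 10 x 10 grid of bins with one spatially patterned gene and one noise gene.
coords = np.array([(x, y) for x in range(10) for y in range(10)], dtype=float)
rng = np.random.default_rng(0)
expr = np.column_stack([coords[:, 0], rng.normal(size=100)])
top, scores = select_top_genes(expr, ["gradient", "noise"], coords, n_top=1, k=4)
print(top, np.round(scores, 3))
```

A spatially smooth gene gets a Moran's I near 1, while spatially random expression stays near 0, which is why the ranking favors spatially variable genes.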

Output Results

| Result file | Description |
|---|---|
| <input_name>_hdwgcna.module.csv | Spatially highly variable genes (gene symbol and gene ID) and the co-expression gene set (module) each gene is assigned to |
| <input_name>_hdwgcna.module_score.csv | Module scores of each co-expression gene set |
| <input_name>_hdwgcna.coexpress.rds | RDS file containing the co-expression gene set results |
| <input_name>_hdwgcna.module_score.png/pdf | Spatial heatmap of co-expression gene set scores |
| <input_name>_hdwgcna.all_coex_dendrogram.png/pdf | Dendrogram of similarity between co-expression gene sets |
| <input_name>_hdwgcna.softpowers.png/pdf | Bar chart of the soft_power values tested for network construction |
| <input_name>_hdwgcna.moran.csv | Moran's I and p-value of every gene; written only when topN gene selection is used |

  • Result CSV of co-expression gene sets (<input_name>_hdwgcna.module.csv, comma-separated): lists the spatially highly variable genes identified by hdWGCNA and the co-expression gene set (module) each gene belongs to. kME measures how strongly a gene's expression pattern correlates with a module eigengene (ME); the file contains one kME_<module> column per module. The closer a gene's kME for its own module is to 1 or -1, the more likely the gene is a hub gene of that module.

| real_gene_name | geneid | Module | color | kME_Module1 | kME_Module2 | kME_grey | kME_Module3 | kME_Module4 | kME_Module5 | kME_Module6 | kME_Module7 | kME_Module8 | kME_Module9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A2M | ENSG00000175899 | Module1 | green | 0.47946868988301 | -0.107096403482606 | -0.178114022165641 | 0.0676792398874597 | 0.095966109797419 | -0.0907050325056857 | -0.0529390531160642 | -0.150612945887371 | 0.0878907827651177 | 0.0249952108382643 |
| A2M-AS1 | ENSG00000237094 | Module1 | green | 0.54370397007705 | -0.150011910577089 | -0.254597937099371 | 0.0926882061841318 | 0.140032173496191 | -0.115227951266487 | -0.101675353602963 | -0.222107282189061 | 0.0803636102659976 | 0.0426306888623326 |
| A2ML1 | ENSG00000166535 | Module2 | yellow | 0.0404144692736028 | 0.479908573141937 | 0.194701680726881 | -0.327610748128114 | 0.0430624759042059 | 0.429681007497005 | -0.342984504779987 | 0.145625804577339 | -0.386999928188458 | 0.08281144751312791 |
| A2MP1 | ENSG00000256069 | grey | grey | -0.046660656715667 | 0.20294339804614 | 0.284819067476003 | -0.0506850476403686 | -0.205976941174478 | 0.244779685854094 | 0.000250607520833238 | 0.170101997387916 | -0.0177549796818324 | 0.0639042087827032 |
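kME can be illustrated with a small sketch. In the WGCNA framework, a module eigengene is the first principal component of the module genes' standardized expression, and kME is the correlation of each gene with that eigengene. The helpers module_eigengene and kme below are hypothetical; Pearson correlation is used for simplicity, and hdWGCNA may use a different correlation function (for example biweight midcorrelation).

```python
import numpy as np

def module_eigengene(module_expr: np.ndarray) -> np.ndarray:
    """First principal component of a module's standardized expression.

    module_expr: (n_samples, n_genes) matrix, e.g. metacells x module genes.
    """
    z = (module_expr - module_expr.mean(axis=0)) / module_expr.std(axis=0)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0] * s[0]
    # The sign of a principal component is arbitrary; orient the ME so that it
    # correlates positively with the module's average standardized expression.
    if np.corrcoef(me, z.mean(axis=1))[0, 1] < 0:
        me = -me
    return me

def kme(expr: np.ndarray, me: np.ndarray) -> np.ndarray:
    """Pearson correlation of every gene (column of expr) with one eigengene."""
    x = expr - expr.mean(axis=0)
    m = me - me.mean()
    return (x.T @ m) / (np.linalg.norm(x, axis=0) * np.linalg.norm(m))

# Toy data (hypothetical): 5 genes driven by a shared signal form the module,
# one gene follows the opposite pattern and one gene is pure noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=200)
module_genes = signal[:, None] + 0.1 * rng.normal(size=(200, 5))
anti_gene = -signal + 0.1 * rng.normal(size=200)
noise_gene = rng.normal(size=200)
expr = np.column_stack([module_genes, anti_gene, noise_gene])

me = module_eigengene(module_genes)
print(np.round(kme(expr, me), 2))
```

Computing kme against each module's eigengene in turn yields one kME column per module, matching the kME_<module> columns in the table above.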

  • Result CSV of co-expression gene set scores (<input_name>_hdwgcna.module_score.csv, comma-separated): each row is a bin (or cell) and each column is a co-expression gene set (module); higher values indicate higher expression of that module in the bin.

| | Module6 | Module3 | Module8 | Module2 | grey | Module7 | Module5 | Module9 | Module1 | Module4 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2200_16100 | -3.23688863476392 | -4.34756288337066 | -2.3278151796256 | -8.21694142422341 | -14.8112682710791 | -9.12253218247156 | -10.174563894144 | -3.09447240000024 | 0.481660736850741 | 3.91787079378259 |
| 2200_17200 | 5.77873502485046 | 0.783016254503074 | 1.06582091429724 | -6.03050203635639 | -3.71256039305597 | -0.825856084852031 | -3.67468239887104 | -2.09159016878048 | -2.639251117267012 | 5.41583186417414 |
| 2300_16700 | 7.90521666109811 | 2.93759207152763 | -0.391450035802177 | -3.02639637030598 | 1.63013439679168 | 1.66371621513915 | -1.51360146647437 | -0.8975499248414 | -4.66703690157902 | 1.40723191567521 |
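A minimal sketch for reading the module score table with pandas and finding, for each bin, the co-expression gene set with the highest score (grey excluded). It assumes the layout shown in the example above, with bin IDs in an unnamed first column; dominant_module is a hypothetical helper, and the embedded rows are copied from the example table.

```python
import io
import pandas as pd

# Example rows copied from the module_score.csv table above. Assumption: the first
# CSV column holds the bin ID and has an empty header, as the table layout suggests.
EXAMPLE_CSV = """\
,Module6,Module3,Module8,Module2,grey,Module7,Module5,Module9,Module1,Module4
2200_16100,-3.23688863476392,-4.34756288337066,-2.3278151796256,-8.21694142422341,-14.8112682710791,-9.12253218247156,-10.174563894144,-3.09447240000024,0.481660736850741,3.91787079378259
2200_17200,5.77873502485046,0.783016254503074,1.06582091429724,-6.03050203635639,-3.71256039305597,-0.825856084852031,-3.67468239887104,-2.09159016878048,-2.639251117267012,5.41583186417414
2300_16700,7.90521666109811,2.93759207152763,-0.391450035802177,-3.02639637030598,1.63013439679168,1.66371621513915,-1.51360146647437,-0.8975499248414,-4.66703690157902,1.40723191567521
"""

def dominant_module(scores: pd.DataFrame) -> pd.Series:
    """For each bin, the co-expression gene set (excluding grey) with the highest score."""
    return scores.drop(columns="grey", errors="ignore").idxmax(axis=1)

# For a real run, pass the path to <input_name>_hdwgcna.module_score.csv instead of the StringIO.
scores = pd.read_csv(io.StringIO(EXAMPLE_CSV), index_col=0)
print(dominant_module(scores))
```

Excluding grey keeps unassigned genes from dominating the per-bin label.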

  • Spatial heatmap of co-expression gene set scores (<input_name>_hdwgcna.module_score.png/pdf): visualizes the spatial distribution of every co-expression gene set (module). Color intensity indicates the expression level of the gene set.

  • Bar chart of soft_power values for network construction (<input_name>_hdwgcna.softpowers.png/pdf): shows how different soft_power values affect network construction. By default, the lowest soft_power whose scale-free topology model fit reaches 0.8 is selected automatically.

  • Dendrogram of similarity between co-expression gene sets (<input_name>_hdwgcna.all_coex_dendrogram.png/pdf): shows the hierarchical clustering of the co-expression gene sets (modules) by similarity.
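The automatic soft_power choice can be sketched as follows, assuming a signed network (adjacency ((1 + r) / 2)^β from the gene-gene Pearson correlation r) and a scale-free fit computed in the style of WGCNA's scaleFreeFitIndex. The candidate powers 1 to 20, the helper names (scale_free_fit, pick_soft_power) and returning None when no power reaches 0.8 are assumptions for illustration, not documented SDAS behavior.

```python
import numpy as np

def scale_free_fit(k: np.ndarray, n_breaks: int = 10) -> float:
    """Signed scale-free topology fit of a connectivity vector k.

    Modeled on WGCNA's scaleFreeFitIndex: bin k, regress log10 p(k) on log10 k,
    and return -sign(slope) * R^2 (a good fit with a negative slope is near +1).
    """
    edges = np.linspace(k.min(), k.max(), n_breaks + 1)
    # Bin index per gene; intervals are (edges[b], edges[b+1]], the first one closed.
    idx = np.clip(np.searchsorted(edges, k, side="left") - 1, 0, n_breaks - 1)
    mids = (edges[:-1] + edges[1:]) / 2
    counts = np.bincount(idx, minlength=n_breaks)
    # Mean connectivity per bin; empty bins fall back to the bin midpoint.
    dk = np.array([k[idx == b].mean() if counts[b] else mids[b] for b in range(n_breaks)])
    dk = np.where(dk == 0, mids, dk)
    x = np.log10(dk)
    y = np.log10(counts / k.size + 1e-9)
    slope = np.polyfit(x, y, 1)[0]
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return float(-np.sign(slope) * r2)

def pick_soft_power(expr: np.ndarray, powers=tuple(range(1, 21)), threshold: float = 0.8):
    """Return the lowest power whose signed fit reaches threshold (or None), plus all fits."""
    cor = np.corrcoef(expr, rowvar=False)  # gene x gene Pearson correlation
    fits = {}
    for p in powers:
        adj = ((1 + cor) / 2) ** p  # signed network adjacency
        np.fill_diagonal(adj, 0.0)
        fits[p] = scale_free_fit(adj.sum(axis=0))
    chosen = next((p for p in powers if fits[p] >= threshold), None)
    return chosen, fits

rng = np.random.default_rng(1)
expr = rng.normal(size=(60, 40))  # hypothetical: 60 metacells x 40 genes
chosen, fits = pick_soft_power(expr)
print(chosen, {p: round(f, 2) for p, f in fits.items()})
```

Plotting fits against the powers gives the kind of curve the soft_power chart summarizes; on real co-expression data the fit typically rises with the power.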

Result Interpretation

  • Co-expression gene sets are numbered starting from Module1. The grey module collects genes that do not meet the clustering criteria for any co-expression gene set.
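To summarize module.csv, for example counting genes per co-expression gene set and ranking candidate hub genes while skipping grey, a pandas sketch like the following can be used. It ranks each module's genes by the absolute kME for their own module, following the kME description of module.csv; hub_genes is a hypothetical helper and the embedded rows are a subset of the columns from the module.csv example.

```python
import io
import pandas as pd

# A subset of the columns of the module.csv example (first four rows).
EXAMPLE_CSV = """\
real_gene_name,geneid,Module,color,kME_Module1,kME_Module2,kME_grey
A2M,ENSG00000175899,Module1,green,0.47946868988301,-0.107096403482606,-0.178114022165641
A2M-AS1,ENSG00000237094,Module1,green,0.54370397007705,-0.150011910577089,-0.254597937099371
A2ML1,ENSG00000166535,Module2,yellow,0.0404144692736028,0.479908573141937,0.194701680726881
A2MP1,ENSG00000256069,grey,grey,-0.046660656715667,0.20294339804614,0.284819067476003
"""

def hub_genes(modules: pd.DataFrame, n_top: int = 10) -> dict:
    """Rank each non-grey module's genes by |kME| for their own module (column kME_<module>)."""
    result = {}
    assigned = modules[modules["Module"] != "grey"]  # grey = genes not assigned to any module
    for module, genes in assigned.groupby("Module"):
        order = genes[f"kME_{module}"].abs().sort_values(ascending=False).index
        result[module] = genes.loc[order, "real_gene_name"].head(n_top).tolist()
    return result

modules = pd.read_csv(io.StringIO(EXAMPLE_CSV))
print(modules["Module"].value_counts())  # number of genes per co-expression gene set
print(hub_genes(modules))
```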

Parameter Tuning Suggestions

  • If only a few spatial co-expression gene sets are identified, for example in bin20/bin50 samples with fewer than 200 genes or in other atypical samples, try lowering the threshold: use the soft power chart to choose a smaller soft_power and set it explicitly with --soft_power.

  • Adjust --knn_neighbors and --max_shared_cells, which control metacell construction, to obtain more interpretable results.
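To build intuition for --knn_neighbors and --max_shared_cells, here is a simplified metacell construction sketch. It assumes neighbors are found from spatial coordinates and that candidate metacells are accepted greedily in random order; SDAS may compute neighbors in another space (for example a reduced-dimension embedding), and its exact procedure is not documented here. build_metacells is a hypothetical helper.

```python
import numpy as np

def build_metacells(coords, expr, k=50, max_shared=15, seed=42):
    """Greedy metacell sketch (assumed logic).

    Each candidate metacell is a bin plus its k - 1 nearest neighbors. Candidates
    are visited in random order and kept only if they share at most max_shared
    bins with every metacell kept so far. Expression is averaged within a metacell.
    """
    rng = np.random.default_rng(seed)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    neighborhoods = np.argsort(d, axis=1)[:, :k]  # row i normally starts with bin i (distance 0)
    kept = []
    for i in rng.permutation(len(coords)):
        members = set(neighborhoods[i].tolist())
        if all(len(members & other) <= max_shared for other in kept):
            kept.append(members)
    groups = [sorted(m) for m in kept]
    meta_expr = np.vstack([expr[g].mean(axis=0) for g in groups])
    return groups, meta_expr

# Toy data (hypothetical): a 20 x 20 grid of bins with 5 genes of Poisson counts.
coords = np.array([(x, y) for x in range(20) for y in range(20)], dtype=float)
expr = np.random.default_rng(0).poisson(2.0, size=(400, 5)).astype(float)
groups, meta_expr = build_metacells(coords, expr, k=10, max_shared=3)
print(len(groups), meta_expr.shape)
```

In this sketch, a larger k averages more bins into each metacell, while a smaller max_shared makes metacells more distinct but yields fewer of them.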
