Cell Annotation

Purpose

Annotate cell types in Stereo-seq data by deconvolution with cell2location, using a single-cell reference.

Usage

SDAS cellAnnotation cell2location -i st.h5ad -o outdir --reference_csv ./ref/inf_aver.csv --bin_size 20 \
--input_gene_symbol_key _index \
--gpu_id 3
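When the tool is driven from a Python pipeline, the command above can be assembled as an argument list and handed to `subprocess`. The sketch below uses only the flags documented on this page; `build_sdas_command` and `run_sdas` are hypothetical helper names, not part of SDAS.

```python
import subprocess
from typing import List, Optional


def build_sdas_command(
    input_h5ad: str,
    outdir: str,
    reference_csv: str,
    bin_size: str,
    gene_symbol_key: Optional[str] = None,
    gpu_id: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: assemble the documented SDAS cell2location command as argv."""
    cmd = [
        "SDAS", "cellAnnotation", "cell2location",
        "-i", input_h5ad,
        "-o", outdir,
        "--reference_csv", reference_csv,
        "--bin_size", str(bin_size),
    ]
    # Optional flags are appended only when given, so the tool's defaults apply otherwise.
    if gene_symbol_key is not None:
        cmd += ["--input_gene_symbol_key", gene_symbol_key]
    if gpu_id is not None:
        cmd += ["--gpu_id", str(gpu_id)]
    return cmd


def run_sdas(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError if SDAS exits with a non-zero status."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_sdas_command("st.h5ad", "outdir", "./ref/inf_aver.csv", "20",
                             gene_symbol_key="_index", gpu_id=3)
    print(" ".join(cmd))  # same command as the usage example above
    # run_sdas(cmd)  # uncomment on a machine where SDAS is installed
```

Passing a list rather than one shell string avoids quoting problems with paths that contain spaces.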

Input Parameter Description

| Parameter | Required | Default | Description |
|---|---|---|---|
| -i / --input | Yes | | Stereo-seq h5ad file; must contain the raw expression matrix |
| -o / --output | Yes | | Output folder |
| --reference_csv | Yes | | Single-cell reference CSV file |
| --bin_size | Yes | | Bin size of the spatial data, e.g. 20, 50, 100, or cellbin (treated as 20); controls the number of cells per bin and the point size in plots |
| --input_layer | No | | Layer of the Stereo-seq h5ad that stores raw counts |
| --input_gene_symbol_key | No | real_gene_name | Column of the Stereo-seq h5ad.var that holds gene symbols (index means h5ad.var.index is used) |
| --slice_key | No | sampleID | Column of h5ad.obs in a multi-slice h5ad that holds the slice ID; provides batch information and is used for plotting |
| --detection_alpha | No | 20 | Regularization parameter. The larger the technical variation in the spatial data, the smaller the appropriate value; usually left at the default |
| --data_split_strategy | No | chunk | Strategy for splitting the spatial data when there are too many bins: `chunk` randomly splits the data before running cell2location; `batch` splits it within the algorithm |
| --data_split_size | No | 10000 | Size of each split when there are too many bins. Larger values run faster but use more GPU memory; -1 disables splitting |
| --max_epochs | No | 5000 | Number of training epochs |
| --seed | No | 42 | Random seed |
| --gpu_id | No | -1 | ID of the main GPU to use; -1 means CPU. Other GPUs may also be occupied, though with very low usage. To strictly restrict the GPU, set the environment variable before running, e.g. `export CUDA_VISIBLE_DEVICES=2`, then pass `--gpu_id 0` to use only GPU 2 |
| --n_threads | No | | Number of threads in CPU mode; defaults to all CPUs |
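The `chunk` strategy is described only as a random split before cell2location runs, with `--data_split_size` bins per split and -1 meaning no split. A rough sketch of such a split follows, assuming the bin indices are shuffled with a fixed seed (as `--seed` might be used) and then cut into fixed-size chunks; SDAS's actual implementation may differ, and `split_bins_into_chunks` is a hypothetical name.

```python
import random
from typing import List


def split_bins_into_chunks(n_bins: int, split_size: int = 10000, seed: int = 42) -> List[List[int]]:
    """Hypothetical sketch of a 'chunk'-style split: shuffle bin indices, then cut them up.

    Assumption: this mirrors the documented behaviour (random split, fixed split size,
    -1 means no split), not SDAS's own code.
    """
    indices = list(range(n_bins))
    if split_size == -1:
        return [indices]  # splitting disabled: all bins in a single run
    if split_size <= 0:
        raise ValueError("split_size must be positive or -1")
    if split_size >= n_bins:
        return [indices]  # data already fits in one split
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(indices)
    return [indices[start:start + split_size] for start in range(0, n_bins, split_size)]


if __name__ == "__main__":
    chunks = split_bins_into_chunks(25000, 10000)
    print([len(c) for c in chunks])  # [10000, 10000, 5000]
```

Each chunk would then be deconvolved separately. The `batch` strategy splits inside the algorithm instead, which this sketch does not model.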

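The `--gpu_id` note relies on how CUDA numbers devices: when `CUDA_VISIBLE_DEVICES` is set, only the listed GPUs are visible and they are renumbered from 0 in the order given, so `export CUDA_VISIBLE_DEVICES=2` followed by `--gpu_id 0` selects physical GPU 2. The sketch below reproduces that mapping for comma-separated integer indices only (CUDA also accepts GPU UUIDs, which are not handled here); `physical_gpu` is a hypothetical helper. In practice the second argument would be `os.environ.get("CUDA_VISIBLE_DEVICES")`.

```python
from typing import Optional


def physical_gpu(gpu_id: int, visible: Optional[str]) -> Optional[int]:
    """Map a --gpu_id value to the physical GPU index it selects.

    visible is the CUDA_VISIBLE_DEVICES value (None when the variable is unset).
    Returns None for CPU mode. Only comma-separated integer indices are handled.
    """
    if gpu_id == -1:
        return None  # -1 selects CPU mode
    if visible is None:
        return gpu_id  # variable unset: all GPUs visible with their original indices
    devices = [int(d) for d in visible.split(",") if d.strip()]
    if not 0 <= gpu_id < len(devices):
        raise ValueError(f"--gpu_id {gpu_id} is out of range for CUDA_VISIBLE_DEVICES={visible!r}")
    # CUDA renumbers the visible devices 0, 1, ... in the order they are listed.
    return devices[gpu_id]


if __name__ == "__main__":
    # Mirrors the example in the table: export CUDA_VISIBLE_DEVICES=2, then --gpu_id 0.
    print(physical_gpu(0, "2"))  # 2
```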
Output Results

| Result file | Description |
|---|---|
| `<input_name>_anno_cell2location.csv` | Annotation results for each bin/cellbin, with one score per cell type (cell2location's q05_cell_abundance_w_sf, the 5% quantile of the posterior cell abundance) |
| `<input_name>_anno_cell2location.h5ad` | Input h5ad plus annotation results: per-cell-type scores are stored in `obsm['anno_score_cell2location']`, and the highest-scoring cell type in `obs['anno_cell2location']` |
| `<input_name>_anno_cell2location.png/pdf` | Overall annotation plot; one plot per slice for multi-slice input; written as both PNG and PDF |
| `<input_name>_anno_cell2location_split.png/pdf` | Separate plot for each cell type; one plot per slice for multi-slice input; written as both PNG and PDF |
| `<input_name>_anno_score_cell2location.png/pdf` | Score plot for each cell type; one plot per slice for multi-slice input; written as both PNG and PDF |
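The label in `obs['anno_cell2location']` is described as the cell type with the highest score in `obsm['anno_score_cell2location']`. A minimal plain-Python sketch of that step, assuming the scores are given as one row per bin in the same order as the cell type names; `label_from_scores` is a hypothetical name, and with NumPy the same idea is an argmax along axis 1.

```python
from typing import List, Sequence


def label_from_scores(scores: Sequence[Sequence[float]], cell_types: Sequence[str]) -> List[str]:
    """Return, for each bin, the cell type with the highest score (the first one wins on ties)."""
    labels = []
    for row in scores:
        if len(row) != len(cell_types):
            raise ValueError("each score row needs one value per cell type")
        # max() over column positions keeps the first position among equal maxima.
        best = max(range(len(row)), key=lambda j: row[j])
        labels.append(cell_types[best])
    return labels


if __name__ == "__main__":
    # Made-up scores for three bins and three cell types.
    print(label_from_scores([[0.1, 0.5, 0.2], [0.3, 0.1, 0.9]], ["B_act", "B_naive", "Epi"]))
```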

  • Overall annotation result plot (`<input_name>_anno_cell2location.png/pdf`): each bin/cellbin is colored by the cell type with the highest proportion in it.

  • Separate plot for each cell type (`<input_name>_anno_cell2location_split.png/pdf`): bins/cellbins are colored by the cell type with the highest proportion in them; each title gives the cell type followed by its number of cells in parentheses.

  • Cell type score plot (`<input_name>_anno_score_cell2location.png/pdf`): the score computed by the algorithm for each cell type; a higher score means a higher proportion of that cell type.

  • Annotation result CSV (`<input_name>_anno_cell2location.csv`): each row is a bin/cellbin, each column is a cell type, and each value is that cell type's score; a higher score means a higher proportion of that cell type. The last column, annotation, holds the cell type with the highest proportion in that bin/cellbin.

| index | B_act | B_naive | CD4_CXCL13 | ... | annotation |
|---|---|---|---|---|---|
| CRCP95_T_BIN.242 | 0.1689 | 0.1694 | 0.2176 | ... | CAF_CXCL14 |
| CRCP95_T_BIN.243 | 0.1122 | 0.2350 | 0.1745 | ... | Epi |
| CRCP95_T_BIN.244 | 0.1020 | 0.2062 | 0.1527 | ... | Epi |
| CRCP95_T_BIN.245 | 0.0808 | 0.1980 | 0.1668 | ... | Epi |
| ... | ... | ... | ... | ... | ... |
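Because the CSV holds one row per bin/cellbin, with the bin ID first, the label in the final annotation column, and cell type scores in between, it can be summarised with the standard `csv` module. The sketch below counts bins per annotated cell type and checks that each label carries the top score in its row; `summarize_annotation_csv` is a hypothetical name, the in-memory CSV uses made-up values, and for a real run you would open the output file with `open(path, newline="")`.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def summarize_annotation_csv(handle: TextIO) -> Counter:
    """Count bins per annotated cell type in an *_anno_cell2location.csv-style file.

    Assumes the layout described above: first column = bin ID, last column = annotation,
    every column in between = a cell type score.
    """
    reader = csv.reader(handle)
    header = next(reader)
    cell_types = header[1:-1]
    counts = Counter()
    for row in reader:
        scores = [float(x) for x in row[1:-1]]
        label = row[-1]
        # Sanity check that tolerates ties: the label's score must equal the row maximum.
        if scores[cell_types.index(label)] != max(scores):
            raise ValueError(f"{row[0]}: annotation {label!r} is not the top-scoring type")
        counts[label] += 1
    return counts


if __name__ == "__main__":
    demo = io.StringIO(
        "index,B_act,B_naive,Epi,annotation\n"
        "bin_1,0.10,0.20,0.70,Epi\n"
        "bin_2,0.60,0.30,0.10,B_act\n"
        "bin_3,0.05,0.15,0.80,Epi\n"
    )
    print(summarize_annotation_csv(demo))  # Counter({'Epi': 2, 'B_act': 1})
```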
