Cell Annotation

Purpose

Use cell2location for deconvolution-based cell annotation.

Usage

SDAS cellAnnotation cell2location -i st.h5ad -o outdir --reference_csv ./ref/inf_aver.csv --bin_size 20 \
--input_gene_symbol_key _index \
--gpu_id 3
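
The same call can be assembled from a script, combined with the strict GPU pinning described under --gpu_id below. This is a minimal sketch that assumes the SDAS executable is on PATH. The helper name build_cell2location_command and its arguments are hypothetical; it uses only flags documented on this page.

```python
import os


def build_cell2location_command(input_h5ad, outdir, reference_csv, bin_size,
                                gene_symbol_key=None, physical_gpu=None):
    """Return (argv, env) for one SDAS cell2location run.

    Hypothetical helper: it only assembles flags documented on this page.
    """
    argv = [
        "SDAS", "cellAnnotation", "cell2location",
        "-i", input_h5ad,
        "-o", outdir,
        "--reference_csv", reference_csv,
        "--bin_size", str(bin_size),
    ]
    if gene_symbol_key is not None:
        argv += ["--input_gene_symbol_key", gene_symbol_key]

    env = dict(os.environ)
    if physical_gpu is None:
        # No GPU requested: --gpu_id -1 selects CPU mode.
        argv += ["--gpu_id", "-1"]
    else:
        # Expose only the chosen GPU; inside the process it becomes device 0.
        env["CUDA_VISIBLE_DEVICES"] = str(physical_gpu)
        argv += ["--gpu_id", "0"]
    return argv, env


argv, env = build_cell2location_command(
    "st.h5ad", "outdir", "./ref/inf_aver.csv", 20,
    gene_symbol_key="_index", physical_gpu=2,
)
print(" ".join(argv))
# To launch the run: subprocess.run(argv, env=env, check=True)
```

Returning the argument list and environment separately lets the command be inspected or logged before anything is launched.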

Input Parameter Description

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| -i / --input | Yes | | Stereo-seq h5ad; must contain the raw expression matrix |
| -o / --output | Yes | | Output folder |
| --reference_csv | Yes | | Single-cell reference csv file |
| --bin_size | Yes | | Bin size, used to control the number of cells per bin and the size of points in the plot; e.g., 20, 50, 100, cellbin (equivalent to 20) |
| --input_layer | | | Layer in Stereo-seq h5ad storing raw counts |
| --input_gene_symbol_key | | real_gene_name | Name of the column in Stereo-seq h5ad.var indicating gene symbol (index means using h5ad.var.index) |
| --slice_key | | sampleID | Name of the column in multi-slice h5ad.obs indicating slice ID; provides batch information and is used for plotting |
| --detection_alpha | | | Regularization parameter. The larger the technical variation in spatial data, the smaller the suitable detection_alpha; usually not adjusted |
| --data_split_strategy | | chunk | When the number of bins is too large, spatial data is split; this parameter specifies the data splitting strategy. 'chunk' means random splitting before running cell2location, 'batch' means splitting within the algorithm |
| --data_split_size | | 10000 | When the number of bins is too large, spatial data is split; this parameter specifies the split data size. Larger values run faster but use more GPU memory. If -1, no splitting is performed |
| --max_epochs | | 5000 | Number of epochs for model training |
| --seed | | | Random seed |
| --gpu_id | | -1 | ID of the GPU to use. If -1, use CPU. This parameter only specifies the main GPU to use; other GPUs may also be occupied but with very low usage. If you need to strictly specify the GPU, set the environment variable before running, e.g.: export CUDA_VISIBLE_DEVICES=2, then set --gpu_id 0 to use only GPU 2. |
| --n_threads | | | Number of threads to use in CPU mode; defaults to all CPUs |
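
To illustrate what --data_split_strategy chunk and --data_split_size control, here is a minimal sketch of random chunking. It is not SDAS's implementation: the function name random_chunks, the seeding, and the handling of the final, smaller chunk are all assumptions made for illustration.

```python
import random


def random_chunks(n_bins, split_size, seed=0):
    """Randomly partition bin indices 0..n_bins-1 into chunks.

    Assumption: each chunk holds at most `split_size` bins, and
    split_size == -1 disables splitting (one chunk with every bin).
    """
    indices = list(range(n_bins))
    if split_size == -1:
        return [indices]
    if split_size <= 0:
        raise ValueError("split_size must be positive or -1")
    if n_bins <= split_size:
        return [indices]
    rng = random.Random(seed)  # seeded so the split is reproducible
    rng.shuffle(indices)
    return [indices[i:i + split_size] for i in range(0, n_bins, split_size)]


chunks = random_chunks(n_bins=25000, split_size=10000, seed=42)
print([len(c) for c in chunks])  # [10000, 10000, 5000]
```

Under this sketch a larger split size means fewer, bigger chunks, which is consistent with the trade-off stated in the table: faster runs at the cost of more GPU memory per chunk.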

Output Results

| Result File | Description |
| --- | --- |
| <input_name>_anno_cell2location.csv | Annotation results for each spot, including scores for each cell type (cell2location's q05_cell_abundance_w_sf score) |
| <input_name>_anno_cell2location.h5ad | Input h5ad + annotation results. Scores for each cell type are stored in obsm['anno_score_cell2location'], and the type with the highest score is stored in obs['anno_cell2location'] |
| <input_name>_anno_cell2location.png/pdf | Overall annotation result plot; for multiple slices, one plot per slice; both png and pdf are output |
| <input_name>_anno_cell2location_split.png/pdf | Separate display plot for each cell type; for multiple slices, one plot per slice; both png and pdf are output |
| <input_name>_anno_score_cell2location.png/pdf | Score plot for each cell type; for multiple slices, one plot per slice; both png and pdf are output |

Overall Annotation Result Plot (<input_name>_anno_cell2location.png/pdf): The color represents the cell type with the highest proportion in each bin/cellbin.

Separate Display Plot for Each Cell Type (<input_name>_anno_cell2location_split.png/pdf): The color represents the cell type with the highest proportion in each bin/cellbin; the title is the cell type (number of cells).

Cell Type Score Plot (<input_name>_anno_score_cell2location.png/pdf): The scores for different cell types calculated by the algorithm. The higher the score, the higher the proportion of that cell type.

Annotation Result CSV (<input_name>_anno_cell2location.csv): Each row is a bin/cellbin, each column is a cell type, and the value is the cell type score. The higher the score, the higher the proportion of that cell type. The last column (annotation) is the cell type with the highest proportion in that bin/cellbin.

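| index | B_act | B_naive | CD4_CXCL13 | ... | annotation |
| --- | --- | --- | --- | --- | --- |
| CRCP95_T_BIN.242 | 0.1689 | 0.1694 | 0.2176 | ... | CAF_CXCL14 |
| CRCP95_T_BIN.243 | 0.1122 | 0.2350 | 0.1745 | ... | Epi |
| CRCP95_T_BIN.244 | 0.1020 | 0.2062 | 0.1527 | ... | Epi |
| CRCP95_T_BIN.245 | 0.0808 | 0.1980 | 0.1668 | ... | Epi |
| ... | | | | | |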
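
The CSV layout described above (first column is the bin ID, last column is the annotation, everything in between is a per-type score) can be summarized with the standard library. This is a minimal sketch: the function name summarize_annotations is hypothetical, and the three-type sample is made up for illustration; it is not real SDAS output.

```python
import csv
import io
from collections import Counter

# Made-up sample in the documented layout; the values are illustrative only.
SAMPLE_CSV = """index,TypeA,TypeB,TypeC,annotation
BIN.1,0.10,0.30,0.20,TypeB
BIN.2,0.50,0.10,0.05,TypeA
BIN.3,0.02,0.40,0.10,TypeB
"""


def summarize_annotations(csv_file):
    """Count bins per annotated type and list rows whose annotation
    is not the highest-scoring column.

    Assumes the first column is the bin ID and the last is the annotation.
    """
    reader = csv.DictReader(csv_file)
    id_col, *score_columns, label_col = reader.fieldnames
    counts = Counter()
    mismatches = []
    for row in reader:
        # Recompute the top-scoring cell type from the score columns.
        top = max(score_columns, key=lambda c: float(row[c]))
        counts[row[label_col]] += 1
        if top != row[label_col]:
            mismatches.append(row[id_col])
    return counts, mismatches


counts, mismatches = summarize_annotations(io.StringIO(SAMPLE_CSV))
print(dict(counts))  # {'TypeB': 2, 'TypeA': 1}
print(mismatches)    # []
```

For a real result, pass an open file instead of the in-memory sample, for example open("<input_name>_anno_cell2location.csv", newline=""). The per-type counts correspond to the "number of cells" shown in the titles of the split plot.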
