Cell Annotation
Purpose
Use cell2location for deconvolution-based cell annotation.
Usage
SDAS cellAnnotation cell2location -i st.h5ad -o outdir --reference_csv ./ref/inf_aver.csv --bin_size 20 \
--input_gene_symbol_key _index \
--gpu_id 3
Input Parameter Description
| Parameter | Required | Default | Description |
|---|---|---|---|
| -i / --input | Yes | | Stereo-seq h5ad; must contain the raw expression matrix |
| -o / --output | Yes | | Output folder |
| --reference_csv | Yes | | Single-cell reference CSV file |
| --bin_size | Yes | | Bin size; controls the number of cells per bin and the point size in plots. E.g., 20, 50, 100, or cellbin (equivalent to 20) |
| --input_layer | No | | Layer in the Stereo-seq h5ad that stores raw counts |
| --input_gene_symbol_key | No | real_gene_name | Name of the column in the Stereo-seq h5ad.var that holds gene symbols (_index means use h5ad.var.index) |
| --slice_key | No | sampleID | Name of the column in a multi-slice h5ad.obs that holds the slice ID; provides batch information and is used for plotting |
| --detection_alpha | No | 20 | Regularization parameter. The larger the technical variation in the spatial data, the smaller the suitable detection_alpha; usually left unchanged |
| --data_split_strategy | No | chunk | Strategy for splitting the spatial data when the number of bins is too large. 'chunk' splits the data randomly before running cell2location; 'batch' splits within the algorithm |
| --data_split_size | No | 10000 | Size of each split when the number of bins is too large. Larger values run faster but use more GPU memory. If -1, no splitting is performed |
| --max_epochs | No | 5000 | Number of epochs for model training |
| --seed | No | 42 | Random seed |
| --gpu_id | No | -1 | ID of the GPU to use. If -1, use CPU. This parameter only selects the main GPU; other GPUs may also be occupied, but with very low usage. To restrict the run strictly to one GPU, set the environment variable before running, e.g. export CUDA_VISIBLE_DEVICES=2, then pass --gpu_id 0 to use only GPU 2 |
| --n_threads | No | | Number of threads to use in CPU mode; defaults to all CPUs |
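The strict GPU pinning recipe from the --gpu_id description can be scripted. The following is a minimal Python sketch, not part of SDAS: the helper name build_sdas_call is hypothetical, it assumes the SDAS executable is on PATH, and it uses only flags documented on this page. It builds the argument list and environment without launching anything.

```python
import os


def build_sdas_call(input_h5ad, outdir, reference_csv, bin_size, physical_gpu=None):
    """Return (argv, env) for an SDAS cell2location run.

    Hypothetical helper for illustration; not part of SDAS.
    physical_gpu=None selects CPU mode (--gpu_id -1).
    """
    argv = [
        "SDAS", "cellAnnotation", "cell2location",
        "-i", str(input_h5ad),
        "-o", str(outdir),
        "--reference_csv", str(reference_csv),
        "--bin_size", str(bin_size),
    ]
    env = dict(os.environ)
    if physical_gpu is None:
        argv += ["--gpu_id", "-1"]
    else:
        # Expose only one physical GPU to the process. Inside the process
        # that GPU is renumbered as 0, so --gpu_id must be 0.
        env["CUDA_VISIBLE_DEVICES"] = str(physical_gpu)
        argv += ["--gpu_id", "0"]
    return argv, env


argv, env = build_sdas_call("st.h5ad", "outdir", "./ref/inf_aver.csv", 20, physical_gpu=2)
print(" ".join(argv))
print("CUDA_VISIBLE_DEVICES =", env["CUDA_VISIBLE_DEVICES"])
```

To launch the run, the pair could be passed to subprocess.run(argv, env=env, check=True).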
Output Results
| File | Description |
|---|---|
| <input_name>_anno_cell2location.csv | Annotation results for each spot, including scores for each cell type (cell2location's q05_cell_abundance_w_sf score) |
| <input_name>_anno_cell2location.h5ad | Input h5ad + annotation results. Scores for each cell type are stored in obsm['anno_score_cell2location'], and the type with the highest score is stored in obs['anno_cell2location'] |
| <input_name>_anno_cell2location.png/pdf | Overall annotation result plot; for multiple slices, one plot per slice; both png and pdf are output |
| <input_name>_anno_cell2location_split.png/pdf | Separate display plot for each cell type; for multiple slices, one plot per slice; both png and pdf are output |
| <input_name>_anno_score_cell2location.png/pdf | Score plot for each cell type; for multiple slices, one plot per slice; both png and pdf are output |
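For scripting downstream steps, the file names listed above can be derived from the input path. This is a minimal Python sketch under two assumptions that the source does not confirm: that <input_name> is the input file name without its .h5ad extension, and that all files are written directly into the output folder. The function name expected_outputs is hypothetical, and any per-slice naming for multi-slice input is not covered.

```python
from pathlib import Path

# Suffixes taken from the output list; png and pdf are both produced.
SUFFIXES = [
    "_anno_cell2location.csv",
    "_anno_cell2location.h5ad",
    "_anno_cell2location.png",
    "_anno_cell2location.pdf",
    "_anno_cell2location_split.png",
    "_anno_cell2location_split.pdf",
    "_anno_score_cell2location.png",
    "_anno_score_cell2location.pdf",
]


def expected_outputs(input_h5ad, outdir):
    """List the output paths expected for a single-slice run.

    Assumes <input_name> is the stem of the input file (an assumption).
    """
    stem = Path(input_h5ad).stem
    return [Path(outdir) / f"{stem}{suffix}" for suffix in SUFFIXES]


for path in expected_outputs("st.h5ad", "outdir"):
    print(path)
```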
Overall Annotation Result Plot:
<input_name>_anno_cell2location.png/pdf
The color represents the cell type with the highest proportion in each bin/cellbin.

Separate Display Plot for Each Cell Type:
<input_name>_anno_cell2location_split.png/pdf
The color represents the cell type with the highest proportion in each bin/cellbin; the title is the cell type (number of cells).

Cell Type Score Plot:
<input_name>_anno_score_cell2location.png/pdf
Shows the score the algorithm calculated for each cell type. The higher the score, the higher the proportion of that cell type.

Annotation Result CSV:
<input_name>_anno_cell2location.csv
Each row is a bin/cellbin, each column is a cell type, and the value is the cell type score. The higher the score, the higher the proportion of that cell type. The last column (annotation) is the cell type with the highest proportion in that bin/cellbin.
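| | <cell type 1> | <cell type 2> | <cell type 3> | ... | annotation |
|---|---|---|---|---|---|
| CRCP95_T_BIN.242 | 0.1689 | 0.1694 | 0.2176 | ... | CAF_CXCL14 |
| CRCP95_T_BIN.243 | 0.1122 | 0.2350 | 0.1745 | ... | Epi |
| CRCP95_T_BIN.244 | 0.1020 | 0.2062 | 0.1527 | ... | Epi |
| CRCP95_T_BIN.245 | 0.0808 | 0.1980 | 0.1668 | ... | Epi |
| ... | ... | ... | ... | ... | ... |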
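The relationship between the score columns and the annotation column can be checked with a few lines of Python. The sketch below uses only the standard library and a small made-up table: the bin IDs, the cell type names TypeA/TypeB/TypeC, and the scores are invented for illustration. It assumes the first column holds the bin ID and the last column is the annotation; the function name check_annotations is hypothetical.

```python
import csv
import io

# Made-up sample in the layout described above (not real SDAS output).
SAMPLE = """\
bin,TypeA,TypeB,TypeC,annotation
BIN.1,0.10,0.30,0.20,TypeB
BIN.2,0.50,0.25,0.25,TypeA
"""


def check_annotations(handle):
    """Map each bin ID to (highest-scoring cell type, matches annotation?)."""
    reader = csv.reader(handle)
    header = next(reader)
    cell_types = header[1:-1]  # score columns sit between bin ID and annotation
    result = {}
    for row in reader:
        scores = [float(value) for value in row[1:-1]]
        best = cell_types[scores.index(max(scores))]
        result[row[0]] = (best, best == row[-1])
    return result


for bin_id, (best, ok) in check_annotations(io.StringIO(SAMPLE)).items():
    print(bin_id, best, "matches" if ok else "differs")
```

For a real result file, the same function could be called on a handle from open(path, newline="").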