Cell Annotation

Purpose

Annotate cell types with SCimilarity. Only human samples are supported.

Usage

Model download: https://zenodo.org/records/10685499

After downloading, extract the archive and pass the extracted model folder to the --model_dir parameter.

To use the pre-built database (recommended: add --cell_type_file to restrict annotation to specific cell types; the available cell types are listed in label_ints.csv in the model folder):

SDAS cellAnnotation scimilarity -i st.h5ad -o outdir --bin_size 20 \
--model_dir ./model_v1.1 \
--cell_type_file celltype.txt
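A cell type file is plain text with one cell type per line. The Python sketch below shows one way to prepare it. It writes the requested names in order, drops blanks and duplicates, and optionally rejects names that are missing from a set you supply (for example, names copied from label_ints.csv).

The helper name write_cell_type_file is hypothetical and not part of SDAS. It also assumes that names must match the model's labels exactly.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def write_cell_type_file(path: str, cell_types: Iterable[str],
                         available: Optional[Iterable[str]] = None) -> List[str]:
    """Hypothetical helper (not part of SDAS): write one cell type per line.

    ASSUMPTION: names must match the model's labels (label_ints.csv) exactly,
    so unknown names are rejected when `available` is given.
    """
    names: List[str] = []
    for name in cell_types:
        name = name.strip()
        if name and name not in names:  # skip blanks and duplicates, keep order
            names.append(name)
    if not names:
        raise ValueError("no cell types given")
    if available is not None:
        allowed = set(available)
        unknown = [n for n in names if n not in allowed]
        if unknown:
            raise ValueError(f"not in the model's label set: {unknown}")
    Path(path).write_text("\n".join(names) + "\n")
    return names
```

For example, write_cell_type_file("celltype.txt", ["T cell", "B cell"]) writes two lines (placeholder names; use labels from label_ints.csv). The resulting file is then passed as --cell_type_file celltype.txt.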

To use a single-cell reference database built with scimilarityMakeRef:

SDAS cellAnnotation scimilarity -i st.h5ad -o outdir --bin_size 20 \
--model_dir ./model_v1.1 --reference_database scimilarity_ref 

Input Parameter Description

| Parameter | Required | Default | Description |
|---|---|---|---|
| -i / --input | Yes | | Stereo-seq h5ad; must contain the raw expression matrix |
| -o / --output | Yes | | Output folder |
| --bin_size | Yes | | Bin size. Only controls the point size in plots and is not used in computation. E.g., 20, 50, 100, or cellbin (treated as 20) |
| --input_layer | No | | Layer of the Stereo-seq h5ad that stores raw counts |
| --input_gene_symbol_key | No | real_gene_name | Column in h5ad.var holding gene symbols; use index to take symbols from h5ad.var.index |
| --slice_key | No | sampleID | Column in h5ad.obs of a multi-slice h5ad holding the slice ID; used for plotting |
| --model_dir | No | ./model_v1.1 | Path to the SCimilarity model folder |
| --reference_database | No | | Path to a single-cell database built with scimilarityMakeRef. If not specified, the pre-built database in <model_dir> is used |
| --cell_type_file | No | | File listing the cell types to annotate, one per line. If not provided, all cell types in the model are used. Available cell types are listed in label_ints.csv in the model folder |
| --k_nearest_neighbor | No | 50 | Number of nearest reference cells to search |
| --ef | No | 100 | ef parameter of the HNSW KNN search. Higher values give a more accurate search but take longer |
| --weighting | No | | Use distance-weighted contributions of the K nearest cells, instead of plain counts of those cells, to compute the annotation result |
| --seed | No | 42 | Random seed |
| --gpu_id | No | -1 | ID of the GPU to use; -1 means CPU |
| --n_threads | No | All CPUs | Number of threads to use in CPU mode |
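The --weighting option changes how the K nearest reference cells (--k_nearest_neighbor) are turned into cell type scores. The Python sketch below contrasts the two modes for a single spot whose neighbours have already been found.

This is not SCimilarity's code. The inverse-distance weight 1 / (distance + eps) is an assumed scheme, and the tool may use a different kernel or normalization. The function name knn_label_scores is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def knn_label_scores(neighbor_labels: Sequence[str],
                     neighbor_distances: Sequence[float],
                     weighting: bool = False,
                     eps: float = 1e-8) -> Dict[str, float]:
    """Hypothetical sketch: score cell types from the K nearest reference cells.

    weighting=False: each neighbour counts once, so scores are label fractions.
    weighting=True:  each neighbour votes 1 / (distance + eps) (ASSUMED kernel);
                     closer neighbours count more.
    Scores are normalised to sum to 1 in both modes.
    """
    if len(neighbor_labels) != len(neighbor_distances):
        raise ValueError("labels and distances must have the same length")
    if not neighbor_labels:
        raise ValueError("need at least one neighbour")
    totals: Dict[str, float] = defaultdict(float)
    for label, dist in zip(neighbor_labels, neighbor_distances):
        totals[label] += 1.0 / (dist + eps) if weighting else 1.0
    norm = sum(totals.values())
    return {label: value / norm for label, value in totals.items()}
```

Take two "T" neighbours at distance 0.5 and one "B" neighbour at distance 0.1. Plain counting favours "T" (2/3 vs 1/3). Distance weighting favours "B", because its single very close neighbour outweighs the two farther ones.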

Output Results

| Result File | Description |
|---|---|
| <input_name>_anno_scimilarity.csv | Annotation results for each spot, including the score for each cell type |
| <input_name>_anno_scimilarity.h5ad | Input h5ad plus annotation results. Scores for each cell type are stored in obsm['anno_score_scimilarity'], and the highest-scoring type is stored in obs['anno_scimilarity'] |
| <input_name>_anno_scimilarity.png/pdf | Overall annotation plot; one plot per slice for multi-slice input; saved as both PNG and PDF |
| <input_name>_anno_scimilarity_split.png/pdf | Separate plot for each cell type; one plot per slice for multi-slice input; saved as both PNG and PDF |
| <input_name>_anno_score_scimilarity.png/pdf | Score plot for each cell type; one plot per slice for multi-slice input; saved as both PNG and PDF |
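obs['anno_scimilarity'] holds the highest-scoring cell type from obsm['anno_score_scimilarity']. The Python sketch below recomputes that label from the score matrix and also reports each spot's margin over the runner-up type. A small margin can help flag spots whose label is ambiguous.

The function summarize_scores is hypothetical, and the column order of the score matrix is an assumption. If obsm stores a pandas DataFrame, its column names can be passed as cell_types; otherwise check the order against the CSV output.

```python
import numpy as np


def summarize_scores(scores, cell_types):
    """Hypothetical helper: top label, top score and margin for each spot.

    scores: (n_spots, n_types) array or DataFrame,
            e.g. adata.obsm['anno_score_scimilarity']
    cell_types: column names (ASSUMED to match the column order of `scores`)
    """
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 2 or scores.shape[1] != len(cell_types):
        raise ValueError("scores must have shape (n_spots, len(cell_types))")
    rows = np.arange(scores.shape[0])
    order = np.argsort(scores, axis=1)          # ascending per spot
    top_idx = order[:, -1]                      # highest-scoring column
    top_score = scores[rows, top_idx]
    if scores.shape[1] > 1:
        second = scores[rows, order[:, -2]]     # runner-up score
    else:
        second = np.zeros_like(top_score)
    labels = np.asarray(cell_types, dtype=object)[top_idx]
    return labels, top_score, top_score - second
```

A typical call first loads the annotated h5ad with anndata.read_h5ad("<input_name>_anno_scimilarity.h5ad"), then runs summarize_scores on adata.obsm["anno_score_scimilarity"]. The returned labels should agree with adata.obs["anno_scimilarity"].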

For detailed explanations and example result displays, refer to the cell2location documentation (cell2location algorithm → cell annotation → output results).
