Cell Annotation
Purpose
Annotate cells using SCimilarity. This method applies only to human samples.
Usage
Model download: https://zenodo.org/records/10685499
After downloading, extract the model folder and pass its path with the --model_dir parameter.
To use the pre-built database in the model folder (adding --cell_type_file is recommended to restrict annotation to specific cell types; the available cell types are listed in label_ints.csv in the model folder):
SDAS cellAnnotation scimilarity -i st.h5ad -o outdir --bin_size 20 \
--model_dir ./model_v1.1 \
--cell_type_file celltype.txt
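The cell type file can be written by hand, but a short script makes it easy to check the requested names against label_ints.csv first. The following is a minimal Python sketch, not part of SDAS. It assumes label_ints.csv has a header row and stores the cell type name in its last column, so inspect the file in your model folder before relying on it. The function name and the example cell types are illustrative.

```python
import csv
from pathlib import Path


def write_cell_type_file(label_csv, wanted, out_path):
    """Write the wanted cell types to out_path, one per line.

    Only names that appear in label_csv are written.
    Assumption: label_csv has a header row and the cell type
    name is in the last column of each data row.
    Returns the requested names that were not found.
    """
    with open(label_csv, newline="") as handle:
        rows = list(csv.reader(handle))
    available = {row[-1] for row in rows[1:] if row}

    kept = [name for name in wanted if name in available]
    missing = [name for name in wanted if name not in available]

    Path(out_path).write_text("".join(name + "\n" for name in kept))
    return missing


# Example call (paths and cell type names are placeholders):
# missing = write_cell_type_file(
#     "./model_v1.1/label_ints.csv", ["B cell", "T cell"], "celltype.txt"
# )
```

Any names returned as missing are likely misspelled or absent from the model, and can be corrected before running the annotation.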
To use a single-cell reference database built with scimilarityMakeRef:
SDAS cellAnnotation scimilarity -i st.h5ad -o outdir --bin_size 20 \
--model_dir ./model_v1.1 --reference_database scimilarity_ref
Input Parameter Description
| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| -i / --input | Yes | | Stereo-seq h5ad; must contain the raw expression matrix |
| -o / --output | Yes | | Output folder |
| --bin_size | Yes | | Bin size; only controls the point size in plots and is not used in the calculation. Examples: 20, 50, 100, cellbin (equivalent to 20) |
| --input_layer | No | | Layer in the Stereo-seq h5ad that stores raw counts |
| --input_gene_symbol_key | No | real_gene_name | Name of the column in the Stereo-seq h5ad.var that holds gene symbols (index means use h5ad.var.index) |
| --slice_key | No | sampleID | Name of the column in a multi-slice h5ad.obs that holds the slice ID; used for plotting |
| --model_dir | No | ./model_v1.1 | Path to the SCimilarity model folder |
| --reference_database | No | | Path to the single-cell database built with scimilarityMakeRef. If not specified, the pre-built database in <model_dir> is used |
| --cell_type_file | No | | File listing the cell types to annotate, one per line. If not provided, all cell types in the model are used. Available cell types are listed in label_ints.csv in the model folder |
| --k_nearest_neighbor | No | 50 | Number of nearest cells to search |
| --ef | No | 100 | ef parameter of the HNSW KNN algorithm. A higher ef gives a more accurate search but takes longer |
| --weighting | No | | Use distance-weighted values of the K nearest cells, instead of cell counts among the K nearest cells, as the annotation result |
| --seed | No | 42 | Random seed |
| --gpu_id | No | -1 | ID of the GPU to use. If -1, the CPU is used |
| --n_threads | No | | Number of threads to use in CPU mode; defaults to all CPUs |
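The effect of --weighting can be illustrated with a small Python sketch that compares count-based and distance-weighted scoring over the K nearest reference cells. This page does not state the exact weighting that SDAS applies, so the inverse-distance weights (1/distance), the normalization to a sum of 1, the function name, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def knn_scores(neighbors, weighting=False, eps=1e-6):
    """Score cell types from (label, distance) pairs of the K nearest cells.

    weighting=False: each neighbor contributes 1 (a plain count).
    weighting=True: each neighbor contributes 1/distance (assumed scheme).
    Scores are normalized so that they sum to 1.
    """
    scores = defaultdict(float)
    for label, distance in neighbors:
        # eps guards against division by zero for identical cells.
        scores[label] += (1.0 / max(distance, eps)) if weighting else 1.0
    total = sum(scores.values())
    return {label: value / total for label, value in scores.items()}


# Toy neighbors: two fairly distant B cells and one very close T cell.
neighbors = [("B cell", 0.8), ("B cell", 1.0), ("T cell", 0.1)]

by_count = knn_scores(neighbors)
by_weight = knn_scores(neighbors, weighting=True)

print(max(by_count, key=by_count.get))    # B cell (2 votes vs 1)
print(max(by_weight, key=by_weight.get))  # T cell (weight 10 vs 2.25)
```

In this toy case the majority label wins by count, while the single much closer neighbor wins once distances are taken into account, which is the kind of difference the option is meant to expose.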
Output Results
| File | Description |
| --- | --- |
| <input_name>_anno_scimilarity.csv | Annotation results for each spot, including the score for each cell type |
| <input_name>_anno_scimilarity.h5ad | Input h5ad plus annotation results. Scores for each cell type are stored in obsm['anno_score_scimilarity'], and the cell type with the highest score is stored in obs['anno_scimilarity'] |
| <input_name>_anno_scimilarity.png/pdf | Overall annotation plot; for multiple slices, one plot per slice; output as both png and pdf |
| <input_name>_anno_scimilarity_split.png/pdf | Separate plot for each cell type; for multiple slices, one plot per slice; output as both png and pdf |
| <input_name>_anno_score_scimilarity.png/pdf | Score plot for each cell type; for multiple slices, one plot per slice; output as both png and pdf |
For detailed explanations and specific result displays, please refer to the following link (cell2location algorithm → cell annotation → output results).