Build Single-cell Reference Data (Optional)
Purpose
Use scimilarityMakeRef to construct the SCimilarity single-cell reference database. This is only applicable to human samples.
Usage
Model download: https://zenodo.org/records/10685499
After downloading, extract the model folder and pass its path with the --model_dir parameter.
SDAS cellAnnotation scimilarityMakeRef -o ./scimilarity_ref --reference sc.h5ad --label_key annotation \
--model_dir ./model_v1.1 \
--remove_tmp
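The extraction step mentioned above can be sketched as a pair of small shell helpers. This is a minimal sketch, not part of SDAS: the function names extract_model and model_dir_ok are hypothetical, and the archive name in the example comment is an assumption about what the download looks like, so adjust it to the file you actually obtained.

```shell
# Sketch: unpack a downloaded model archive and confirm the folder is usable.
# Function names and the archive name below are illustrative assumptions.

# Extract a .tar.gz archive ($1) into a destination directory ($2).
extract_model() (
    archive="$1"
    dest="$2"
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
)

# Succeed only if the model directory ($1) exists and is not empty.
model_dir_ok() (
    dir="$1"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]
)

# Example (not run automatically; archive name is an assumption):
#   extract_model ./model_v1.1.tar.gz . && model_dir_ok ./model_v1.1
```

If model_dir_ok succeeds, the same path can be given to --model_dir.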
Input Parameter Description

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| -o / --output | Yes | - | Output folder |
| --reference | Yes | - | Single-cell reference h5ad; must contain the raw expression matrix |
| --label_key | Yes | - | Name of the column in the reference h5ad.obs that indicates cell type |
| --ref_layer | No | - | Layer in the reference h5ad that stores raw counts |
| --ref_gene_symbol_key | No | _index | Name of the column in the reference h5ad.var that indicates gene symbol (_index means h5ad.var.index is used) |
| --filter_rare_cell | No | 100 | Minimum number of cells a cell type must have to be included |
| --seed | No | 42 | Random seed |
| --model_dir | No | ./model_v1.1 | Path to the SCimilarity model folder |
| --ef_construction | No | 1000 | ef_construction for the HNSW KNN algorithm; higher values make the database more accurate but slower to build |
| --M | No | 80 | M for the HNSW KNN algorithm; controls the number of connections in the nearest-neighbor graph |
| --remove_tmp | No | - | Whether to delete the CellArrDataset format data and the model-generated embeddings in the output folder. Note: the assays, cell_metadata, gene_annotation, sample_metadata, and cellsearch folders in the output will all be deleted |
| --gpu_id | No | -1 | ID of the GPU to use; -1 means use the CPU |
| --n_threads | No | - | Number of threads to use in CPU mode; defaults to all CPUs |
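To illustrate how --gpu_id and --n_threads interact, the following sketch assembles a command line from the flags documented above and prints it without executing anything. The helper name makeref_cmd and the example values (GPU 0, 8 threads) are hypothetical; only the flags themselves come from this page.

```shell
# Sketch: build (and only print) a scimilarityMakeRef command line.
# makeref_cmd is an illustrative helper, not part of SDAS.
makeref_cmd() (
    out_dir="$1"        # -o / --output
    ref_h5ad="$2"       # --reference
    label_key="$3"      # --label_key
    gpu_id="${4:--1}"   # --gpu_id; -1 selects the CPU
    n_threads="${5:-}"  # --n_threads; only used in CPU mode

    cmd="SDAS cellAnnotation scimilarityMakeRef -o $out_dir --reference $ref_h5ad --label_key $label_key --gpu_id $gpu_id"
    # Pass a thread count only when running on the CPU.
    if [ "$gpu_id" = "-1" ] && [ -n "$n_threads" ]; then
        cmd="$cmd --n_threads $n_threads"
    fi
    printf '%s\n' "$cmd"
)

# CPU run limited to 8 threads (values are examples):
#   makeref_cmd ./scimilarity_ref sc.h5ad annotation -1 8
# GPU run on device 0; the thread count is dropped:
#   makeref_cmd ./scimilarity_ref sc.h5ad annotation 0 8
```

Printing the command first makes it easy to review the flags before starting a long build.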
Output Results
The constructed single-cell reference data will be stored in the scimilarity_ref folder. The directory structure and key files are as follows:
./scimilarity_ref
├── annotation
│ ├── labelled_kNN.bin
│ └── reference_labels.tsv
├── assays
├── cell_metadata
├── cellsearch
│ └── cell_embedding
├── gene_annotation
└── sample_metadata
| File or folder | Description |
| --- | --- |
| annotation/labelled_kNN.bin | KNN data calculated from the single-cell reference embeddings; used for cell type search |
| annotation/reference_labels.tsv | Cell type label for each cell in the single-cell reference |
| assays, cell_metadata, gene_annotation, sample_metadata | CellArrDataset format data converted from the single-cell reference h5ad; can be deleted. Use --remove_tmp to delete it automatically after the program finishes |
| cellsearch/cell_embedding | Embeddings of the single-cell reference calculated by the model, in TileDB format; can be deleted. Use --remove_tmp to delete them automatically after the program finishes |
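If the build was run without --remove_tmp, the checks and cleanup described above can be done by hand. The sketch below is an assumption-laden convenience, not an SDAS feature: check_ref_output and clean_intermediates are hypothetical helper names, and the file and folder names are taken from the listing on this page.

```shell
# Sketch: verify the key output files, then remove the deletable intermediates.
# Helper names are illustrative; paths follow the directory listing above.

# Fail if either key annotation file is missing from the output folder ($1).
check_ref_output() (
    ref_dir="$1"
    for f in annotation/labelled_kNN.bin annotation/reference_labels.tsv; do
        if [ ! -f "$ref_dir/$f" ]; then
            echo "missing: $ref_dir/$f" >&2
            exit 1
        fi
    done
)

# Delete the folders that --remove_tmp would have removed from ($1).
clean_intermediates() (
    ref_dir="$1"
    for d in assays cell_metadata gene_annotation sample_metadata cellsearch; do
        # ${ref_dir:?} aborts if the variable is empty, so "/" is never touched.
        rm -rf "${ref_dir:?}/$d"
    done
)

# Example (not run automatically):
#   check_ref_output ./scimilarity_ref && clean_intermediates ./scimilarity_ref
```

Running the check first avoids deleting intermediates from a build that did not finish.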