Build Single-cell Reference Data (Optional)

Purpose

Use scimilarityMakeRef to construct the SCimilarity single-cell reference database. This is only applicable to human samples.

Usage

Model download: https://zenodo.org/records/10685499

After downloading, extract the model folder and pass its path via the --model_dir parameter.

SDAS cellAnnotation scimilarityMakeRef -o ./scimilarity_ref --reference sc.h5ad --label_key annotation \
--model_dir ./model_v1.1 \
--remove_tmp
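
Before starting a long build, it can help to confirm that the inputs exist. The following is a minimal sketch and not part of SDAS: the function name check_makeref_inputs is hypothetical, and it only checks that the reference file and the model folder are present on disk.

```shell
#!/bin/sh
# Hypothetical helper (not part of SDAS): verify that the inputs exist
# before starting scimilarityMakeRef.
check_makeref_inputs() {
    reference="$1"   # path that will be passed to --reference
    model_dir="$2"   # path that will be passed to --model_dir

    if [ ! -f "$reference" ]; then
        echo "reference h5ad not found: $reference" >&2
        return 1
    fi
    if [ ! -d "$model_dir" ]; then
        echo "model folder not found: $model_dir" >&2
        return 1
    fi
    return 0
}

# Example usage (commented out so the sketch has no side effects):
# check_makeref_inputs sc.h5ad ./model_v1.1 && \
#   SDAS cellAnnotation scimilarityMakeRef -o ./scimilarity_ref \
#     --reference sc.h5ad --label_key annotation \
#     --model_dir ./model_v1.1 --remove_tmp
```

The check does not inspect the h5ad contents, so it cannot tell whether the file holds a raw expression matrix or whether the label column exists.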

Input Parameter Description

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| -o / --output | Yes | | Output folder |
| --reference | Yes | | Single-cell ref h5ad, must contain the raw expression matrix |
| --label_key | Yes | | Name of the column in single-cell ref h5ad.obs indicating cell type |
| --ref_layer | No | | Layer in single-cell ref h5ad storing raw counts |
| --ref_gene_symbol_key | No | _index | Name of the column in single-cell ref h5ad.var indicating gene symbol (_index means using h5ad.var.index) |
| --filter_rare_cell | No | 100 | The minimum cell count for a cell type to be included |
| --seed | No | 42 | Random seed |
| --model_dir | No | ./model_v1.1 | Path to the SCimilarity model folder |
| --ef_construction | No | 1000 | ef_construction for HNSW KNN algorithm; higher values make the database more accurate but more time-consuming |
| --M | No | 80 | M for HNSW KNN algorithm, controls the number of connections in the nearest neighbor graph |
| --remove_tmp | No | | Whether to delete the CellArrDataset format data and model-generated embeddings in the output folder. Note: the original assays, cell_metadata, gene_annotation, sample_metadata, and cellsearch folders in the output will all be deleted |
| --gpu_id | No | -1 | ID of the GPU to use. If -1, use CPU |
| --n_threads | No | | Number of threads to use in CPU mode, defaults to all CPUs |
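
To preview which cell types --filter_rare_cell might exclude, the labels can be counted ahead of time. This is a rough sketch under two assumptions: the label column has already been exported from h5ad.obs to a plain text file with one label per line (labels.txt is a hypothetical file, not something SDAS produces), and a cell type is kept when its cell count is at least the threshold. The exact comparison SDAS applies is not stated here, so treat the boundary case as unverified. The function name list_rare_cell_types is also hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of SDAS): list cell types whose cell count
# falls below a minimum, as "label<TAB>count", sorted by label.
# Assumes one cell-type label per line in the input file.
list_rare_cell_types() {
    labels_file="$1"
    min_cells="${2:-100}"   # same default as --filter_rare_cell

    awk -v min="$min_cells" '
        NF { count[$0]++ }              # skip blank lines, count each label
        END {
            for (label in count)
                if (count[label] < min)
                    printf "%s\t%d\n", label, count[label]
        }
    ' "$labels_file" | sort
}

# Example usage (commented out):
# list_rare_cell_types labels.txt 100
```

An empty result suggests that no cell type would fall below the threshold under the assumption above.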

Output Results

The constructed single-cell reference data is stored in the output folder given by -o (./scimilarity_ref in the example above). The directory structure and key files are as follows:

./scimilarity_ref
├── annotation
│   ├── labelled_kNN.bin
│   └── reference_labels.tsv
├── assays
├── cell_metadata
├── cellsearch
│   └── cell_embedding
├── gene_annotation
└── sample_metadata

| Result File | Description |
| --- | --- |
| annotation/labelled_kNN.bin | KNN data calculated from single-cell ref embeddings, used for cell type search |
| annotation/reference_labels.tsv | Cell type label for each cell in the single-cell ref |
| assays, cell_metadata, gene_annotation, sample_metadata | CellArrDataset format data converted from single-cell ref h5ad, can be deleted. Use --remove_tmp to delete automatically after the program finishes |
| cellsearch/cell_embedding | Embeddings of the single-cell ref calculated by the model in tileDB format, can be deleted. Use --remove_tmp to delete automatically after the program finishes |
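
After the run finishes, the two files under annotation can be checked with a short script. This is a minimal sketch, assuming that only annotation/labelled_kNN.bin and annotation/reference_labels.tsv need to be present, since the other folders are described as deletable. The function name check_scimilarity_ref is hypothetical and not part of SDAS.

```shell
#!/bin/sh
# Hypothetical helper (not part of SDAS): confirm that the annotation files
# listed above exist in the output folder.
check_scimilarity_ref() {
    ref_dir="$1"
    status=0
    for f in annotation/labelled_kNN.bin annotation/reference_labels.tsv; do
        if [ ! -f "$ref_dir/$f" ]; then
            echo "missing: $ref_dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example usage (commented out):
# check_scimilarity_ref ./scimilarity_ref && echo "reference looks complete"
```

The function reports every missing file rather than stopping at the first, and it does not validate the contents of either file.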
