Build Single-cell Reference Data
Purpose
Use cell2locationMakeRef to construct the cell2location single-cell reference file, inf_aver.csv.
Usage
SDAS cellAnnotation cell2locationMakeRef -o ./ref --reference sc.h5ad --label_key annotation \
--batch_key id \
--nonz_mean_cutoff 1.45 \
--gpu_id 3
Input Parameter Description
| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| -o / --output | Yes | | Output folder |
| --reference | Yes | | Single-cell ref h5ad, must contain the raw expression matrix |
| --label_key | Yes | | Name of the column in single-cell ref h5ad.obs indicating cell type |
| --ref_layer | No | | Layer in single-cell ref h5ad storing raw counts |
| --ref_gene_symbol_key | No | _index | Name of the column in single-cell ref h5ad.var indicating gene symbol (_index means using h5ad.var.index) |
| --batch_key | No | | Name of the column in single-cell ref h5ad.obs indicating batch; if not provided, batch is not considered |
| --filter_rare_cell | No | 100 | The minimum cell count for a cell type to be included |
| --check_filter_genes | No | | If this parameter is set, only the result plot of filtered genes (filter_genes.png) will be output |
| --cell_count_cutoff | No | 5 | Parameter controlling gene filtering in cell2location, usually not adjusted |
| --cell_percentage_cutoff2 | No | 0.03 | Parameter controlling gene filtering in cell2location; the larger the value, the fewer genes are selected. It is recommended to keep the number of genes between 8k-16k |
| --nonz_mean_cutoff | No | 1.12 | Parameter controlling gene filtering in cell2location; the larger the value, the fewer genes are selected. It is recommended to keep the number of genes between 8k-16k |
| --max_epochs | No | 250 | Number of epochs for model training |
| --seed | No | 42 | Random seed |
| --gpu_id | No | -1 | ID of the GPU to use. If -1, use CPU. This parameter only specifies the main GPU to use; other GPUs may also be occupied but with very low usage. If you need to strictly specify the GPU, set the environment variable before running, e.g.: export CUDA_VISIBLE_DEVICES=2, then set --gpu_id 0 to use only GPU 2. |
| --n_threads | No | | Number of threads to use in CPU mode, defaults to all CPUs |
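The GPU-pinning advice given for --gpu_id can be sketched as a small Python wrapper. This is an illustration only, not part of SDAS: the helper name build_pinned_run is hypothetical, and the command mirrors the Usage example. It exposes a single physical GPU through CUDA_VISIBLE_DEVICES and then passes --gpu_id 0, because the only visible device is renumbered to 0 inside the process.

```python
import os
import shutil
import subprocess


def build_pinned_run(physical_gpu, reference, label_key, output_dir):
    """Return (command, environment) restricting the run to one physical GPU.

    Hypothetical helper for illustration; it uses only flags that appear
    in the parameter description.
    """
    env = dict(os.environ)
    # Hide every GPU except the chosen one from the child process.
    env["CUDA_VISIBLE_DEVICES"] = str(physical_gpu)
    cmd = [
        "SDAS", "cellAnnotation", "cell2locationMakeRef",
        "-o", output_dir,
        "--reference", reference,
        "--label_key", label_key,
        # The single visible GPU is always index 0 inside the process.
        "--gpu_id", "0",
    ]
    return cmd, env


cmd, env = build_pinned_run(2, "sc.h5ad", "annotation", "./ref")
print(" ".join(cmd))

# Launch only when the SDAS executable is actually on PATH.
if shutil.which("SDAS"):
    subprocess.run(cmd, env=env, check=True)
```

Setting the variable in a copied environment, rather than in the current process, keeps the restriction local to this one run.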
Output Results
| File | Description |
| --- | --- |
| <reference_name>_filter_genes.png/pdf | Gene filtering result plot by cell2location (<reference_name> is the single-cell ref h5ad file name) |
| <reference_name>_train_history.png/pdf | Training loss curve |
| <reference_name>_inf_aver.csv | Single-cell ref csv constructed by cell2location |
Gene Filtering Result Plot by Cell2location:
<reference_name>_filter_genes.png/pdf
The orange rectangle highlights genes excluded based on the combination of the number of cells expressing that gene (Y-axis) and the average RNA count in cells where the gene was detected (X-axis). It is recommended to keep the number of retained genes between 8k and 16k.
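The way the three cutoffs interact can be sketched in plain Python. This is a simplified reading of the selection rule in cell2location's filter_genes, and it assumes SDAS passes --cell_count_cutoff, --cell_percentage_cutoff2 and --nonz_mean_cutoff through unchanged. The function select_genes and the toy data are hypothetical. A gene is kept if it is expressed in more than cell_percentage_cutoff2 of all cells, or if it is expressed in more than cell_count_cutoff cells and its mean count over expressing cells exceeds nonz_mean_cutoff.

```python
def select_genes(counts, gene_names, cell_count_cutoff=5,
                 cell_percentage_cutoff2=0.03, nonz_mean_cutoff=1.12):
    """Return the genes that survive filtering.

    counts: list of cells, each a list of raw counts (one entry per gene).
    """
    n_cells_total = len(counts)
    kept = []
    for j, gene in enumerate(gene_names):
        column = [cell[j] for cell in counts]
        n_expressing = sum(1 for c in column if c > 0)
        if n_expressing == 0:
            continue  # never detected, always excluded
        # Average count over the cells where the gene was detected.
        nonz_mean = sum(column) / n_expressing
        widely_expressed = n_expressing > n_cells_total * cell_percentage_cutoff2
        rare_but_strong = (n_expressing > cell_count_cutoff
                           and nonz_mean > nonz_mean_cutoff)
        if widely_expressed or rare_but_strong:
            kept.append(gene)
    return kept


# Toy reference: 1000 cells, 4 genes with different expression patterns.
genes = ["gene_wide", "gene_rare_strong", "gene_rare_weak", "gene_too_rare"]
counts = [
    [1 if i < 100 else 0,   # 100 cells, count 1
     3 if i < 10 else 0,    # 10 cells, count 3
     1 if i < 10 else 0,    # 10 cells, count 1
     50 if i < 3 else 0]    # 3 cells, count 50
    for i in range(1000)
]

default_kept = select_genes(counts, genes)
strict_kept = select_genes(counts, genes, nonz_mean_cutoff=3.5)
print(default_kept)
print(strict_kept)
```

With the default cutoffs the toy data keeps gene_wide and gene_rare_strong; raising nonz_mean_cutoff to 3.5 drops gene_rare_strong as well, which matches the statement that larger values select fewer genes.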

Training Loss Curve:
<reference_name>_train_history.png/pdf
The ELBO loss curve during training; the first 20 epochs are removed from the plot.

Single-Cell Reference CSV Constructed by Cell2location:
<reference_name>_inf_aver.csv
Each row is a gene, each column is a cell type, and the value is the cell type feature calculated by cell2location (the estimated expression of each gene in each cell type using a negative binomial regression model).
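| Gene | <cell type 1> | <cell type 2> | <cell type 3> | ... |
| --- | --- | --- | --- | --- |
| 7SK | 0.3071783 | 0.22791654 | 0.059129756 | ... |
| A1BG | 0.18173707 | 0.096046284 | 0.0936929 | ... |
| A1BG-AS1 | 0.04608244 | 0.042425267 | 0.08740552 | ... |
| A1CF | 0.00167472 | 0.000960604 | 0.002093679 | ... |
| ... | ... | ... | ... | ... |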
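A file with this layout can be inspected using only the Python standard library. The sketch below assumes that the first column holds gene names and that the header row lists the cell types; the column names type_A, type_B and type_C are placeholders, not names from a real reference. The helper functions load_inf_aver and top_cell_type are hypothetical, and the sample rows reuse the values shown above.

```python
import csv
import io


def load_inf_aver(handle):
    """Parse an inf_aver-style CSV into (cell_types, {gene: [values]})."""
    reader = csv.reader(handle)
    header = next(reader)
    cell_types = header[1:]  # first column is assumed to hold gene names
    signatures = {
        row[0]: [float(value) for value in row[1:]]
        for row in reader
        if row  # skip blank lines
    }
    return cell_types, signatures


def top_cell_type(cell_types, signatures, gene):
    """Return the cell type with the highest estimated expression of gene."""
    values = signatures[gene]
    best = max(range(len(values)), key=values.__getitem__)
    return cell_types[best]


# In-memory sample; the cell type names are placeholders.
sample = io.StringIO(
    ",type_A,type_B,type_C\n"
    "7SK,0.3071783,0.22791654,0.059129756\n"
    "A1BG,0.18173707,0.096046284,0.0936929\n"
    "A1BG-AS1,0.04608244,0.042425267,0.08740552\n"
    "A1CF,0.00167472,0.000960604,0.002093679\n"
)
cell_types, signatures = load_inf_aver(sample)
print(top_cell_type(cell_types, signatures, "A1BG-AS1"))
```

For a real output file, pass a handle opened with open(path, newline="") in place of the in-memory sample.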