Single-cell/pseudobulk Differential Analysis
Purpose and Usage
1. Single-cell Level Differential Analysis (No Biological Replicates)
Supports t-test, wilcoxon, and MAST.
Scenario 1: Differential analysis between specified category 1 and category 2
SDAS DEG -i st.h5ad -o outdir --group_key leiden --de_method wilcoxon \
    --ident1 1 --ident2 2 \
    --fdr 0.05 --log2fc 1
Scenario 2: Differential analysis for each category versus all others
SDAS DEG -i st.h5ad -o outdir --group_key leiden --de_method wilcoxon \
    --fdr 0.05 --log2fc 1
Scenario 3: Subset cells by an obs column before differential analysis
SDAS DEG -i st.h5ad -o outdir --group_key leiden --de_method wilcoxon \
    --ident1 1 --ident2 2 \
    --fdr 0.05 --log2fc 1 \
    --subset_key cell_type --subset_values B
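The three scenarios differ only in which comparisons are run. The Python sketch below is a hypothetical illustration, not SDAS code: plan_comparisons and subset_cells are names invented here, and the label "rest" simply stands for the pooled remaining categories. It assumes that omitting --ident1 tests every category in turn, that omitting --ident2 pools all other categories as the reference, and that --subset_key/--subset_values keep only the matching cells before any comparison is planned.

```python
from typing import List, Optional, Sequence, Tuple


def subset_cells(group_labels: Sequence[str],
                 subset_labels: Sequence[str],
                 keep: Sequence[str]) -> List[str]:
    """Keep the group labels of cells whose subset label is in `keep` (Scenario 3)."""
    wanted = set(keep)
    return [g for g, s in zip(group_labels, subset_labels) if s in wanted]


def plan_comparisons(group_labels: Sequence[str],
                     ident1: Optional[str] = None,
                     ident2: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return (test group, reference group) pairs; "rest" means all other categories."""
    categories = sorted(set(group_labels))
    for ident in (ident1, ident2):
        if ident is not None and ident not in categories:
            raise ValueError(f"{ident!r} is not a category of the --group_key column")
    # Scenario 1 tests only ident1; Scenario 2 tests every category in turn.
    targets = [ident1] if ident1 is not None else categories
    pairs: List[Tuple[str, str]] = []
    for target in targets:
        if target == ident2:
            continue  # never compare a group with itself
        pairs.append((target, ident2 if ident2 is not None else "rest"))
    return pairs


leiden = ["1", "2", "1", "3", "2"]
cell_type = ["B", "B", "T", "B", "T"]
print(plan_comparisons(leiden, "1", "2"))  # Scenario 1
print(plan_comparisons(leiden))            # Scenario 2
print(plan_comparisons(subset_cells(leiden, cell_type, ["B"]), "1", "2"))  # Scenario 3
```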
2. Pseudobulk Differential Analysis (With Biological Replicates)
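Recommended: DESeq2 and edgeR (pseudobulk analysis). You must specify --sample_key, and each compared group must contain enough samples for the chosen method (DESeq2: at least 3; edgeR: at least 2). --group_key names the obs column that holds the conditions being compared (for example, Tumor and Normal), and --sample_key names the obs column that identifies the biological sample each cell belongs to.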
Scenario 1: Direct differential analysis between two groups of samples
SDAS DEG -i st.h5ad -o outdir --group_key sampleID --de_method DESeq2 \
    --ident1 Tumor --ident2 Normal \
    --fdr 0.05 --log2fc 1 \
    --sample_key sampleID
Scenario 2: Subset before pseudobulk differential analysis
SDAS DEG -i st.h5ad -o outdir --group_key sampleID --de_method DESeq2 \
    --ident1 Tumor --ident2 Normal \
    --fdr 0.05 --log2fc 1 \
    --sample_key sampleID \
    --subset_key cell_type --subset_values B
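Pseudobulk analysis works on per-sample totals rather than on individual cells. As a rough illustration of the general idea (not the SDAS implementation), the sketch below sums the raw counts of all cells that share a sample ID and checks the per-group sample minimums stated above. The function names are hypothetical, the counts are plain nested lists standing in for a cells x genes matrix, and `conditions` plays the role of the --group_key column while `sample_ids` plays the role of the --sample_key column.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Minimum number of samples per compared group, as stated in this section.
MIN_SAMPLES = {"DESeq2": 3, "edgeR": 2}


def pseudobulk(counts: Sequence[Sequence[int]],
               sample_ids: Sequence[str]) -> Dict[str, List[int]]:
    """Sum raw counts over all cells (rows) that belong to the same sample."""
    n_genes = len(counts[0]) if counts else 0
    totals: Dict[str, List[int]] = defaultdict(lambda: [0] * n_genes)
    for row, sample in zip(counts, sample_ids):
        acc = totals[sample]
        for j, value in enumerate(row):
            acc[j] += value
    return dict(totals)


def check_replicates(sample_ids: Sequence[str], conditions: Sequence[str],
                     ident1: str, ident2: str, method: str) -> None:
    """Raise ValueError if either compared condition has too few distinct samples."""
    needed = MIN_SAMPLES[method]
    for cond in (ident1, ident2):
        n = len({s for s, c in zip(sample_ids, conditions) if c == cond})
        if n < needed:
            raise ValueError(f"{method} needs at least {needed} samples in {cond!r}, found {n}")


cells = [[1, 0], [2, 3], [0, 1], [4, 4]]
samples = ["T1", "T1", "N1", "N2"]
print(pseudobulk(cells, samples))
```

Condition labels and sample IDs are kept as separate sequences here because each condition needs several samples for the replicate check to pass.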
Input Parameter Description
-i / --input (required): Input h5ad file containing the gene expression matrix.
-o / --output (required): Output directory.
--de_method (required): Differential expression method: t-test, wilcoxon, or MAST for single-cell analysis; DESeq2 or edgeR for pseudobulk analysis.
--group_key (required): obs column used to define groups; it must contain the values given to --ident1 and --ident2.
--ident1 (optional): Group to find DEGs for. If not set, each category in --group_key is tested in turn.
--ident2 (optional): Reference group for the comparison. If not set, all remaining categories in --group_key are pooled as the reference.
--sample_key (optional): obs column identifying the sample of each cell; required when --de_method is DESeq2 or edgeR.
--subset_key (optional): obs column used to subset the data before analysis. If --subset_values is not set, DEG analysis is run separately on the subset for each value in this column.
--subset_values (optional): Comma-separated values of --subset_key to keep, e.g. cell1,cell2.
--layer (optional): Layer holding the raw gene expression. If not set, adata.raw.X or adata.X is used.
--gene_symbol_key (optional, default: real_gene_name): Column in var that holds gene names; real_gene_name is the default for SAW-generated h5ad files.
--fdr (optional, default: 0.05): Adjusted p-value (FDR) cutoff for calling significant DEGs.
--log2fc (optional, default: 1): Absolute log2 fold-change cutoff for calling significant DEGs.
--genelist (optional, default: 5): Genes to label in the volcano plot, separated by commas. By default the top 5 significant up- and down-regulated genes are labeled; set 0 to label no genes.
--add_label (optional): CSV file used to add label columns to obs.
--min_gene (optional, default: 0): Minimum number of genes detected per spot; spots below this are filtered out.
--max_gene (optional): Maximum number of genes detected per spot; spots above this are filtered out. No filtering by default.
--min_cell (optional, default: 0): Minimum number of spots/cells in which a gene must be detected; genes below this are filtered out.
--volcano_xlim (optional): x-axis limits of the volcano plot, e.g. -5 5.
Output Results
de_{method}.{group_key}.{ident1}-vs-{ident2}.raw.csv: Raw output from the differential analysis software.
de_{method}.{group_key}.{ident1}-vs-{ident2}.all.csv: Extracted results with geneName, log2FC, Pvalue, FDR, etc.
de_{method}.{group_key}.{ident1}-vs-{ident2}.sig_filtered.csv: Significant DEGs filtered by the log2FC and FDR thresholds.
de_{method}.{group_key}.{ident1}-vs-{ident2}.png/pdf: Volcano plot in PNG or PDF format.
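All four outputs share one file-name stem built from the command-line values. A small helper such as the hypothetical output_paths below can locate them after a run. It only reproduces the naming pattern listed above and assumes the files are written directly into the -o directory; how the stem is written when --ident1 or --ident2 is not set is not documented here, so the helper expects both values.

```python
from pathlib import Path
from typing import Dict


def output_paths(outdir: str, method: str, group_key: str,
                 ident1: str, ident2: str) -> Dict[str, Path]:
    """Build output paths from the de_{method}.{group_key}.{ident1}-vs-{ident2} pattern."""
    stem = f"de_{method}.{group_key}.{ident1}-vs-{ident2}"
    base = Path(outdir)
    return {
        "raw": base / f"{stem}.raw.csv",
        "all": base / f"{stem}.all.csv",
        "sig": base / f"{stem}.sig_filtered.csv",
        "png": base / f"{stem}.png",
        "pdf": base / f"{stem}.pdf",
    }


paths = output_paths("outdir", "wilcoxon", "leiden", "1", "2")
print(paths["sig"])
```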
Raw file format example:
de_{method}.{group_key}.{ident1}-vs-{ident2}.raw.csv
This file is the original output from the differential analysis software, which may contain information such as gene name, fold change, Pvalue, adjusted Pvalue (FDR), and other details.
MTATP6P1,16.74336,1.3794351,1.3877341418899603e-42,2.2785333033340524e-39
AGR2,13.671169,1.7758344,1.419568544127444e-32,1.1147316293689464e-29
CLDN4,13.663365,1.9820584,1.9626883546881656e-34,1.6880054463820458e-31
...,...,...,...,...
all/sig_filtered file format example:
de_{method}.{group_key}.{ident1}-vs-{ident2}.all.csv
This file extracts the gene name, fold change, P-value, and adjusted P-value (FDR) from the raw results and renames the columns uniformly. de_{method}.{group_key}.{ident1}-vs-{ident2}.sig_filtered.csv has the same format and lists only the significant DEGs that pass the log2FC and FDR thresholds.
MTATP6P1,1.3794351,1.3877341418899603e-42,2.2785333033340524e-39
AGR2,1.7758344,1.419568544127444e-32,1.1147316293689464e-29
CLDN4,1.9820584,1.9626883546881656e-34,1.6880054463820458e-31
...,...,...,...
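The relationship between the two files can be sketched in Python: sig_filtered keeps the rows of all that pass both cutoffs. This is an illustration under stated assumptions, not the tool's code: rows are (gene, log2FC, Pvalue, FDR) tuples in the column order of the example above, the cutoffs are applied as FDR < --fdr and |log2FC| >= --log2fc (strict versus non-strict comparison is an assumption), and the default volcano labels are taken as the top 5 significant up- and down-regulated genes ranked by FDR, a ranking criterion this page does not specify. GENE_X, GENE_Y, and GENE_Z are made-up rows added for contrast.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, float, float, float]  # (gene, log2FC, Pvalue, FDR)


def significant(rows: Sequence[Row], fdr: float = 0.05, log2fc: float = 1.0) -> List[Row]:
    """Keep rows passing both the FDR cutoff and the absolute log2FC cutoff."""
    return [r for r in rows if r[3] < fdr and abs(r[1]) >= log2fc]


def default_labels(rows: Sequence[Row], n: int = 5, fdr: float = 0.05,
                   log2fc: float = 1.0) -> Tuple[List[str], List[str]]:
    """Top-n up- and down-regulated significant genes, ranked by FDR (assumed criterion)."""
    sig = significant(rows, fdr, log2fc)
    up = sorted((r for r in sig if r[1] > 0), key=lambda r: r[3])[:n]
    down = sorted((r for r in sig if r[1] < 0), key=lambda r: r[3])[:n]
    return [r[0] for r in up], [r[0] for r in down]


rows: List[Row] = [
    ("MTATP6P1", 1.3794351, 1.3877341418899603e-42, 2.2785333033340524e-39),
    ("AGR2", 1.7758344, 1.419568544127444e-32, 1.1147316293689464e-29),
    ("CLDN4", 1.9820584, 1.9626883546881656e-34, 1.6880054463820458e-31),
    ("GENE_X", -2.0, 0.001, 0.01),  # hypothetical down-regulated gene
    ("GENE_Y", 0.5, 1e-10, 1e-8),   # hypothetical: fails the log2FC cutoff
    ("GENE_Z", 3.0, 0.2, 0.3),      # hypothetical: fails the FDR cutoff
]
print([r[0] for r in significant(rows)])
print(default_labels(rows))
```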
Volcano plot result example:
de_{method}.{group_key}.{ident1}-vs-{ident2}.png/pdf
In the plot, red dots represent significant DEGs that meet both log2FC and FDR thresholds, blue dots meet the FDR but not log2FC threshold, and green dots meet the log2FC but not FDR threshold. By default, the top 5 up- and down-regulated genes are labeled. You can specify genes to label in the plot using the genelist parameter (e.g., --genelist geneA,geneB,geneC).
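The color rule described above amounts to two independent threshold tests per gene. The hypothetical volcano_color function below restates it in Python, assuming the cutoffs are compared as FDR < --fdr and |log2FC| >= --log2fc. The text does not say how genes that pass neither threshold are drawn, so "grey" is only a placeholder; GENE_Y is a made-up example.

```python
def volcano_color(log2fc: float, fdr: float,
                  fdr_cut: float = 0.05, lfc_cut: float = 1.0) -> str:
    """Color class of one gene in the volcano plot, following the rule in the text."""
    passes_fdr = fdr < fdr_cut            # assumed strict comparison
    passes_lfc = abs(log2fc) >= lfc_cut   # assumed non-strict comparison
    if passes_fdr and passes_lfc:
        return "red"    # significant DEG: both thresholds met
    if passes_fdr:
        return "blue"   # FDR met, log2FC not met
    if passes_lfc:
        return "green"  # log2FC met, FDR not met
    return "grey"       # placeholder: color not stated in the text


for gene, lfc, q in [("AGR2", 1.7758344, 1.1147316293689464e-29), ("GENE_Y", 0.5, 1e-8)]:
    print(gene, volcano_color(lfc, q))
```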

Result Interpretation
Gene name uniqueness
Before differential analysis, gene names are automatically made unique using make_unique. All outputs and plots use the unique gene names.
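This page does not spell out the renaming rule. If make_unique follows the convention of anndata's var_names_make_unique (an assumption), the first occurrence of a name is kept and later duplicates receive -1, -2, ... suffixes. The stdlib-only sketch below reproduces that convention in the simple case so you can predict how a duplicated gene will appear in the output tables; it is not the tool's own code.

```python
from typing import Dict, List, Sequence


def make_unique(names: Sequence[str], join: str = "-") -> List[str]:
    """Keep first occurrences; suffix later duplicates with -1, -2, ..."""
    counts: Dict[str, int] = {}
    taken = set(names)  # avoid generating a name that already exists in the input
    result: List[str] = []
    for name in names:
        if name not in counts:
            counts[name] = 0
            result.append(name)
            continue
        # Duplicate: bump the suffix until the candidate is unused.
        while True:
            counts[name] += 1
            candidate = f"{name}{join}{counts[name]}"
            if candidate not in taken:
                break
        taken.add(candidate)
        result.append(candidate)
    return result


print(make_unique(["ACTB", "MALAT1", "ACTB", "ACTB"]))
```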
Cell and gene filtering
Cells and genes can be filtered with --min_gene, --max_gene, and --min_cell. If the h5ad file has already been filtered, these parameters can be omitted.
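The filtering semantics can be sketched as follows, assuming --min_gene and --max_gene act on the number of detected genes (non-zero counts) per spot/cell, --min_cell acts on the number of spots/cells in which a gene is detected, and spots are filtered before genes. This follows the spirit of scanpy's filter_cells/filter_genes but is a hypothetical illustration, not SDAS code; the matrix is a small nested list with spots as rows and genes as columns.

```python
from typing import List, Optional, Sequence, Tuple


def filter_matrix(counts: Sequence[Sequence[float]], min_gene: int = 0,
                  max_gene: Optional[int] = None,
                  min_cell: int = 0) -> Tuple[List[int], List[int]]:
    """Return indices of kept spots (rows) and kept genes (columns)."""
    kept_rows: List[int] = []
    for i, row in enumerate(counts):
        n_detected = sum(1 for v in row if v > 0)  # genes detected in this spot
        if n_detected < min_gene:
            continue
        if max_gene is not None and n_detected > max_gene:
            continue
        kept_rows.append(i)
    n_genes = len(counts[0]) if counts else 0
    kept_cols: List[int] = []
    for j in range(n_genes):
        n_spots = sum(1 for i in kept_rows if counts[i][j] > 0)  # kept spots detecting gene j
        if n_spots >= min_cell:
            kept_cols.append(j)
    return kept_rows, kept_cols


counts = [[1, 0, 0], [2, 3, 1], [0, 0, 0], [5, 0, 2]]
print(filter_matrix(counts, min_gene=1))
print(filter_matrix(counts, min_gene=1, max_gene=2, min_cell=2))
```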
Parameter Tuning Suggestions
When the number of bins/cells exceeds 200k, MAST cannot run successfully. In this case, set stricter filtering thresholds before analysis: a higher --min_gene removes sparsely covered bins/cells, and a higher --min_cell removes rarely detected genes, which together reduce the size of the matrix.