Command Description
Purpose and Usage
mergeAdata: Merge Multiple h5ad Files
Merge multiple h5ad files into a single h5ad file
SDAS dataProcess mergeAdata -i mult.csv -o outdir
h5ad2rds: Convert h5ad to rds
Convert h5ad format data to rds format
SDAS dataProcess h5ad2rds -i st.h5ad --run_mode stRNA -o outdir
h5mu2h5ad: Convert h5mu to h5ad
Convert h5mu format data to h5ad format
SDAS dataProcess h5mu2h5ad -i st.h5mu -o outdir
printAdataInfo: Print adata Information
Print detailed information about the h5ad file to the shell, or write it to a specified directory
SDAS dataProcess printAdataInfo -i st.h5ad -o outdir
SDAS dataProcess printAdataInfo -i st.h5ad
subsetAdata: Extract h5ad Subset
Extract a subset of an h5ad file based on specified conditions; supports filtering by numeric range or by a list of values
Numeric filtering:
SDAS dataProcess subsetAdata -i st.h5ad --label_key total_counts -o outdir \
--min 100 --max 5000
List filtering:
SDAS dataProcess subsetAdata -i st.h5ad --label_key anno_spotlight -o outdir \
--list_include B,Fibroblast
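The two filtering modes can be illustrated with a small Python sketch. This is not SDAS's implementation, which the documentation does not show; it only mirrors the described behavior on plain lists standing in for an obs column. The helper names `numeric_mask` and `list_mask` are hypothetical, and treating `--min`/`--max` as inclusive bounds is an assumption.

```python
from typing import Iterable, List, Optional, Sequence


def numeric_mask(values: Sequence[float],
                 minimum: Optional[float] = None,
                 maximum: Optional[float] = None) -> List[bool]:
    """Mark entries inside [minimum, maximum]; inclusive bounds are an assumption."""
    mask = []
    for value in values:
        keep = True
        if minimum is not None and value < minimum:
            keep = False
        if maximum is not None and value > maximum:
            keep = False
        mask.append(keep)
    return mask


def list_mask(labels: Sequence[str],
              include: Optional[Iterable[str]] = None,
              exclude: Optional[Iterable[str]] = None) -> List[bool]:
    """Mark labels that are in `include` (if given) and not in `exclude` (if given)."""
    include_set = set(include) if include is not None else None
    exclude_set = set(exclude) if exclude is not None else set()
    return [
        (include_set is None or label in include_set) and label not in exclude_set
        for label in labels
    ]


# Toy stand-ins for the obs columns used in the commands above.
total_counts = [50, 100, 2500, 5000, 9000]
anno_spotlight = ["B", "NK", "Fibroblast", "B", "T"]

# Mirrors: --label_key total_counts --min 100 --max 5000
print(numeric_mask(total_counts, minimum=100, maximum=5000))
# [False, True, True, True, False]

# Mirrors: --label_key anno_spotlight --list_include B,Fibroblast
print(list_mask(anno_spotlight, include="B,Fibroblast".split(",")))
# [True, False, True, True, False]
```

In an AnnData workflow, a mask like this would typically be computed from the label_key column and used to index the object; whether SDAS does exactly that is not stated here.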
Input Parameter Description
| Parameter | Required | Description |
| --- | --- | --- |
| -i / --input | Yes | Input file. Supported formats: h5ad, h5mu, csv. For mergeAdata the input is a csv file with a header in the first row. |
| --label_key | Yes | Used in subsetAdata. Column name in obs or var used to extract the adata subset. |
| -o / --output | No | Output folder. If -o is not set for printAdataInfo, the adata information is printed to the shell. |
| --run_mode | No | Used in h5ad2rds. Input data type, stRNA or scRNA; default is stRNA. |
| --gene_symbol_key | No | Used in mergeAdata. Name of the gene column in h5ad.var (_index means h5ad.var.index). |
| --layer | No | Used in h5ad2rds and subsetAdata. Specifies the layer that stores raw counts in the h5ad file. |
| --list_include | No | Used in subsetAdata. Values to extract when label_key is a list-type column, e.g., Fibroblast,B,NK. |
| --list_exclude | No | Used in subsetAdata. Values not to extract when label_key is a list-type column, e.g., Fibroblast,B,NK. |
| --min | No | Used in subsetAdata. Minimum value for filtering when label_key is numeric. |
| --max | No | Used in subsetAdata. Maximum value for filtering when label_key is numeric. |
Output Results Display
| Output file | Description |
| --- | --- |
| <input_name>.h5ad | h5ad converted from h5mu |
| <input_name>_subset.h5ad | Subset h5ad obtained by subsetAdata |
| combine.h5ad | h5ad after merging multiple files |
| <input_name>.rds | rds file converted from h5ad |
| <input_name>_adata_info.txt | Detailed information of adata |
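The naming patterns in the output listing above can be turned into a small lookup, which may be handy when scripting around the tool. This is a minimal sketch under two assumptions that the documentation does not confirm: that `<input_name>` is the input file name without its extension, and that the file is written directly inside the output folder. The function `expected_output` is hypothetical and not part of SDAS.

```python
from pathlib import PurePosixPath

# Naming patterns copied from the output listing above, keyed by subcommand.
OUTPUT_PATTERNS = {
    "h5mu2h5ad": "{name}.h5ad",
    "subsetAdata": "{name}_subset.h5ad",
    "mergeAdata": "combine.h5ad",
    "h5ad2rds": "{name}.rds",
    "printAdataInfo": "{name}_adata_info.txt",
}


def expected_output(subcommand: str, input_path: str, outdir: str) -> str:
    """Guess the output path for a subcommand.

    Assumption: <input_name> is the input file's stem (name without extension).
    """
    name = PurePosixPath(input_path).stem
    filename = OUTPUT_PATTERNS[subcommand].format(name=name)
    return str(PurePosixPath(outdir) / filename)


print(expected_output("h5ad2rds", "st.h5ad", "outdir"))        # outdir/st.rds
print(expected_output("subsetAdata", "st.h5ad", "outdir"))     # outdir/st_subset.h5ad
print(expected_output("mergeAdata", "mult.csv", "outdir"))     # outdir/combine.h5ad
```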
Detailed adata information
<input_name>_adata_info.txt: This file gives a quick overview of the structure of the AnnData object, its main contents, and the distribution of labels. It reports the following:
- Basic dimensions of the AnnData object (number of observations n_obs × number of features n_vars).
- Field names contained in obs (observations/samples) and var (features/genes).
- Types of analysis results or metadata stored in uns, obsm, layers, obsp, etc.
- Column counts for obs and var, and the first five values of obs_names and var_names.
- The number of unique values, and the values themselves, for each categorical field in obs (such as leiden cluster labels or sample information).
AnnData object with n_obs × n_vars = 120 × 32577
obs: 'total_counts', 'n_genes_by_counts', 'pct_counts_mt', 'leiden', 'orig.ident', 'x', 'y'
var: 'real_gene_name', 'n_cells', 'n_counts', 'mean_counts', 'mean', 'dispersions', 'dispersions_norm', 'highly_variable'
uns: 'bin_size', 'bin_type', 'gene_leiden', 'mt', 'leiden_resolution', 'neighbors', 'omics', 'pca_variance_ratio', 'rank_genes_groups', 'resolution'
obsm: 'spatial'
layers: 'raw_counts'
obsp: 'connectivities', 'distances'
The 'obs' attribute of the AnnData contains 7 columns.
The 'var' attribute of the AnnData contains 6 columns.
Top 5 cell names: Index(['56032143344836', '56027848377591', '56006373541090', '55941949031633',
'55937654064316'],
dtype='object')
Top 5 gene names: Index(['ENSG00000000003', 'ENSG00000000005', 'ENSG00000000419',
'ENSG00000000457', 'ENSG00000000460'],
dtype='object')
Top 5 real_gene_name: ['TSPAN6', 'TNMD', 'DPM1', 'SCYL3', 'C1orf112']
Number of unique values in each column of 'obs' (except 'total_counts', 'n_genes_by_counts', 'pct_counts_mt', 'x', 'y')
leiden: 8 unique values
orig.ident: 1 unique values
Unique values in each column of 'obs':
**************************************************
leiden: Index(['10', '11', '12', '13', '14', '15', '16', '17'], dtype='object')
**************************************************
orig.ident: Index(['sample1'], dtype='object')
**************************************************
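If the info file needs to be checked programmatically, its first few lines are regular enough to parse with the standard library. The sketch below assumes the file keeps the layout of the sample above (an "n_obs × n_vars" header followed by "obs:", "var:", and similar lines with quoted names); the function `parse_adata_info` is hypothetical and not part of SDAS.

```python
import re
from typing import Any, Dict

# Matches "n_obs × n_vars = 120 × 32577" (also accepts a plain "x").
HEADER_RE = re.compile(r"n_obs\s*[\u00d7x]\s*n_vars\s*=\s*(\d+)\s*[\u00d7x]\s*(\d+)")
# Matches summary lines such as "obs: 'total_counts', 'leiden'".
FIELD_RE = re.compile(r"^\s*(obs|var|uns|obsm|layers|obsp):\s*(.+)$")


def parse_adata_info(text: str) -> Dict[str, Any]:
    """Extract dimensions and field names from the text of an adata info file."""
    summary: Dict[str, Any] = {"n_obs": None, "n_vars": None, "fields": {}}
    for line in text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            summary["n_obs"] = int(header.group(1))
            summary["n_vars"] = int(header.group(2))
            continue
        field = FIELD_RE.match(line)
        if field:
            names = re.findall(r"'([^']*)'", field.group(2))
            # Keep only the first occurrence of each attribute line.
            summary["fields"].setdefault(field.group(1), names)
    return summary


# A shortened copy of the sample output above.
sample = (
    "AnnData object with n_obs \u00d7 n_vars = 120 \u00d7 32577\n"
    "obs: 'total_counts', 'leiden', 'orig.ident'\n"
    "obsm: 'spatial'\n"
    "layers: 'raw_counts'\n"
    "leiden: 8 unique values\n"
)
info = parse_adata_info(sample)
print(info["n_obs"], info["n_vars"])   # 120 32577
print(info["fields"]["obs"])           # ['total_counts', 'leiden', 'orig.ident']
print(info["fields"]["layers"])        # ['raw_counts']
```

Lines such as "leiden: 8 unique values" are ignored on purpose, because only the six attribute names listed in the pattern are treated as field lines.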