Command Description

Purpose and Usage

mergeAdata: Merge Multiple h5ad Files

Merge multiple h5ad files into a single h5ad file

SDAS dataProcess mergeAdata -i mult.csv -o outdir

h5ad2rds: Convert h5ad to rds

Convert h5ad format data to rds format

SDAS dataProcess h5ad2rds -i st.h5ad --run_mode stRNA -o outdir

h5mu2h5ad: Convert h5mu to h5ad

Convert h5mu format data to h5ad format

SDAS dataProcess h5mu2h5ad -i st.h5mu -o outdir

printAdataInfo: Print adata Information

Print detailed information about the h5ad file to the shell, or write it to a specified directory when -o is set

SDAS dataProcess printAdataInfo -i st.h5ad -o outdir
SDAS dataProcess printAdataInfo -i st.h5ad

subsetAdata: Extract h5ad Subset

Extract a subset of an h5ad file by filtering on the column given by --label_key, either by numeric range or by a list of values

  • Numeric filtering:

SDAS dataProcess subsetAdata -i st.h5ad --label_key total_counts -o outdir \
--min 100 --max 5000
  • List filtering:

SDAS dataProcess subsetAdata -i st.h5ad --label_key anno_spotlight -o outdir \
--list_include B,Fibroblast
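
The two filtering modes above can be pictured with a small, self-contained Python sketch. This is not SDAS code: `subset_rows` and the toy `obs` rows are hypothetical stand-ins for the label column of an h5ad file, and treating `--min` and `--max` as inclusive bounds is an assumption that the documentation does not confirm.

```python
def subset_rows(rows, label_key, min_value=None, max_value=None,
                list_include=None, list_exclude=None):
    """Return the rows whose `label_key` value passes every given filter."""
    kept = []
    for row in rows:
        value = row[label_key]
        # Numeric range filter; bounds are treated as inclusive (assumption).
        if min_value is not None and value < min_value:
            continue
        if max_value is not None and value > max_value:
            continue
        # List filters: keep only listed values, or drop listed values.
        if list_include is not None and value not in list_include:
            continue
        if list_exclude is not None and value in list_exclude:
            continue
        kept.append(row)
    return kept


# Toy stand-in for the obs table of an h5ad file: one dict per spot.
obs = [
    {"spot": "s1", "total_counts": 50, "anno_spotlight": "B"},
    {"spot": "s2", "total_counts": 100, "anno_spotlight": "Fibroblast"},
    {"spot": "s3", "total_counts": 3200, "anno_spotlight": "NK"},
    {"spot": "s4", "total_counts": 9000, "anno_spotlight": "B"},
]

# Mirrors: --label_key total_counts --min 100 --max 5000
by_counts = subset_rows(obs, "total_counts", min_value=100, max_value=5000)

# Mirrors: --label_key anno_spotlight --list_include B,Fibroblast
by_label = subset_rows(obs, "anno_spotlight", list_include={"B", "Fibroblast"})

print([row["spot"] for row in by_counts])
print([row["spot"] for row in by_label])
```

The point of the sketch is only that a numeric label is compared against a range, while a categorical label is compared against a set of values.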

Input Parameter Description

| Parameter | Required | Description |
| --- | --- | --- |
| -i / --input | Yes | Input file, supports h5ad, h5mu, csv (for mergeAdata, input is csv, header in first row) |
| --label_key | Yes | Used in subsetAdata, column name in obs or var to extract adata subset |
| -o / --output | No | Output folder, if -o is not set for printAdataInfo, output adata information to shell |
| --run_mode | No | Used in h5ad2rds, input data type, stRNA or scRNA, default is stRNA |
| --gene_symbol_key | No | Used in mergeAdata, column name of gene in h5ad.var (_index means h5ad.var.index) |
| --layer | No | Used in h5ad2rds and subsetAdata, specifies the layer storing raw counts in h5ad |
| --list_include | No | Used in subsetAdata, elements to extract when label_key is a list, e.g., Fibroblast,B,NK |
| --list_exclude | No | Used in subsetAdata, elements not to extract when label_key is a list, e.g., Fibroblast,B,NK |
| --min | No | Used in subsetAdata, minimum value for filtering when label_key is numeric |
| --max | No | Used in subsetAdata, maximum value for filtering when label_key is numeric |
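
When subsetAdata is driven from a script, the flags described above can be assembled programmatically. The following is a minimal sketch: `build_subset_command` is a hypothetical helper, not part of SDAS, and it only builds the argument list from flags documented here without running anything.

```python
import shlex


def build_subset_command(input_path, label_key, outdir=None,
                         min_value=None, max_value=None,
                         list_include=None, list_exclude=None):
    """Build an argument list for `SDAS dataProcess subsetAdata`."""
    args = ["SDAS", "dataProcess", "subsetAdata",
            "-i", input_path, "--label_key", label_key]
    if outdir is not None:
        args += ["-o", outdir]
    if min_value is not None:
        args += ["--min", str(min_value)]
    if max_value is not None:
        args += ["--max", str(max_value)]
    # List values are passed comma-separated, as in the documented examples.
    if list_include:
        args += ["--list_include", ",".join(list_include)]
    if list_exclude:
        args += ["--list_exclude", ",".join(list_exclude)]
    return args


cmd = build_subset_command("st.h5ad", "anno_spotlight", outdir="outdir",
                           list_include=["B", "Fibroblast"])
print(shlex.join(cmd))  # shell-quoted form, for logging or copy-paste
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where SDAS is installed; passing a list rather than a single string avoids shell quoting problems with file paths.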

Output Results Display

| Result File | Description |
| --- | --- |
| <input_name>.h5ad | h5ad converted from h5mu |
| <input_name>_subset.h5ad | Subset h5ad obtained by subsetAdata |
| combine.h5ad | h5ad after merging multiple files |
| <input_name>.rds | rds file converted from h5ad |
| <input_name>_adata_info.txt | Detailed information of adata |

  • Detailed adata information (<input_name>_adata_info.txt): This file gives a quick overview of the structure of the AnnData object, its main contents, and the distribution of labels. It reports the following types of information:

    • Basic dimensions of the AnnData object (number of observations n_obs × number of features n_vars).

    • Field names contained in obs (observations/samples) and var (features/genes).

    • Types of analysis results or metadata stored in uns, obsm, layers, obsp, etc.

    • Column count statistics for obs and var, and the first five values of obs_names and var_names.

    • The number of unique values and specific values for each categorical field in obs (such as leiden cluster labels, sample information, etc.).

AnnData object with n_obs × n_vars = 120 × 32577
    obs: 'total_counts', 'n_genes_by_counts', 'pct_counts_mt', 'leiden', 'orig.ident', 'x', 'y'
    var: 'real_gene_name', 'n_cells', 'n_counts', 'mean_counts', 'mean', 'dispersions', 'dispersions_norm', 'highly_variable'
    uns: 'bin_size', 'bin_type', 'gene_leiden', 'mt', 'leiden_resolution', 'neighbors', 'omics', 'pca_variance_ratio', 'rank_genes_groups', 'resolution'
    obsm: 'spatial'
    layers: 'raw_counts'
    obsp: 'connectivities', 'distances'

The 'obs' attribute of the AnnData contains 7 columns.
The 'var' attribute of the AnnData contains 6 columns.

Top 5 cell names: Index(['56032143344836', '56027848377591', '56006373541090', '55941949031633',
       '55937654064316'],
      dtype='object')
Top 5 gene names: Index(['ENSG00000000003', 'ENSG00000000005', 'ENSG00000000419',
       'ENSG00000000457', 'ENSG00000000460'],
      dtype='object')
Top 5 real_gene_name: ['TSPAN6', 'TNMD', 'DPM1', 'SCYL3', 'C1orf112']

Number of unique values in each column of 'obs' (except 'total_counts', 'n_genes_by_counts', 'pct_counts_mt', 'x', 'y')
leiden: 8 unique values
orig.ident: 1 unique values

Unique values in each column of 'obs':
**************************************************
leiden: Index(['10', '11', '12', '13', '14', '15', '16', '17'], dtype='object')
**************************************************
orig.ident: Index(['sample1'], dtype='object')
**************************************************
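
If the per-column unique counts in this report need to be checked in a script, they can be pulled out with a regular expression. The sketch below assumes the report follows the line format of the sample above (`<column>: <N> unique values`); `parse_unique_counts` is a hypothetical helper, not something SDAS provides.

```python
import re

# Matches lines such as "leiden: 8 unique values".
UNIQUE_LINE = re.compile(r"^(?P<column>.+?): (?P<count>\d+) unique values$")


def parse_unique_counts(report_text):
    """Map each obs column named in the report to its unique-value count."""
    counts = {}
    for line in report_text.splitlines():
        match = UNIQUE_LINE.match(line.strip())
        if match:
            counts[match.group("column")] = int(match.group("count"))
    return counts


# Excerpt copied from the sample report above.
sample = """\
Number of unique values in each column of 'obs' (except 'total_counts', 'n_genes_by_counts', 'pct_counts_mt', 'x', 'y')
leiden: 8 unique values
orig.ident: 1 unique values
"""

counts = parse_unique_counts(sample)
print(counts)
```

For a real report, read the file first, for example with `open("<input_name>_adata_info.txt").read()`, and pass the text to the same function.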
