Introduction
The Public Bulk Validation Module is the SDAS component that validates the biological significance and clinical relevance of spatial transcriptomics results against public Bulk RNA-Seq datasets and their clinical data. It provides three core analyses: Bulk RNA-Seq immune infiltration analysis, gene set scoring analysis, and Kaplan-Meier survival analysis. Together they form a systematic framework for checking whether biological clues found in spatial transcriptomics data hold up in clinical samples, which supports the clinical translation of spatial transcriptomics research.
Module Overview
1. Immune Infiltration Analysis
This method applies multiple algorithms (such as CIBERSORTx and EPIC) to estimate immune cell infiltration in the tumor microenvironment. It outputs immune cell proportions and heatmaps, which help researchers examine the role of immune cells in tumor development and progression.
2. Gene Set Scoring Analysis
This method applies gene set scoring algorithms (such as GSVA and ssGSEA) to score the activity of custom gene sets in Bulk RNA-Seq data. It outputs standardized enrichment scores and heatmaps, which help assess the biological function of a gene set and its relevance to disease progression.
3. Kaplan-Meier Survival Analysis
This method takes immune infiltration and gene set scoring results as input and performs univariate survival analysis. It outputs survival curve plots that help researchers evaluate the association between a specific feature and clinical prognosis, providing supporting data for clinical decision-making.
Validation Strategy and Workflow
Step 0: Data Preparation
Data Sources: Obtain Bulk RNA-Seq datasets with clinical information from public databases
Recommended Databases: TCGA (UCSC Xena), GEO, etc.
Data Requirements: Must include complete clinical information and be preprocessed according to SDAS input format requirements
Data Format: Supports standardized gene expression matrices and clinical information tables
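The preparation step above amounts to keeping only samples that have both expression values and complete clinical information. The following is a minimal sketch of that filtering in plain Python. The in-memory layout, the function name `shared_complete_samples`, and the toy values are assumptions made for illustration; they are not the SDAS input format, which is defined by the SDAS documentation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory layout (NOT the SDAS input format):
#   expression: gene -> {sample_id: expression value}
#   clinical:   sample_id -> (survival time, event flag); None marks a missing field
Expression = Dict[str, Dict[str, float]]
Clinical = Dict[str, Tuple[Optional[float], Optional[int]]]


def shared_complete_samples(expression: Expression, clinical: Clinical) -> List[str]:
    """Return sorted sample IDs present in every gene row and with complete clinical data."""
    sample_sets = [set(values) for values in expression.values()]
    in_all_genes = set.intersection(*sample_sets) if sample_sets else set()
    keep = []
    for sample in in_all_genes:
        record = clinical.get(sample)
        if record is None:
            continue  # no clinical record for this sample
        time, event = record
        if time is None or event is None:
            continue  # incomplete clinical information
        keep.append(sample)
    return sorted(keep)


# Toy values, made up for this example.
expression = {
    "MS4A1": {"S1": 5.2, "S2": 1.1, "S3": 3.3, "S4": 0.4},
    "CD79A": {"S1": 4.8, "S2": 0.9, "S3": 2.7, "S4": 0.6},
}
clinical = {
    "S1": (30.0, 0),
    "S2": (12.0, 1),
    "S3": (None, 1),  # missing follow-up time: dropped
    "S5": (20.0, 0),  # no expression data: dropped
}
samples = shared_complete_samples(expression, clinical)
print(samples)
```

Filtering before any scoring keeps the expression matrix and the clinical table aligned on the same sample set, so later survival comparisons do not silently mix cohorts.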
Validation Method 1: Gene Set Validation
Different validation strategies are adopted based on the number of genes in the gene set:
Single Gene Validation (1 gene)
Application Scenario: Cell types defined by expression of a specific marker gene, discovered in spatial transcriptomics data
Typical Examples: CD20+ B cells, SAA+ hepatocytes, etc.
Analysis Method: Kaplan-Meier survival analysis (i.e., survivalKM)
Data Source: Gene expression values (can be from Bulk RNA-Seq datasets or IHC experiments)
Validation Goal: Verify the association between individual gene expression levels and clinical prognosis
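To make the single-gene strategy concrete, the sketch below splits a toy cohort at the median expression of one gene and computes a Kaplan-Meier estimate for each group. It is a hand-written illustration of the estimator in plain Python, not the implementation behind survivalKM. The median cutoff, the function name `kaplan_meier`, and all values are assumptions for this example.

```python
from statistics import median
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (event time, survival probability) steps of the Kaplan-Meier estimate.

    events uses 1 for an observed event and 0 for a censored observation.
    """
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Toy cohort (made-up values): one gene's expression, follow-up time, event flag.
expression = [8.1, 7.4, 6.9, 6.2, 2.3, 1.9, 1.5, 0.8]
times = [40, 36, 30, 28, 14, 10, 8, 5]
events = [0, 1, 0, 1, 1, 1, 0, 1]

# Assumed grouping rule: split samples at the median expression value.
cutoff = median(expression)
high = [i for i, v in enumerate(expression) if v > cutoff]
low = [i for i, v in enumerate(expression) if v <= cutoff]

km_high = kaplan_meier([times[i] for i in high], [events[i] for i in high])
km_low = kaplan_meier([times[i] for i in low], [events[i] for i in low])
print("high expression:", km_high)
print("low expression:", km_low)
```

Each curve is a list of steps: survival drops only at observed event times, while censored samples leave the risk set without lowering the curve.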
Gene Module Validation (typically 3-8 genes, or more)
Application Scenario: Gene signatures discovered in spatial transcriptomics data
Typical Examples: TLS (Tertiary Lymphoid Structure) subtypes, tumor subtypes, activated B cell subtypes, etc.
Analysis Method: Module scoring calculation combined with Kaplan-Meier survival analysis (i.e., geneSetScore + survivalKM)
Validation Goal: Verify the association between overall expression patterns of gene modules and clinical prognosis
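The idea of collapsing a gene module into one score per sample can be illustrated with a deliberately simple scheme: the mean of per-gene z-scores. This is not GSVA or ssGSEA, and it is not the geneSetScore implementation; it is a stand-in chosen because it fits in a few lines. The gene names, the function names, and the values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def zscores(values: Sequence[float]) -> List[float]:
    """Standardize one gene's expression across samples."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0] * len(values)  # a constant gene carries no signal
    return [(v - mu) / sd for v in values]


def module_scores(expression: Dict[str, List[float]], genes: Sequence[str]) -> List[float]:
    """Mean z-score per sample over the module genes found in the matrix."""
    present = [g for g in genes if g in expression]
    if not present:
        raise ValueError("none of the module genes are in the expression matrix")
    per_gene = [zscores(expression[g]) for g in present]
    n_samples = len(per_gene[0])
    return [mean(row[i] for row in per_gene) for i in range(n_samples)]


# Toy matrix (made-up values): gene -> expression in a fixed sample order.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [10.0, 20.0, 30.0, 40.0],
}
# GENE_X is absent from the matrix and is skipped.
scores = module_scores(expression, ["GENE_A", "GENE_B", "GENE_C", "GENE_X"])
print(scores)
```

The resulting per-sample scores can then be used like a single-gene expression value, for example by splitting the cohort into high and low groups before a Kaplan-Meier comparison.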
Validation Method 2: Cell Type Validation
Application Scenario: Specific cell types discovered in spatial transcriptomics data
Validation Requirements: Target cell types must exist in reference sets of xCell, CIBERSORTx, or other immune infiltration algorithms
Analysis Workflow: Immune infiltration analysis of Bulk RNA-Seq data, followed by Kaplan-Meier survival analysis (i.e., immuneScore + survivalKM)
Validation Goal: Verify the association between specific cell type abundance and clinical prognosis
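Kaplan-Meier comparisons between a high-abundance and a low-abundance group are commonly accompanied by a log-rank test; whether survivalKM reports one is not stated here. The sketch below shows a two-group log-rank test in plain Python, applied to hypothetical cell type fractions of the kind a deconvolution tool might estimate. The median split, the function name `logrank_test`, and every number are assumptions for illustration.

```python
from math import erfc, sqrt
from statistics import median
from typing import Sequence, Tuple


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e == 1)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("groups never share a risk set; test is undefined")
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = erfc(sqrt(chi2 / 2.0))  # chi-square survival function for 1 df
    return chi2, p_value


# Toy cohort (made-up values): estimated fraction of one cell type per sample.
abundance = [0.31, 0.28, 0.25, 0.22, 0.09, 0.07, 0.05, 0.02]
times = [40, 36, 30, 28, 14, 10, 8, 5]
events = [0, 1, 0, 1, 1, 1, 0, 1]

cutoff = median(abundance)  # assumed grouping rule
high = [i for i, v in enumerate(abundance) if v > cutoff]
low = [i for i, v in enumerate(abundance) if v <= cutoff]

chi2, p_value = logrank_test(
    [times[i] for i in high], [events[i] for i in high],
    [times[i] for i in low], [events[i] for i in low],
)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

With only eight samples the result is purely illustrative; in a real cohort the cutoff choice and group sizes affect the test noticeably.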
Reference
Zeng, D., Ye, Z., Shen, R., Yu, G., Wu, J., Xiong, Y., ... & Liao, W. (2021). IOBR: multi-omics immuno-oncology biological research to decode tumor microenvironment and signatures. Frontiers in immunology, 12, 687975.