Introduction

The Public Bulk Validation Module is a component of the SDAS software that uses public Bulk RNA-Seq datasets and clinical data to validate the biological significance and clinical relevance of spatial transcriptomics analysis results. It provides three core analytical methods: Bulk RNA-Seq immune infiltration analysis, gene set scoring analysis, and Kaplan-Meier survival analysis. Together, these methods let researchers check whether biological clues discovered in spatial transcriptomics data also hold in clinical samples, which supports the clinical translation of spatial transcriptomics research.

Module Overview

1. Immune Infiltration Analysis

This method uses multiple algorithms (such as CIBERSORTx and EPIC) to estimate immune cell infiltration in the tumor microenvironment. It outputs immune cell proportions and heatmaps, which help researchers study the role of immune cells in tumor development and progression.

2. Gene Set Scoring Analysis

This method uses algorithms such as GSVA and ssGSEA to score the activity of custom gene sets in Bulk RNA-Seq data. It outputs standardized enrichment scores and heatmaps, which help assess the biological function of a gene set and its significance in disease progression.
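
To make the idea of a per-sample gene set score concrete, here is a minimal sketch of a mean z-score signature. This is a deliberately simpler technique than GSVA or ssGSEA and is not the module's implementation; the function names, the toy matrix, and the gene set are assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def zscore_rows(expr):
    """Standardize each gene across samples.

    expr maps a gene name to its expression values, one value per sample.
    """
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with zero variance carries no information, so its z-scores are 0.
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z

def mean_zscore(expr, gene_set):
    """Score each sample as the average z-score of the gene set members."""
    z = zscore_rows(expr)
    present = [g for g in gene_set if g in z]  # ignore genes absent from the matrix
    if not present:
        raise ValueError("no gene from the set is present in the matrix")
    n_samples = len(next(iter(z.values())))
    return [mean(z[g][i] for g in present) for i in range(n_samples)]

# Toy matrix (hypothetical values): 3 genes x 3 samples.
expr = {
    "CD79A": [1.0, 2.0, 3.0],
    "MS4A1": [2.0, 4.0, 6.0],
    "ALB": [9.0, 9.0, 9.0],
}
scores = mean_zscore(expr, ["CD79A", "MS4A1", "MISSING"])
print(scores)
```

Rank-based methods such as ssGSEA are generally less sensitive to outliers than this average, so the sketch is only meant to show the shape of the input and output: one score per sample for each gene set.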

3. Kaplan-Meier Survival Analysis

This method performs univariate survival analysis on the immune infiltration and gene set scoring results. It outputs survival curve plots, which help researchers evaluate the association between a specific feature and clinical prognosis and provide supporting data for clinical decision-making.
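
The survival curve in such a plot is a Kaplan-Meier estimate. The sketch below computes the product-limit estimator in plain Python to show what is being drawn; it is an illustration under assumed inputs (follow-up times and event flags), not the code used by survivalKM.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up duration per sample
    events : 1 if the event (e.g. death) was observed, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Toy cohort (hypothetical values): five samples, two of them censored.
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Censored samples never lower the curve directly; they only shrink the number at risk for later time points, which is why complete follow-up time and event status are needed for every sample.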

Validation Strategy and Workflow

Step 0: Data Preparation

  • Data Sources: Obtain Bulk RNA-Seq datasets with clinical information from public databases

  • Recommended Databases: TCGA (UCSC Xena), GEO, etc.

  • Data Requirements: Must include complete clinical information and be preprocessed according to SDAS input format requirements

  • Data Format: Supports standardized gene expression matrices and clinical information tables
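
Before running any analysis, it can help to confirm that the expression matrix and the clinical table describe the same samples. The following sketch assumes a tab-separated gene-by-sample matrix and a clinical table with columns named `sample`, `time`, and `status`; these column names and the layout are hypothetical and should be adapted to the actual SDAS input format.

```python
import csv
import io

# Hypothetical inline data standing in for two input files.
EXPR_TSV = "gene\tS1\tS2\tS3\nCD79A\t1.2\t3.4\t0.5\nMS4A1\t2.0\t4.1\t0.9\n"
CLIN_TSV = "sample\ttime\tstatus\nS1\t120\t1\nS2\t\t0\nS4\t300\t0\n"

def load_expression(text):
    """Parse a gene-by-sample matrix; return (sample names, {gene: values})."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    samples = rows[0][1:]
    matrix = {row[0]: [float(x) for x in row[1:]] for row in rows[1:]}
    return samples, matrix

def load_clinical(text):
    """Parse the clinical table, keeping only rows with complete survival data."""
    clinical = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        if row["time"] == "" or row["status"] == "":
            continue  # incomplete clinical information: skip this sample
        clinical[row["sample"]] = (float(row["time"]), int(row["status"]))
    return clinical

def usable_samples(samples, clinical):
    """Samples present in the matrix that also have complete clinical data."""
    return [s for s in samples if s in clinical]

samples, matrix = load_expression(EXPR_TSV)
clinical = load_clinical(CLIN_TSV)
usable = usable_samples(samples, clinical)
print(usable)
```

In this toy input only S1 is usable: S2 lacks a follow-up time, S3 has no clinical record, and S4 has no expression data.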

Validation Method 1: Gene Set Validation

Different validation strategies are adopted based on the number of genes in the gene set:

Single Gene Validation (1 gene)

  • Application Scenario: Cell types defined by the expression of a specific marker gene, discovered in spatial transcriptomics data

  • Typical Examples: CD20+ B cells, SAA+ hepatocytes, etc.

  • Analysis Method: Kaplan-Meier survival analysis (i.e., survivalKM)

  • Data Source: Gene expression values (can be from Bulk RNA-Seq datasets or IHC experiments)

  • Validation Goal: Verify the association between individual gene expression levels and clinical prognosis
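
One common way to relate a single gene's expression to prognosis is to split samples at the median expression and compare the two groups with a log-rank test. The sketch below illustrates that idea in plain Python; the median cutoff, the function names, and the toy data are assumptions, and survivalKM may group samples differently.

```python
import math
from statistics import median

def split_by_median(values):
    """Label each sample 'high' or 'low' relative to the median expression."""
    cut = median(values)
    return ["high" if v > cut else "low" for v in values]

def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("exactly two groups are required")
    first = labels[0]
    observed = expected = variance = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei == 1}):
        n = sum(1 for ti in times if ti >= t)  # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == first)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        d1 = sum(1 for ti, ei, g in zip(times, events, groups)
                 if ti == t and ei == 1 and g == first)
        observed += d1
        expected += d * n1 / n
        if n > 1:
            # Hypergeometric variance of the event count in the first group.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Toy cohort (hypothetical values): high expressers have shorter follow-up.
expression = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
times = [30, 28, 25, 6, 4, 2]
events = [1, 1, 1, 1, 1, 1]
groups = split_by_median(expression)
chi2, p_value = logrank_test(times, events, groups)
print(groups, round(chi2, 3), round(p_value, 4))
```

A median split is simple but arbitrary; cutoffs chosen to optimize separation should be interpreted with caution because they inflate apparent significance.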

Gene Module Validation (multiple genes, typically 3-8 or more)

  • Application Scenario: Gene signatures discovered in spatial transcriptomics data

  • Typical Examples: TLS (Tertiary Lymphoid Structure) subtypes, tumor subtypes, activated B cell subtypes, etc.

  • Analysis Method: Module score calculation followed by Kaplan-Meier survival analysis (i.e., geneSetScore + survivalKM)

  • Validation Goal: Verify the association between overall expression patterns of gene modules and clinical prognosis

Validation Method 2: Cell Type Validation

  • Application Scenario: Specific cell types discovered in spatial transcriptomics data

  • Validation Requirements: Target cell types must exist in reference sets of xCell, CIBERSORTx, or other immune infiltration algorithms

  • Analysis Workflow: Immune infiltration analysis of Bulk RNA-Seq data, followed by Kaplan-Meier survival analysis (i.e., immuneScore + survivalKM)

  • Validation Goal: Verify the association between specific cell type abundance and clinical prognosis

Reference

  • Zeng, D., Ye, Z., Shen, R., Yu, G., Wu, J., Xiong, Y., ... & Liao, W. (2021). IOBR: Multi-omics immuno-oncology biological research to decode tumor microenvironment and signatures. Frontiers in Immunology, 12, 687975.
