Survival Analysis Module

Purpose

This module uses IOBR, survival, survminer, and other R packages to perform univariate survival analysis on immune infiltration or gene set scoring results combined with survival information, and outputs standardized survival curve plots.

Input File Examples

input: Feature scoring/immune infiltration result file. Each row is a sample and each column is a feature such as an immune cell fraction or a gene set score. The file is tab-separated.

SampleID    Macrophages_M2_CIBERSORT    CD8_T_Cells_EPIC
Sample1     0.123                       0.456
Sample2     0.234                       0.567
Sample3     0.345                       0.678

clinical: Survival information file. Each row is a sample and each column is a survival time, a survival status, or another clinical feature. The file is tab-separated.

SampleID    OS.time    DFS.time    DFS
Sample1     1000       800
Sample2     800        600
Sample3     1200       1000
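Because both files are keyed by SampleID, the analysis presumably starts by matching rows across them. The following is a minimal Python sketch of that join, using the example values above; the helper names `read_tsv` and `merge_by_sample` are hypothetical, and how SDAS itself handles samples missing from one file is not stated here.

```python
import csv
import io

# Example contents copied from the sample values above (tab-separated).
FEATURES_TSV = (
    "SampleID\tMacrophages_M2_CIBERSORT\tCD8_T_Cells_EPIC\n"
    "Sample1\t0.123\t0.456\n"
    "Sample2\t0.234\t0.567\n"
    "Sample3\t0.345\t0.678\n"
)
CLINICAL_TSV = (
    "SampleID\tOS.time\tDFS.time\n"
    "Sample1\t1000\t800\n"
    "Sample2\t800\t600\n"
    "Sample3\t1200\t1000\n"
)


def read_tsv(text):
    """Parse tab-separated text into a dict of rows keyed by SampleID."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["SampleID"]: row for row in reader}


def merge_by_sample(features, clinical):
    """Keep only samples present in both tables and combine their columns."""
    shared = [sample for sample in features if sample in clinical]
    return {sample: {**features[sample], **clinical[sample]} for sample in shared}


merged = merge_by_sample(read_tsv(FEATURES_TSV), read_tsv(CLINICAL_TSV))
```

Values are kept as strings here; a real pipeline would convert the score and time columns to numbers before analysis.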

Running Method

SDAS bulkValidate survivalKM --input tme_combine.txt --clinical survival.txt --signature Macrophages_M2_CIBERSORT --project_name survival --time OS.time --status OS.status --time_type day --output survival_output

Input Parameter Description

Parameter         Required    Default      Description
--input           Yes         -            Immune infiltration/scoring result file path
--clinical        Yes         -            Survival information file path
--signature       Yes         -            Feature column name used for survival analysis
--output          Yes         -            Output directory path
--project_name    No          test         Project name (used for output file naming, etc.)
--time            No          OS.time      Survival time column name
--status          No          OS.status    Survival status column name (0 = alive/no recurrence, 1 = death/recurrence)
--time_type       No          day          Time unit
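The names passed to --signature, --time, and --status have to match column headers in the two input files. A minimal Python sketch of a pre-flight check follows. The helper `missing_columns` is hypothetical, the clinical header containing an `OS.status` column is an assumption made for illustration, and whether SDAS performs an equivalent check itself is not stated here.

```python
def missing_columns(header, required):
    """Return the required column names that are absent from a header row."""
    present = set(header)
    return [name for name in required if name not in present]


# Header rows as they might appear on the first line of each file.
# The clinical header is an assumed example that includes OS.status.
feature_header = ["SampleID", "Macrophages_M2_CIBERSORT", "CD8_T_Cells_EPIC"]
clinical_header = ["SampleID", "OS.time", "OS.status"]

# Values given to --signature, --time and --status in the example command.
signature = "Macrophages_M2_CIBERSORT"
time_col = "OS.time"
status_col = "OS.status"

feature_gaps = missing_columns(feature_header, ["SampleID", signature])
clinical_gaps = missing_columns(clinical_header, ["SampleID", time_col, status_col])
```

An empty list means every required column was found; a non-empty list names the columns to fix before running the command.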

Output Results Display

Result File         Description
survival.png/pdf    Survival curve plot

Survival curve plot (survival.png/pdf): Shows the survival curves for the groups defined by the specified feature and the survival difference between the high and low groups.

Differences in grouping methods:

Best cutoff grouping: Finds, by statistical methods, the cutoff value that maximizes the survival difference between the two groups. This usually separates the high-risk and low-risk groups most clearly (as shown in the left figure).
Mean grouping: Uses the mean score of all samples as the cutoff. This works well when the score distribution is symmetric but may be less sensitive when it is skewed (as shown in the middle figure).
Tertile grouping: Divides samples into three equal-sized groups (low, medium, high) by score. The difference between the high and low groups may be less significant than with the other two methods (as shown in the right figure), but this approach reduces the influence of extreme values and shows how the middle group behaves.
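The three grouping strategies can be sketched in plain Python. This is an illustration under stated assumptions, not the module's implementation: the procedure SDAS and the underlying R packages use to pick the best cutoff is not described here, so the sketch substitutes an exhaustive scan that keeps the cutoff with the largest two-group log-rank chi-square. All function names and the toy data are made up for the example.

```python
import statistics


def split_by_mean(scores):
    """Label each sample 'high' or 'low' relative to the mean score."""
    cut = statistics.mean(scores)
    return ["high" if s > cut else "low" for s in scores]


def split_by_tertiles(scores):
    """Label each sample 'low', 'medium' or 'high' using the two tertile cut points."""
    lower, upper = statistics.quantiles(scores, n=3)
    return ["low" if s <= lower else "high" if s > upper else "medium" for s in scores]


def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (events: 1 = event, 0 = censored)."""
    obs_minus_exp = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if in_high[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i] == 1)
        d_high = sum(1 for i in at_risk
                     if times[i] == t and events[i] == 1 and in_high[i])
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_cutoff(scores, times, events, min_group=2):
    """Scan observed scores and return (cutoff, chi2) with the largest chi2."""
    best = (None, -1.0)
    for cut in sorted(set(scores)):
        in_high = [s > cut for s in scores]
        n_high = sum(in_high)
        if min(n_high, len(scores) - n_high) < min_group:
            continue  # skip cutoffs that leave one group too small
        chi2 = logrank_chi2(times, events, in_high)
        if chi2 > best[1]:
            best = (cut, chi2)
    return best


# Toy data, invented for illustration only.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
times = [900, 1000, 1100, 1200, 150, 200, 250, 300]
events = [0, 0, 1, 0, 1, 1, 1, 1]

mean_groups = split_by_mean(scores)
tertile_groups = split_by_tertiles(scores)
cutoff, chi2 = best_cutoff(scores, times, events)
```

In this toy data the samples scoring above 0.4 all have early events, so the scan settles on 0.4 as the cutoff, which here gives the same split as the mean.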

Meaning of statistical indicators:

P-value: Indicates the significance of the survival difference between groups. The smaller the P-value, the less likely the difference is due to chance.
Hazard Ratio (HR): Indicates how many times higher the risk of death is in the high group than in the low group. HR > 1 means the high group is at higher risk; HR < 1 means it is at lower risk.
95% CI (Confidence Interval): The 95% confidence interval of the HR, reflecting the precision of the estimate. If the interval does not contain 1 (e.g., 1.41-2.64 in the left figure), the HR differs significantly from 1. The narrower the interval, the more precise the estimate.
Cutoff value: The cutoff used by the best cutoff grouping method.
Survival curves: Show how survival probability changes over time for the high group (orange) and the low group (blue). The more clearly the curves separate, the greater the difference between groups.
Risk table (Number at risk): Shown below the time axis, it lists the number of people at risk in each group at each time point (i.e., those who have not yet experienced the endpoint event), which helps assess how each group's sample size changes over time.
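To make the survival curve and the risk table concrete, here is a minimal Python sketch of the Kaplan-Meier estimate and the number-at-risk count for two groups. The function names and the toy data are invented for illustration; the module itself produces these through the R packages named above.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time."""
    survival = 1.0
    curve = []
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        survival *= 1 - deaths / at_risk  # product-limit update
        curve.append((t, survival))
    return curve


def number_at_risk(times, checkpoints):
    """Count samples whose follow-up time reaches each checkpoint."""
    return [sum(1 for ti in times if ti >= c) for c in checkpoints]


# Toy data, invented for illustration (1 = event, 0 = censored).
high_times, high_events = [150, 200, 250, 300], [1, 1, 1, 1]
low_times, low_events = [900, 1000, 1100, 1200], [0, 0, 1, 0]

high_curve = kaplan_meier(high_times, high_events)
low_curve = kaplan_meier(low_times, low_events)
high_risk = number_at_risk(high_times, [0, 250, 500])
low_risk = number_at_risk(low_times, [0, 500, 1000])
```

Censored samples leave the risk set without lowering the curve, which is why the low group's estimate only drops at its single event time.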
