Survival Analysis Module

Purpose

This module uses the R packages IOBR, survival, and survminer, among others, to perform univariate survival analysis on immune infiltration or gene set scoring results combined with survival information. It outputs standardized survival curve plots.

Input File Examples

  • input: feature scoring/immune infiltration result file. Each row is a sample and each column is a feature (an immune cell fraction, a gene set score, and so on). The file is tab-separated.

    SampleID    Macrophages_M2_CIBERSORT    CD8_T_Cells_EPIC
    Sample1     0.123                       0.456
    Sample2     0.234                       0.567
    Sample3     0.345                       0.678

  • clinical: survival information file. Each row is a sample and each column holds a survival time, a survival status, or another clinical feature. The file is tab-separated.

    SampleID    OS.time    OS    DFS.time    DFS
    Sample1     1000       1     800         0
    Sample2     800        0     600         0
    Sample3     1200       1     1000        1
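To make the relationship between the two files concrete, the following is a minimal Python sketch of joining them on SampleID. It is an illustration only, not the module's R implementation: the helper names `read_tsv` and `merge_inputs` are hypothetical, and keeping only samples present in both files is an assumption.

```python
import csv
import io

def read_tsv(text):
    """Parse tab-separated text into a dict of rows keyed by SampleID."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["SampleID"]: row for row in reader}

def merge_inputs(feature_text, clinical_text, signature, time_col, status_col):
    """Join feature and clinical rows on SampleID (assumed: shared samples only)."""
    features = read_tsv(feature_text)
    clinical = read_tsv(clinical_text)
    merged = []
    for sample_id, feature_row in features.items():
        clinical_row = clinical.get(sample_id)
        if clinical_row is None:
            continue  # no survival information for this sample
        merged.append({
            "SampleID": sample_id,
            "score": float(feature_row[signature]),
            "time": float(clinical_row[time_col]),
            "status": int(clinical_row[status_col]),
        })
    return merged

# The example tables from this section, written as tab-separated text.
FEATURES = (
    "SampleID\tMacrophages_M2_CIBERSORT\tCD8_T_Cells_EPIC\n"
    "Sample1\t0.123\t0.456\n"
    "Sample2\t0.234\t0.567\n"
    "Sample3\t0.345\t0.678\n"
)
CLINICAL = (
    "SampleID\tOS.time\tOS\tDFS.time\tDFS\n"
    "Sample1\t1000\t1\t800\t0\n"
    "Sample2\t800\t0\t600\t0\n"
    "Sample3\t1200\t1\t1000\t1\n"
)

rows = merge_inputs(FEATURES, CLINICAL, "Macrophages_M2_CIBERSORT", "OS.time", "OS")
for row in rows:
    print(row)
```

In this sketch a column name that is missing from either file raises a `KeyError`, which is one way to catch a mismatch between the file headers and the column names passed on the command line.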

Running Method

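SDAS bulkValidate survivalKM --input tme_combine.txt --clinical survival.txt --signature Macrophages_M2_CIBERSORT --project_name survival --time OS.time --status OS.status --time_type day --output survival_output

Note: the values given to --time and --status must be column names in the clinical file. The example clinical file above names its status column OS, while this command and the default use OS.status, so adjust one or the other until they match.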

Input Parameter Description

    Parameter         Required    Default      Description
    --input           Yes         -            Path to the immune infiltration/scoring result file
    --clinical        Yes         -            Path to the survival information file
    --signature       Yes         -            Name of the feature column used for survival analysis
    --output          Yes         -            Output directory path
    --project_name    No          test         Project name (used for output file naming, etc.)
    --time            No          OS.time      Name of the survival time column
    --status          No          OS.status    Name of the survival status column (0 = alive/no recurrence, 1 = death/recurrence)
    --time_type       No          day          Unit of the survival time
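As a rough illustration of how the required and optional parameters combine, the Python sketch below assembles the command line from the table above. The `build_command` helper is hypothetical and only builds the argument list; it does not run SDAS. The file names are those used in the example command.

```python
import shlex

# Defaults and required parameters as listed in the parameter table.
DEFAULTS = {
    "project_name": "test",
    "time": "OS.time",
    "status": "OS.status",
    "time_type": "day",
}
REQUIRED = ("input", "clinical", "signature", "output")

def build_command(**options):
    """Return the argument list for SDAS bulkValidate survivalKM."""
    missing = [name for name in REQUIRED if name not in options]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    unknown = set(options) - set(REQUIRED) - set(DEFAULTS)
    if unknown:
        raise ValueError("unknown parameters: " + ", ".join(sorted(unknown)))
    merged = {**DEFAULTS, **options}  # explicit options override the defaults
    argv = ["SDAS", "bulkValidate", "survivalKM"]
    for name, value in merged.items():
        argv += ["--" + name, str(value)]
    return argv

argv = build_command(
    input="tme_combine.txt",
    clinical="survival.txt",
    signature="Macrophages_M2_CIBERSORT",
    output="survival_output",
    status="OS",  # overrides the default OS.status
)
print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run` would execute the command on a machine where SDAS is installed.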

Output Results Display

    Result File         Description
    survival.png/pdf    Survival curve plot

  • Survival curve plot (survival.png/pdf): shows the survival curves for the groups defined by the specified feature, so that the survival difference between the high and low groups can be compared.

Differences in grouping methods:

  • Best cutoff grouping: searches statistically for the cutoff value that maximizes the survival difference between the two groups. It usually separates the high-risk and low-risk groups most clearly (as shown in the left figure).

  • Mean grouping: uses the mean score of all samples as the boundary. This works well when the scores are symmetrically distributed, but may be less sensitive when the distribution is skewed (as shown in the middle figure).

  • Tertile grouping: divides the samples into three equal parts (low, medium, high) by score. The difference between the high and low groups may be less significant than with the previous two methods (as shown in the right figure), but this method limits the influence of extreme values and makes the behavior of the middle group visible.
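The three grouping strategies above can be sketched in Python as follows. This is a simplified illustration, not the module's implementation: the function names are hypothetical, the tertile split is done by rank, and the best cutoff is found by scanning candidate cutoffs and keeping the one with the largest two-group log-rank chi-square statistic, which is only a stand-in for whatever statistical search the module performs. The toy data are invented for the example.

```python
def mean_groups(scores):
    """Label each sample 'high' or 'low' relative to the mean score."""
    cut = sum(scores) / len(scores)
    return ["high" if s > cut else "low" for s in scores]

def tertile_groups(scores):
    """Split samples into three rank-based groups of (nearly) equal size."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    names = ("low", "medium", "high")
    labels = [None] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = names[rank * 3 // len(scores)]
    return labels

def logrank_chi2(times, events, in_high):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    diff = 0.0      # observed minus expected events in the high group
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for x in times if x >= t)  # at risk, both groups
        n1 = sum(1 for x, h in zip(times, in_high) if x >= t and h)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(1 for x, e, h in zip(times, events, in_high)
                 if x == t and e == 1 and h)
        diff += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return diff ** 2 / variance if variance > 0 else 0.0

def best_cutoff(scores, times, events, min_group=2):
    """Scan cutoffs; return (cutoff, chi2) with the largest log-rank statistic."""
    best = (None, -1.0)
    for cut in sorted(set(scores))[:-1]:
        in_high = [s > cut for s in scores]
        n_high = sum(in_high)
        if min(n_high, len(scores) - n_high) < min_group:
            continue  # skip cutoffs that leave a group too small
        chi2 = logrank_chi2(times, events, in_high)
        if chi2 > best[1]:
            best = (cut, chi2)
    return best

# Toy data (invented): samples with high scores die earlier.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
times = [900, 1000, 1100, 1200, 300, 400, 500, 600]
events = [1, 1, 1, 1, 1, 1, 1, 1]

print(mean_groups(scores))
print(tertile_groups(scores))
cut, chi2 = best_cutoff(scores, times, events)
print(cut, round(chi2, 2))
```

Because the best cutoff is chosen to maximize the group difference, its statistic is larger than the one obtained with a fixed cutoff on the same data, which is consistent with the observation that this method usually gives the clearest separation.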

Meaning of statistical indicators:

  • P-value: indicates the significance of the survival difference between groups. The smaller the P-value, the less likely it is that a difference this large would arise by chance alone if the groups had the same survival.

  • Hazard Ratio (HR): the ratio of the hazard (the instantaneous risk of the event, e.g. death) in the high group to that in the low group. HR > 1 means the high group is at higher risk; HR < 1 means it is at lower risk.

  • 95% CI (Confidence Interval): the 95% confidence interval of the HR, reflecting the precision of the estimate. If the interval does not contain 1 (e.g., 1.41-2.64 in the left figure), the HR differs significantly from 1. The narrower the interval, the more precise the estimate.

  • Cutoff value: The boundary point used in the best cutoff grouping method.

  • Survival curves: Show the survival probability changes over time for the high group (orange) and low group (blue). The more obvious the curve separation, the greater the group differences.

  • Risk table (Number at risk): shown below the time axis, it lists the number of people still at risk in each group at each time point, i.e., those who have neither experienced the endpoint event nor been censored before that time. It helps assess how the sample size of each group changes over time.
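The indicators above can be illustrated with a short Python sketch. It is not the module's code: the function names and toy data are hypothetical, the P-value conversion assumes a chi-square statistic with one degree of freedom (as in a two-group log-rank test), and the confidence interval assumes a log hazard ratio with a known standard error and a normal approximation.

```python
import math

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

def number_at_risk(times, checkpoints):
    """Count samples still under observation at each checkpoint."""
    return [sum(1 for x in times if x >= c) for c in checkpoints]

def chi2_p_value(chi2):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def hazard_ratio_ci(log_hr, se, z=1.96):
    """Return (HR, lower, upper) for a 95% interval on the log scale."""
    return (math.exp(log_hr),
            math.exp(log_hr - z * se),
            math.exp(log_hr + z * se))

# Toy data (invented): status 1 = event, 0 = censored.
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
risk = number_at_risk(times, [0, 6, 10, 14])
p = chi2_p_value(1.96 ** 2)
hr, lower, upper = hazard_ratio_ci(math.log(2), 0.25)

print(curve)
print(risk)
print(round(p, 3))
print(round(hr, 2), round(lower, 2), round(upper, 2))
```

In the last call the whole interval lies above 1, which corresponds to the case described above where the HR differs significantly from 1. The censored sample at time 8 lowers the number at risk afterwards without lowering the survival probability.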
