Cluster Submission Mode

Introduction

SDAS Pipelines Automated Job Submission is a job scheduling system that automatically manages and submits SDAS analysis workflow jobs to PBS/Torque clusters (SGE, Slurm, and LSF are also supported; see System Requirements). The system provides:

  • Automatic Dependency Resolution: Intelligent scheduling based on job dependencies

  • Concurrency Control: Limits the number of simultaneously running jobs to avoid resource conflicts

  • Status Monitoring: Real-time monitoring of job execution status

  • Error Handling: Automatic retry of failed jobs

  • Detailed Reporting: Generates complete execution reports and logs

System Requirements

  • Python 3.6+

  • Job scheduling system (supports any of the following):

    • PBS/Torque

    • SGE (Sun Grid Engine)

    • Slurm

    • LSF (IBM Platform Load Sharing Facility)

  • Appropriate queue permissions

  • SDAS software properly configured

Usage Steps

1. Configure pipeline_input.conf File

Before running SDAS Pipeline, you need to configure the pipeline_input.conf file. This file defines:

  • Input Data: h5ad file paths and grouping information

  • Analysis Workflow: Select SDAS modules to run

  • Module Parameters: Specific parameter configurations for each module

  • Dependencies: Input/output relationships between modules

1.1 Basic Configuration Structure
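
The authoritative template ships with SDAS; the fragment below is only an illustrative sketch of the four kinds of information the file carries. The key names are placeholders rather than actual SDAS syntax — only the module names and the *_input_process pattern are taken from this page.

```ini
## Input data: h5ad file path(s) and grouping information (placeholder keys)
sample_h5ad  = /abs/path/to/sample1.h5ad
sample_group = Normal

## Analysis workflow: which SDAS modules to run (placeholder key)
run_modules = cellAnnotation,spatialDomain,coexpress

## Module parameters: one "parameter_name = parameter_value" line per setting
cellAnnotation_method = placeholder_value

## Module dependencies: *_input_process parameters (see section 1.3)
```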

1.2 Module Parameter Configuration Examples

Configuration Notes:

  • Parameter format: parameter_name = parameter_value

  • Empty values: leaving the value blank means the module's default is used

  • Comments: lines starting with # provide parameter explanations

  • Path parameters: Use absolute paths to avoid relative path issues
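
A few placeholder lines illustrating these conventions (the parameter names are hypothetical):

```ini
# Number of top genes used by coexpress (hypothetical parameter name)
coexpress_n_top_genes = 2000

# Empty value: the module's default is used
coexpress_mask_file =

# Path parameters should be absolute
cellAnnotation_reference = /abs/path/to/reference.h5ad
```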

Spatial Gene Co-expression Analysis (coexpress)

Cell Type Annotation (cellAnnotation)

Spatial Domain Identification (spatialDomain)

1.3 Module Dependency Configuration

SDAS modules have dependencies specified through *_input_process parameters:
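
For example (module names are from this page; the exact semantics of *_input_process should be checked against your SDAS template — here it is assumed to name the upstream step whose output feeds the module):

```ini
# "preprocess" is a placeholder name for an upstream step
cellAnnotation_input_process = preprocess

# spatialDomain and coexpress both build on the annotation result
spatialDomain_input_process = cellAnnotation
coexpress_input_process     = cellAnnotation
```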

2. Generate Job Configuration

After configuration, run SDAS Pipeline to generate the job configuration. This produces the all_shell.conf file, which contains all jobs and their dependencies.

3. Preview Jobs (Dry Run)

Before actually submitting, it is recommended to preview the jobs in dry-run mode:
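
For example (file and directory names are illustrative; the flags are documented in the parameter reference below):

```bash
python auto_qsub_scheduler.py -c all_shell.conf -o ./output --dry-run
```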

This will display:

  • All job dependencies

  • Resource requirements (CPU, memory)

  • Qsub scripts to be generated

4. Submit Jobs

After confirmation, submit jobs to the queue:
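
For example, using the same illustrative paths and the default queue:

```bash
python auto_qsub_scheduler.py -c all_shell.conf -o ./output --queue stereo.q
```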

auto_qsub_scheduler.py Job Scheduling System Configuration

Depending on your cluster environment, you need to modify the create_qsub_script method in auto_qsub_scheduler.py to customize the job submission script format. This method is located in the QsubScheduler class:
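
As a reference point, a minimal PBS/Torque-flavoured version of the method could look like the sketch below. The actual signature and template in your copy of auto_qsub_scheduler.py may differ; only self.queue, shell_file, cpu, and memory are taken from this page, and the rest (including the job_name argument and the class stub) is assumed.

```python
# Sketch only -- the real QsubScheduler.create_qsub_script may differ.
class QsubScheduler:
    def __init__(self, queue):
        self.queue = queue  # queue name, e.g. "stereo.q"

    def create_qsub_script(self, job_name, shell_file, cpu, memory):
        """Return the text of a PBS/Torque submission script for one job.

        For SGE, Slurm, or LSF, replace the #PBS directives with the
        equivalent #$ / #SBATCH / #BSUB lines.
        """
        return f"""#!/bin/bash
#PBS -N {job_name}
#PBS -q {self.queue}
#PBS -l nodes=1:ppn={cpu}
#PBS -l mem={memory}
#PBS -j oe

cd $PBS_O_WORKDIR
sh {shell_file}
"""
```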

You need to:

  1. Modify the script template based on your job scheduling system (PBS/Torque, SGE, Slurm, or LSF)

  2. Ensure necessary resource configuration parameters are included (CPU, memory, etc.)

  3. Maintain references to the following variables:

    • self.queue: Queue name

    • shell_file: Execution script path

    • cpu: Number of CPU cores

    • memory: Memory requirement

Test Data and Configuration Files

SDAS Pipelines provides single-slice and multiple-slice test data with corresponding configuration files for users to quickly get started and test the system.

Directory Structure
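
Only the file names mentioned on this page are certain; the layout below is a sketch, and the top-level directory and test-data location are placeholders:

```text
SDAS_pipelines/                          # illustrative top-level directory
├── auto_qsub_scheduler.py               # job scheduler described on this page
├── pipeline_input.single_slice.conf     # single-slice test configuration
├── pipeline_input.multiple_slice.conf   # multiple-slice test configuration
└── test_data/                           # placeholder: bundled h5ad test data
```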

Single-Slice Data Analysis Configuration

pipeline_input.single_slice.conf is designed for the analysis workflow of a single spatial transcriptome slice:

  • Input Data: Single h5ad file

  • Analysis Modules: Includes most SDAS analysis modules

  • Features:

    • Simple data input configuration

    • Complete module parameter examples

    • Suitable for first-time users

Multiple-Slice Data Analysis Configuration

pipeline_input.multiple_slice.conf is designed for the analysis workflow of multiple spatial transcriptome slices:

  • Input Data: Multiple h5ad files with grouping information (e.g., Normal/Tumor)

  • Analysis Modules: Select appropriate modules based on experimental design

  • Features:

    • Demonstrates multi-sample input format

    • Includes inter-group comparison parameter settings

    • Suitable for comparative analysis

Testing Steps

1. Single-Slice Data Testing

Step 1: Prepare Configuration File

Step 2: Generate Job Configuration

Step 3: Preview Jobs (Recommended)

Step 4: Submit Jobs
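
Putting the four steps together (paths are illustrative; step 2 uses the SDAS Pipeline command from section 2, which depends on your installation):

```bash
# Step 1: start from the bundled single-slice template
cp pipeline_input.single_slice.conf pipeline_input.conf
# ... edit pipeline_input.conf for your data ...

# Step 2: run SDAS Pipeline (see "2. Generate Job Configuration")
#         to produce all_shell.conf

# Step 3: preview the jobs without submitting anything
python auto_qsub_scheduler.py -c all_shell.conf -o ./output --dry-run

# Step 4: submit once the preview looks correct
python auto_qsub_scheduler.py -c all_shell.conf -o ./output --queue stereo.q
```

The multiple-slice test below follows the same four steps, starting from pipeline_input.multiple_slice.conf instead.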

2. Multiple-Slice Data Testing

Step 1: Prepare Configuration File

Step 2: Generate Job Configuration

Step 3: Preview Jobs (Recommended)

Step 4: Submit Jobs

3. auto_qsub_scheduler.py Parameter Description

Basic Parameters:

  • -c, --config: Job configuration file path (required)

  • -o, --output: Output directory path (required)

  • --queue: Queue name (default: stereo.q)

  • --max-concurrent: Maximum concurrent jobs (default: 10)

  • --retry-times: Number of retries for failed jobs (default: 3)

  • --wait-time: Status check interval in seconds (default: 30)

  • --dry-run: Preview mode, no actual job submission

Usage Examples:
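
Illustrative invocations built from the parameters above (file and directory names are examples; depending on your environment the interpreter may be python3):

```bash
# Minimal run: jobs from all_shell.conf, results and logs under ./output
python auto_qsub_scheduler.py -c all_shell.conf -o ./output

# Custom queue, tighter concurrency, more retries, slower status polling
python auto_qsub_scheduler.py -c all_shell.conf -o ./output \
    --queue stereo.q --max-concurrent 5 --retry-times 5 --wait-time 60

# Preview mode: print the plan without submitting any jobs
python auto_qsub_scheduler.py -c all_shell.conf -o ./output --dry-run
```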

4. Monitoring and Logs

Real-time Monitoring:

  • The program displays job status updates during execution

  • Press Ctrl+C to safely stop the scheduler

Log Files:

  • Scheduler log: ./output/scheduler.log

  • Job logs: ./output/qsub_info/logs/

  • Job scripts: ./output/qsub_info/shell/

Status Checking:
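
The scheduler log can be followed directly, and the queue itself can be inspected with your scheduler's native commands (log path as listed above):

```bash
# Follow the scheduler's own log
tail -f ./output/scheduler.log

# Check queued/running jobs with the native command for your scheduler
qstat -u $USER      # PBS/Torque or SGE
squeue -u $USER     # Slurm
bjobs               # LSF
```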

5. Troubleshooting

Common Issues:

  1. Job Submission Failure

    • Check if queue name is correct

    • Confirm sufficient queue permissions

    • Check if resource requirements are reasonable

  2. Dependency Relationship Errors

    • Check all_shell.conf file format

    • Confirm dependent job names are correct

  3. Jobs Getting Stuck

    • Check if cluster resources are sufficient

    • View error messages in job logs

    • Consider adjusting --wait-time parameter

  4. Permission Issues

    • Ensure write permissions for output directory

    • Check queue submission permissions

Debug Mode:
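
The parameter list above does not include a dedicated debug flag, so the closest substitute is a dry run combined with inspecting the scheduler and job logs:

```bash
# Re-check dependencies and generated qsub scripts without submitting
python auto_qsub_scheduler.py -c all_shell.conf -o ./output --dry-run

# Inspect the scheduler log and individual job logs
tail -f ./output/scheduler.log
ls ./output/qsub_info/logs/
```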
