Cluster Submission Mode
Introduction
SDAS Pipelines Automated Job Submission is an intelligent job scheduling system for automatically managing and submitting SDAS analysis workflow jobs to PBS/Torque and other cluster scheduling systems. The system provides:
Automatic Dependency Resolution: Intelligent scheduling based on job dependencies
Concurrency Control: Limits the number of simultaneously running jobs to avoid resource conflicts
Status Monitoring: Real-time monitoring of job execution status
Error Handling: Automatic retry of failed jobs
Detailed Reporting: Generates complete execution reports and logs
System Requirements
Python 3.6+
Job scheduling system (supports any of the following):
PBS/Torque
SGE (Sun Grid Engine)
Slurm
LSF (IBM Platform Load Sharing Facility)
Appropriate queue permissions
SDAS software properly configured
Usage Steps
1. Configure pipeline_input.conf File
Before running SDAS Pipeline, you need to configure the pipeline_input.conf file. This file defines:
Input Data: h5ad file paths and grouping information
Analysis Workflow: Select SDAS modules to run
Module Parameters: Specific parameter configurations for each module
Dependencies: Input/output relationships between modules
1.1 Basic Configuration Structure
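The exact section and key names depend on your SDAS release; the skeleton below is only an illustration of the four elements listed above (input data, workflow selection, module parameters, dependencies), using hypothetical keys. Consult the template shipped with your SDAS installation for the real structure.

```ini
# Hypothetical skeleton of pipeline_input.conf; section and key names are
# illustrative, not the authoritative template.

[input]
# Input Data: h5ad file paths and (for multi-slice runs) grouping information
sample1 = /abs/path/sample1.h5ad,Normal
sample2 = /abs/path/sample2.h5ad,Tumor

[workflow]
# Analysis Workflow: which SDAS modules to run
modules = cellAnnotation,spatialDomain,coexpress

# One section per selected module follows, holding its parameters and its
# *_input_process dependency setting (see 1.2 and 1.3 below)
```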
1.2 Module Parameter Configuration Examples
Configuration Notes:
Parameter format: parameter_name = parameter_value
Empty value: leaving the value blank means the default value is used
Comments: start with # and provide parameter explanations
Path parameters: use absolute paths to avoid relative path issues
Spatial Gene Co-expression Analysis (coexpress)
Cell Type Annotation (cellAnnotation)
Spatial Domain Identification (spatialDomain)
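The parameter names below are placeholders meant only to illustrate the key = value format for the three modules listed above; take the real parameter names and defaults from the template configuration shipped with SDAS.

```ini
# Placeholder parameter sections for the three modules above; parameter names
# are illustrative, not the real SDAS parameter set.

[coexpress]
# Spatial Gene Co-expression Analysis
# Blank value: use the module default
n_top_genes =

[cellAnnotation]
# Cell Type Annotation; use absolute paths, per the notes above
ref_file = /abs/path/to/reference.h5ad

[spatialDomain]
# Spatial Domain Identification
# Blank value: use the module default
n_domains =
```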
1.3 Module Dependency Configuration
SDAS modules have dependencies specified through *_input_process parameters:
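A minimal sketch of this wiring, assuming INI-style sections as in the earlier examples: each module's *_input_process key names the upstream module whose output it consumes, and the scheduler turns these references into job dependencies. The module pairing shown is only an example.

```ini
# Hypothetical dependency wiring via *_input_process keys

[spatialDomain]
spatialDomain_input_process = cellAnnotation

[coexpress]
coexpress_input_process = spatialDomain
```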
2. Generate Job Configuration
After configuration, run SDAS Pipeline to generate job configuration files:
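The launcher name and options below are placeholders; substitute the SDAS Pipeline entry point and options from your installation.

```bash
# Placeholder command: replace with the SDAS Pipeline launcher and options
# from your installation
SDAS_pipeline --conf pipeline_input.conf --outdir ./output
```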
This will generate the all_shell.conf file containing all jobs and their dependencies.
3. Preview Jobs (Recommended)
Before actual submission, it's recommended to preview using dry-run mode:
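A minimal example, assuming all_shell.conf was written to ./output and that auto_qsub_scheduler.py is invoked with the Python interpreter; the -c, -o, and --dry-run options are described in the parameter list later in this document.

```bash
# Preview mode: prints the plan but submits nothing
python auto_qsub_scheduler.py -c ./output/all_shell.conf -o ./output --dry-run
```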
This will display:
All job dependencies
Resource requirements (CPU, memory)
Qsub scripts to be generated
4. Submit Jobs
After confirmation, submit jobs to the queue:
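For example (the queue name and concurrency shown are the documented defaults; the output path is an assumption):

```bash
# Real submission: drop --dry-run and set queue/concurrency as needed
python auto_qsub_scheduler.py -c ./output/all_shell.conf -o ./output \
    --queue stereo.q --max-concurrent 10
```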
auto_qsub_scheduler.py Job Scheduling System Configuration
Based on your cluster environment, you need to modify the create_qsub_script method in auto_qsub_scheduler.py to customize the job submission script format. This method is located in the QsubScheduler class:
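A minimal sketch of what this method might look like for PBS/Torque. The class, method, and variable names (QsubScheduler, create_qsub_script, self.queue, shell_file, cpu, memory) come from this document; the exact signature, script paths, and resource directives are assumptions to adapt to your cluster.

```python
# Sketch only, not the shipped implementation: names from this document,
# everything else is an assumption.
class QsubScheduler:
    def __init__(self, queue):
        # Queue name, normally taken from the --queue option
        self.queue = queue

    def create_qsub_script(self, job_name, shell_file, cpu, memory):
        """Write a PBS/Torque submission script that wraps shell_file."""
        script = f"""#!/bin/bash
#PBS -N {job_name}
#PBS -q {self.queue}
#PBS -l nodes=1:ppn={cpu}
#PBS -l mem={memory}
#PBS -o {shell_file}.log
#PBS -e {shell_file}.err
cd $PBS_O_WORKDIR
bash {shell_file}
"""
        qsub_file = f"{shell_file}.qsub"
        with open(qsub_file, "w") as handle:
            handle.write(script)
        return qsub_file
```

For Slurm, the #PBS directives become #SBATCH lines (for example #SBATCH -p, #SBATCH -c, #SBATCH --mem) and the job is submitted with sbatch rather than qsub; SGE uses #$ directives and LSF uses #BSUB.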
You need to:
Modify the script template based on your job scheduling system (PBS/Torque, SGE, Slurm, or LSF)
Ensure necessary resource configuration parameters are included (CPU, memory, etc.)
Maintain references to the following variables:
self.queue: Queue name
shell_file: Execution script path
cpu: Number of CPU cores
memory: Memory requirement
Test Data and Configuration Files
SDAS Pipelines provides single-slice and multiple-slice test data with corresponding configuration files for users to quickly get started and test the system.
Directory Structure
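The layout below is illustrative only; check your SDAS Pipelines checkout for the actual paths. The two configuration file names are the ones referenced in the following sections.

```
test/
├── pipeline_input.single_slice.conf    # single-slice configuration example
├── pipeline_input.multiple_slice.conf  # multiple-slice configuration example
└── data/                               # test h5ad files (location may differ)
```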
Single-Slice Data Analysis Configuration
pipeline_input.single_slice.conf is designed for a single-slice spatial transcriptome analysis workflow:
Input Data: Single h5ad file
Analysis Modules: Includes most SDAS analysis modules
Features:
Simple data input configuration
Complete module parameter examples
Suitable for first-time users
Multiple-Slice Data Analysis Configuration
pipeline_input.multiple_slice.conf is designed for a multiple-slice spatial transcriptome analysis workflow:
Input Data: Multiple h5ad files with grouping information (e.g., Normal/Tumor)
Analysis Modules: Select appropriate modules based on experimental design
Features:
Demonstrates multi-sample input format
Includes inter-group comparison parameter settings
Suitable for comparative analysis
Testing Steps
1. Single-Slice Data Testing
Step 1: Prepare Configuration File
Step 2: Generate Job Configuration
Step 3: Preview Jobs (Recommended)
Step 4: Submit Jobs
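The four steps above combined into one sketch; the SDAS Pipeline launcher name and the location of all_shell.conf are assumptions, while the auto_qsub_scheduler.py options are the documented ones.

```bash
# Step 1: prepare the configuration file (edit the h5ad path inside the copy)
cp pipeline_input.single_slice.conf my_single.conf

# Step 2: generate the job configuration (placeholder launcher name)
SDAS_pipeline --conf my_single.conf --outdir ./single_out

# Step 3: preview jobs
python auto_qsub_scheduler.py -c ./single_out/all_shell.conf -o ./single_out --dry-run

# Step 4: submit jobs
python auto_qsub_scheduler.py -c ./single_out/all_shell.conf -o ./single_out --queue stereo.q
```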
2. Multiple-Slice Data Testing
Step 1: Prepare Configuration File
Step 2: Generate Job Configuration
Step 3: Preview Jobs (Recommended)
Step 4: Submit Jobs
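The same four steps with the multiple-slice configuration swapped in (the launcher name and the all_shell.conf location remain assumptions):

```bash
# Edit the h5ad paths and Normal/Tumor group labels inside the copied conf
cp pipeline_input.multiple_slice.conf my_multi.conf
SDAS_pipeline --conf my_multi.conf --outdir ./multi_out
python auto_qsub_scheduler.py -c ./multi_out/all_shell.conf -o ./multi_out --dry-run
python auto_qsub_scheduler.py -c ./multi_out/all_shell.conf -o ./multi_out --queue stereo.q
```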
3. auto_qsub_scheduler.py Parameter Description
Basic Parameters:
-c, --config: Job configuration file path (required)
-o, --output: Output directory path (required)
--queue: Queue name (default: stereo.q)
--max-concurrent: Maximum concurrent jobs (default: 10)
--retry-times: Number of retries for failed jobs (default: 3)
--wait-time: Status check interval in seconds (default: 30)
--dry-run: Preview mode, no actual job submission
Usage Examples:
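Two examples built from the options above (invoking the script with the Python interpreter is an assumption):

```bash
# Preview only
python auto_qsub_scheduler.py -c all_shell.conf -o ./output --dry-run

# Submit with a custom queue, higher concurrency, and faster status polling
python auto_qsub_scheduler.py -c all_shell.conf -o ./output \
    --queue my.q --max-concurrent 20 --retry-times 2 --wait-time 10
```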
4. Monitoring and Logs
Real-time Monitoring:
The program displays job status updates during execution
Press Ctrl+C to safely stop the scheduler
Log Files:
Scheduler log: ./output/scheduler.log
Job logs: ./output/qsub_info/logs/
Job scripts: ./output/qsub_info/shell/
Status Checking:
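Beyond the scheduler's own output, status can be checked with standard tools; the commands below assume the log locations listed above and a PBS/Torque- or SGE-style qstat (use squeue on Slurm and bjobs on LSF).

```bash
# Follow the scheduler log in real time
tail -f ./output/scheduler.log

# Ask the cluster scheduler for your jobs (PBS/Torque or SGE)
qstat -u $USER
```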
5. Troubleshooting
Common Issues:
Job Submission Failure
Check if queue name is correct
Confirm sufficient queue permissions
Check if resource requirements are reasonable
Dependency Relationship Errors
Check the all_shell.conf file format
Confirm dependent job names are correct
Jobs Getting Stuck
Check if cluster resources are sufficient
View error messages in job logs
Consider adjusting the --wait-time parameter
Permission Issues
Ensure write permissions for output directory
Check queue submission permissions
Debug Mode:
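The preview mode and log locations described above double as debugging aids; for example:

```bash
# Re-run in preview mode to inspect the planned submissions
python auto_qsub_scheduler.py -c all_shell.conf -o ./output --dry-run

# Inspect the generated submission scripts and the scheduler log
ls ./output/qsub_info/shell/
less ./output/scheduler.log
```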