Data generation using Slurm

For users with access to high-performance computing clusters that use Slurm as a workload manager, isciml can be run in a distributed manner to process large datasets more efficiently.

Creating a Slurm Job Script

Create a file named isciml_job.sh with the following content:

#!/bin/bash
#SBATCH -N 1
#SBATCH -p <PARTITION>
#SBATCH -q <QUEUE>
#SBATCH -t 5:00:00
#SBATCH --ntasks-per-node=2

singularity exec isciml.sif mpirun -np 2 isciml generate [OPTIONS]

This script does the following:

  • Requests 1 node (-N 1)

  • Uses a specific partition (-p <PARTITION>)

  • Specifies a queue (-q <QUEUE>)

  • Sets a time limit of 5 hours (-t 5:00:00)

  • Allocates 2 tasks per node (--ntasks-per-node=2)

  • Runs isciml using Singularity and MPI with 2 processes

Note: Replace <PARTITION> and <QUEUE> with the appropriate values for your HPC environment. These options may vary depending on your cluster's configuration. Consult your system administrator or HPC documentation for the correct values.

Ensure that the number of tasks (--ntasks-per-node) matches the number of MPI processes (-np 2). Adjust these parameters according to your specific needs and the capabilities of your cluster.
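
As a minimal sketch (assuming a Slurm release that exports SLURM_NTASKS for jobs submitted with --ntasks-per-node; verify this on your cluster), the launch line can read the process count from the allocation at run time so the two values cannot drift apart:

#!/bin/bash
#SBATCH -N 1
#SBATCH -p <PARTITION>
#SBATCH -q <QUEUE>
#SBATCH -t 5:00:00
#SBATCH --ntasks-per-node=2

# #SBATCH directives are parsed before the script runs and cannot read shell
# variables, so the MPI process count is taken from SLURM_NTASKS, which Slurm
# exports to match the allocation (falling back to 2 if it is unset).
singularity exec isciml.sif mpirun -np "${SLURM_NTASKS:-2}" isciml generate [OPTIONS]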

Submitting the Job

To submit the job to the Slurm queue, use the following command:

sbatch isciml_job.sh
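
Optionally, you can first ask Slurm to validate the script and estimate a start time without actually queuing the job; this uses the standard sbatch --test-only flag, and the output format varies between Slurm versions:

# Validate the job script and report an estimated start time; no job is submitted.
sbatch --test-only isciml_job.sh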

[Monitoring the Job and Tips sections remain unchanged]

Example: Scaling to Multiple Nodes

To use multiple nodes, modify your Slurm script as follows:

#!/bin/bash
#SBATCH -N 4
#SBATCH -p <PARTITION>
#SBATCH -q <QUEUE>
#SBATCH -t 10:00:00
#SBATCH --ntasks-per-node=16

singularity exec isciml.sif mpirun -np 64 isciml generate [OPTIONS]

This script requests 4 nodes with 16 tasks per node, for a total of 64 MPI processes. Note that the total number of tasks (4 * 16 = 64) matches the number of MPI processes (-np 64).

Remember to replace <PARTITION> and <QUEUE> with the appropriate values for your HPC environment, and adjust the generate command options to properly distribute the workload across all processes.
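
If the script takes its process count from the allocation (as in the earlier SLURM_NTASKS sketch), you can also resize a run at submission time without editing the file, since options passed to sbatch on the command line take precedence over the #SBATCH directives in the script:

# Request 8 nodes x 16 tasks for this submission only; these options
# override the #SBATCH directives in isciml_job.sh.
sbatch -N 8 --ntasks-per-node=16 isciml_job.sh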

Important Note on HPC Environment Configuration

The exact configuration options, including available partitions, queues, and resource limits, can vary significantly between different HPC environments. Always consult your system's documentation or HPC support team to determine the correct values for:

  • Partitions (-p option)

  • Queues (-q option)

  • Time limits (-t option)

  • Available resources (number of nodes, tasks per node, etc.)

Tailoring these options to your specific HPC environment will ensure optimal performance and adherence to system policies.
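
On most clusters, a few standard Slurm commands can help you discover these values yourself (the accounting query requires the Slurm accounting database to be enabled, so its availability varies by site):

# Summarize partitions with their time limits, node counts, and states
sinfo -s

# Show the full configuration of a specific partition
scontrol show partition <PARTITION>

# List QOS levels (queues) and their limits, if accounting is enabled
sacctmgr show qos format=Name,Priority,MaxWall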

For more information on optimizing isciml for distributed environments, consult the full documentation or reach out to the isciml support channels.

[Remaining sections (Tips, Troubleshooting, etc.) stay the same]
