isciml: train

Overview

The train subcommand in isciml trains machine learning models on the data produced by the generate command. It allows you to configure various aspects of the training process, including the model architecture, training parameters, and data handling.

Usage

singularity exec isciml.sif isciml train [OPTIONS]

Options

| Option | Description | Default | Required |
| --- | --- | --- | --- |
| `--sample_folder PATH` | Folder containing sample files | - | Yes |
| `--target_folder PATH` | Folder containing target files | - | Yes |
| `--n_blocks INTEGER` | Number of blocks in the UNet | 4 | No |
| `--start_filters INTEGER` | Number of start filters | 32 | No |
| `--batch_size INTEGER` | Batch size for training | 1 | No |
| `--max_epochs INTEGER` | Maximum number of epochs | 1 | No |
| `--learning_rate FLOAT` | Adam optimizer learning rate | 0.001 | No |
| `--save_model PATH` | File name for the checkpoint saved at the end of training | pytorch_model.ckpt | No |
| `--load_model PATH` | Checkpoint file to load at the beginning of training | - | No |
| `--train_size FLOAT` | Proportion of the data used for training | 0.8 | No |
| `--num_workers INTEGER` | Number of workers for the data loader | 1 | No |
| `--n_gpus INTEGER` | Number of GPUs used for training | 1 | No |
| `--strategy TEXT` | Distributed Data Parallel strategy | auto | No |
| `--checkpoint_folder PATH` | Checkpoint folder | ./lightning_checkpoint_folder | No |
| `--every_n_epochs INTEGER` | Number of epochs between checkpoints | - | No |
| `--save_top_k INTEGER` | Number of best models to save | 1 | No |
| `--reshape_base [two\|eight]` | Reshape 1D data to 2D using base 2 or 8 | eight | No |
| `--dim INTEGER` | Dimension of the solution grid | 2 | No |
| `--help` | Show the help message and exit | - | No |
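
To print this list of options directly from the container, run the subcommand with --help:

singularity exec isciml.sif isciml train --help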

Description

The train subcommand trains a UNet model on your generated data. The options above control the model architecture (n_blocks, start_filters), the core training hyperparameters (batch_size, max_epochs, learning_rate), and how the data is loaded, split, and checkpointed.

Example Usage

To train a model with default settings:

singularity exec isciml.sif isciml train \
    --sample_folder /path/to/samples \
    --target_folder /path/to/targets

To train a more complex model with custom settings:

singularity exec isciml.sif isciml train \
    --sample_folder /path/to/samples \
    --target_folder /path/to/targets \
    --n_blocks 5 \
    --start_filters 64 \
    --batch_size 32 \
    --max_epochs 100 \
    --learning_rate 0.0001 \
    --save_model custom_model.ckpt \
    --train_size 0.7 \
    --n_gpus 2 \
    --strategy ddp \
    --every_n_epochs 5 \
    --save_top_k 3

Notes

  1. The sample and target folders should contain the input and output data generated using the generate command.

  2. The UNet architecture is defined by n_blocks and start_filters. Increasing these values will create a more complex model.

  3. batch_size, max_epochs, and learning_rate are key training parameters that affect the learning process.

  4. The save_model option specifies where the final trained model will be saved.

  5. Use load_model to continue training from a previously saved checkpoint; see the first example after this list.

  6. train_size determines the split between training and validation data.

  7. n_gpus and strategy are used for distributed training across multiple GPUs.

  8. The checkpointing system writes models to checkpoint_folder at the interval set by every_n_epochs and keeps the save_top_k best models based on validation performance.

  9. The reshape_base and dim options reshape flat 1D input data onto a grid, which may be necessary depending on your data format; see the second example after this list.
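
As a minimal sketch of resuming training (the folder paths and the continued checkpoint name are illustrative), load a previously saved checkpoint with --load_model and write the continued run to a new file with --save_model:

# illustrative paths and checkpoint names
singularity exec isciml.sif isciml train \
    --sample_folder /path/to/samples \
    --target_folder /path/to/targets \
    --load_model pytorch_model.ckpt \
    --save_model pytorch_model_continued.ckpt

If your samples are stored as flat 1D arrays, a similar sketch (again with illustrative paths) reshapes them onto a 2D grid using base 2:

# illustrative paths
singularity exec isciml.sif isciml train \
    --sample_folder /path/to/samples \
    --target_folder /path/to/targets \
    --reshape_base two \
    --dim 2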

Related Commands

  • generate-models: Used to create the physical models.

  • generate: Used to create the training data.

  • inference: Used to apply the trained model to new data.

See Also

For more information on model architectures, training strategies, and hyperparameter tuning, refer to the isciml documentation on machine learning models and training processes.
