# isciml: train
## Overview

The `train` subcommand in isciml is used for training machine learning models on the data generated by the `generate` command. It allows users to configure various aspects of the training process, including model architecture, training parameters, and data handling.
## Usage
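The command takes a set of options, described in the table below; a general invocation has the form:

```bash
isciml train [OPTIONS]
```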
## Options
| Option | Description | Default | Required |
|--------|-------------|---------|----------|
| `--sample_folder PATH` | Folder containing sample files | - | Yes |
| `--target_folder PATH` | Folder containing target files | - | Yes |
| `--n_blocks INTEGER` | Number of blocks in the UNet | 4 | No |
| `--start_filters INTEGER` | Number of start filters | 32 | No |
| `--batch_size INTEGER` | Batch size for training | 1 | No |
| `--max_epochs INTEGER` | Maximum number of epochs | 1 | No |
| `--learning_rate FLOAT` | Adam optimizer learning rate | 0.001 | No |
| `--save_model PATH` | File name for the checkpoint saved at the end of training | pytorch_model.ckpt | No |
| `--load_model PATH` | Checkpoint file name to load at the beginning of training | - | No |
| `--train_size FLOAT` | Proportion of the data used for training | 0.8 | No |
| `--num_workers INTEGER` | Number of workers for the data loader | 1 | No |
| `--n_gpus INTEGER` | Number of GPUs used for training | 1 | No |
| `--strategy TEXT` | Distributed data parallel strategy | auto | No |
| `--checkpoint_folder PATH` | Checkpoint folder | ./lightning_checkpoint_folder | No |
| `--every_n_epochs INTEGER` | Number of epochs between checkpoints | - | No |
| `--save_top_k INTEGER` | Number of best models to save | 1 | No |
| `--reshape_base [two\|eight]` | Reshape 1D data to 2D using base 2 or 8 | eight | No |
| `--dim INTEGER` | Dimension of the solution grid | 2 | No |
| `--help` | Show the help message and exit | - | No |
## Description

The `train` subcommand trains a UNet model on the samples and targets produced by the `generate` command. Its options control the model architecture (`--n_blocks`, `--start_filters`), the optimization loop (`--batch_size`, `--max_epochs`, `--learning_rate`), the train/validation split and data loading (`--train_size`, `--num_workers`), and multi-GPU training (`--n_gpus`, `--strategy`).
## Example Usage
To train a model with default settings:
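The folder paths below are placeholders; point them at the sample and target directories produced by the `generate` command.

```bash
isciml train --sample_folder ./samples --target_folder ./targets
```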
To train a more complex model with custom settings:
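All values below are illustrative only; `ddp` is shown as a typical explicit value for `--strategy`, but any strategy accepted by the underlying training framework can be used.

```bash
isciml train \
  --sample_folder ./samples \
  --target_folder ./targets \
  --n_blocks 6 \
  --start_filters 64 \
  --batch_size 8 \
  --max_epochs 100 \
  --learning_rate 0.0005 \
  --train_size 0.9 \
  --n_gpus 2 \
  --strategy ddp \
  --checkpoint_folder ./checkpoints \
  --every_n_epochs 10 \
  --save_top_k 3 \
  --save_model trained_model.ckpt
```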
## Notes

- The sample and target folders should contain the input and output data generated using the `generate` command.
- The UNet architecture is defined by `n_blocks` and `start_filters`. Increasing these values will create a more complex model.
- `batch_size`, `max_epochs`, and `learning_rate` are key training parameters that affect the learning process.
- The `save_model` option specifies where the final trained model will be saved.
- Use `load_model` to continue training from a previously saved checkpoint.
- `train_size` determines the split between training and validation data.
- `n_gpus` and `strategy` are used for distributed training across multiple GPUs.
- The checkpointing system saves models periodically and can keep the top K models based on validation performance.
- The `reshape_base` and `dim` options are used to reshape the input data, which may be necessary depending on your data format (see the sketch after this list).
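The exact reshaping logic is not documented here. As a rough mental model only, the sketch below shows one plausible interpretation of base-2/base-8 reshaping: a flat 1D sample whose length is a perfect square with a power-of-base side is folded into a square 2D grid. The function and variable names are hypothetical and not part of the isciml API.

```python
import numpy as np

def reshape_1d_to_2d(x: np.ndarray, base: int = 8) -> np.ndarray:
    """Hypothetical illustration of --reshape_base: fold a flat 1D sample
    into a square 2D grid whose side length is a power of `base`.
    This is a sketch, not the actual isciml implementation."""
    n = x.size
    side = int(round(np.sqrt(n)))
    if side * side != n:
        raise ValueError(f"length {n} is not a perfect square")
    s = side
    while s > 1 and s % base == 0:  # check that side is a power of `base`
        s //= base
    if s != 1:
        raise ValueError(f"side {side} is not a power of {base}")
    return x.reshape(side, side)

# Example: a flat sample of 4096 values becomes a 64x64 grid (64 = 8**2).
grid = reshape_1d_to_2d(np.arange(4096, dtype=np.float32), base=8)
print(grid.shape)  # (64, 64)
```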
## Related Commands

- `generate-models`: Used to create the physical models.
- `generate`: Used to create the training data.
- `inference`: Used to apply the trained model to new data.
## See Also
For more information on model architectures, training strategies, and hyperparameter tuning, refer to the isciml documentation on machine learning models and training processes.