# isciml: train

## Overview

The `train` subcommand in isciml trains machine learning models on data produced by the `generate` command. It lets you configure the model architecture, training parameters, and data handling.

## Usage

```
singularity exec isciml.sif isciml train [OPTIONS]
```

## Options

| Option                     | Description                                               | Default                            | Required |
| -------------------------- | --------------------------------------------------------- | ---------------------------------- | -------- |
| `--sample_folder PATH`     | Folder containing sample files                            | -                                  | Yes      |
| `--target_folder PATH`     | Folder containing target files                            | -                                  | Yes      |
| `--n_blocks INTEGER`       | Number of blocks in UNet                                  | 4                                  | No       |
| `--start_filters INTEGER`  | Number of start filters                                   | 32                                 | No       |
| `--batch_size INTEGER`     | Batch size for training                                   | 1                                  | No       |
| `--max_epochs INTEGER`     | Maximum number of epochs                                  | 1                                  | No       |
| `--learning_rate FLOAT`    | Adam optimizer learning rate                              | 0.001                              | No       |
| `--save_model PATH`        | File name to save the checkpoint at the end of training   | pytorch\_model.ckpt                | No       |
| `--load_model PATH`        | Checkpoint file name to load at the beginning of training | -                                  | No       |
| `--train_size FLOAT`       | Training size (proportion of data used for training)      | 0.8                                | No       |
| `--num_workers INTEGER`    | Number of workers for data loader                         | 1                                  | No       |
| `--n_gpus INTEGER`         | Number of GPUs used for training                          | 1                                  | No       |
| `--strategy TEXT`          | Distributed Data Parallel Strategy                        | auto                               | No       |
| `--checkpoint_folder PATH` | Checkpoint folder                                         | ./lightning\_checkpoint\_folder    | No       |
| `--every_n_epochs INTEGER` | Number of epochs between checkpoints                      | -                                  | No       |
| `--save_top_k INTEGER`     | Number of best models to save                             | 1                                  | No       |
| `--reshape_base [two\|eight]` | Reshape 1D to 2D using base 2 or 8                        | eight                              | No       |
| `--dim INTEGER`            | Dimension of the grid of solution                         | 2                                  | No       |
| `--help`                   | Show the help message and exit                            | -                                  | No       |
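The `--reshape_base` option controls how flat 1D samples are folded into a 2D grid whose side is a power of 2 or 8. The function below is only an illustrative sketch of that idea, not isciml's actual reshaping code, and it assumes square grids:

```python
import numpy as np

def reshape_1d_to_2d(x, base=8):
    """Illustrative sketch: view a flat vector as a square 2D grid
    whose side length is a power of `base` (2 or 8). The real isciml
    reshaping may differ; this only shows the idea behind the option."""
    n = x.size
    side = int(round(n ** 0.5))
    if side * side != n:
        raise ValueError("sample length must be a perfect square")
    # Check that the grid side is a power of the chosen base.
    s = side
    while s > 1 and s % base == 0:
        s //= base
    if s != 1:
        raise ValueError(f"grid side {side} is not a power of {base}")
    return x.reshape(side, side)

grid = reshape_1d_to_2d(np.arange(4096.0), base=8)  # 4096 = 64 * 64, 64 = 8^2
print(grid.shape)  # (64, 64)
```

With `base=2`, side lengths such as 4, 8, 16, or 32 would also qualify, which is why a grid side of 64 works under either setting.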

## Description

The `train` subcommand trains a UNet model on pairs of sample (input) and target (output) files. The architecture is set by `--n_blocks` and `--start_filters`, the optimization by `--batch_size`, `--max_epochs`, and `--learning_rate`, and multi-GPU training by `--n_gpus` and `--strategy`. Intermediate checkpoints are written to `--checkpoint_folder` during training, and the final model is saved to `--save_model`.

## Example Usage

To train a model with default settings:

```
singularity exec isciml.sif isciml train \
    --sample_folder /path/to/samples \
    --target_folder /path/to/targets
```

To train a more complex model with custom settings:

```
singularity exec isciml.sif isciml train \
    --sample_folder /path/to/samples \
    --target_folder /path/to/targets \
    --n_blocks 5 \
    --start_filters 64 \
    --batch_size 32 \
    --max_epochs 100 \
    --learning_rate 0.0001 \
    --save_model custom_model.ckpt \
    --train_size 0.7 \
    --n_gpus 2 \
    --strategy ddp \
    --every_n_epochs 5 \
    --save_top_k 3
```

## Notes

1. The sample and target folders should contain the input and output data generated using the `generate` command.
2. The UNet architecture is defined by `n_blocks` and `start_filters`. Increasing these values will create a more complex model.
3. `batch_size`, `max_epochs`, and `learning_rate` are key training parameters that affect the learning process.
4. The `save_model` option specifies where the final trained model will be saved.
5. Use `load_model` to continue training from a previously saved checkpoint.
6. `train_size` determines the split between training and validation data.
7. `n_gpus` and `strategy` are used for distributed training across multiple GPUs.
8. The checkpointing system saves models periodically and can keep the top K models based on validation performance.
9. `reshape_base` and `dim` options are used to reshape the input data, which may be necessary depending on your data format.
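The split behaviour described in note 6 can be sketched as follows. This is an illustrative approximation of what `--train_size` controls, not isciml's actual implementation (which, given the PyTorch Lightning checkpoint folder, more likely uses `torch.utils.data.random_split`):

```python
import random

def split_indices(n_samples, train_size=0.8, seed=0):
    """Sketch of the train/validation split controlled by --train_size:
    shuffle the sample indices, then take the first train_size fraction
    for training and the remainder for validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(n_samples * train_size)
    return idx[:n_train], idx[n_train:]

train_idx, val_idx = split_indices(100, train_size=0.8)
print(len(train_idx), len(val_idx))  # 80 20
```

A fixed seed makes the split reproducible across runs, which matters when resuming training from a checkpoint with `--load_model`.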

## Related Commands

* `generate-models`: Used to create the physical models.
* `generate`: Used to create the training data.
* `inference`: Used to apply the trained model to new data.

## See Also

For more information on model architectures, training strategies, and hyperparameter tuning, refer to the isciml documentation on machine learning models and training processes.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://isciml.s2labs.co/command-line-options/openapi.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
