# PG-MCTL T-Maze

This module contains implementations for solving the T-Maze task using the following algorithms:

- [PG](rpg_tmaze.py)
- [Lazy AlphaZero](lazylaphazero_tmaze.py)
- [MCTL (Lazy MCTS)](lazymcts_tmaze.py)
- [PG-MCTL](pgmcts_tmaze.py)

Each algorithm is implemented as a class in its own module. Each module also has a `main` function that shows how to run the algorithm on its own; for PG (`rpg_tmaze.py`) and PG-MCTL (`pgmcts_tmaze.py`), the main logic lives in the `train` method.

All algorithms share the same [T-Maze task environment](env_tmaze.py); edit that module to change the behavior of the T-Maze itself.

## Task Input

- T-Maze corridor length
- Number of training episodes

Each algorithm can be run as a module with a command like the ones below. Pass the inputs above as command-line arguments: the corridor length is the positional argument, and the number of training episodes follows `-i`.

```bash
# RPG, T-Maze corridor length: 20, Number of training episodes: 50000
python3.8 -m pgmcts_tmaze.rpg_tmaze 20 -i 50000
# PG-MCTL, T-Maze corridor length: 10, Number of training episodes: 30000
python3.8 -m pgmcts_tmaze.pgmcts_tmaze 10 -i 30000
```
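The argument layout in these commands (a positional corridor length plus `-i` for the number of episodes) could be parsed roughly as follows. This is a minimal sketch using the standard-library `argparse`, not the modules' actual parsing code; the function name `parse_args`, the attribute names `length` and `episodes`, and the choice to make `-i` required are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse a corridor length and an episode count (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="T-Maze training (sketch)")
    # Positional argument: T-Maze corridor length.
    parser.add_argument("length", type=int, help="T-Maze corridor length")
    # -i: number of training episodes. Making it required is an assumption.
    parser.add_argument("-i", dest="episodes", type=int, required=True,
                        help="number of training episodes")
    return parser.parse_args(argv)


# Equivalent to: python3.8 -m pgmcts_tmaze.pgmcts_tmaze 10 -i 30000
args = parse_args(["10", "-i", "30000"])
print(args.length, args.episodes)
```

Passing `argv=None` makes `argparse` read from `sys.argv`, so the same function works both from the command line and from a test.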

## Hyperparameters and Algorithms

| Hyperparameter                                         | PG | PPO | Lazy AlphaZero | MCTL (Lazy MCTS) | PG-MCTL |
|--------------------------------------------------------|----|-----|----------------|------------------|---------|
| Discount factor (γ)                                    | ✅ | ✅  | ✅             | ✅               | ✅      |
| Learning rate (α) for NN parameter updates             | ✅ | ✅  | ✅             |                  | ✅      |
| PPO clipping value                                     |    | ✅  |                |                  |         |
| PPO number of epochs                                   |    | ✅  |                |                  |         |
| NN dropout probability                                 | ✅ | ✅  | ✅             |                  | ✅      |
| NN gradient clipping value                             | ✅ | ✅  | ✅             |                  | ✅      |
| UCT exploration-exploitation balancing coefficient (C) |    |     | ✅             | ✅               | ✅      |
| Softmax temperature parameter (β)                      |    |     |                | ✅               | ✅      |
| Mixing probability (λ)                                 |    |     |                |                  | ✅      |

These parameters are defined in a class named [Config](conf_tmaze.py), which each module loads.
If you plan to experiment with different parameter settings, you can tell Git to ignore local changes to the parameter file so that they do not show up in every diff:

```bash
git update-index --skip-worktree pgmcts_tmaze/conf_tmaze.py
```

To track changes again, run:

```bash
git update-index --no-skip-worktree pgmcts_tmaze/conf_tmaze.py
```
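When comparing several settings, one option is to derive variants from a base configuration instead of editing values in place. The sketch below is hypothetical: `SketchConfig`, its field names, and all default values are placeholders chosen for illustration and are not taken from `conf_tmaze.py`; only the list of hyperparameters comes from the table above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical hyperparameter container; every default is a placeholder."""
    gamma: float = 0.99          # discount factor (γ)
    learning_rate: float = 1e-3  # learning rate (α) for NN parameter updates
    ppo_clip: float = 0.2        # PPO clipping value
    ppo_epochs: int = 4          # PPO number of epochs
    dropout: float = 0.1         # NN dropout probability
    grad_clip: float = 1.0       # NN gradient clipping value
    uct_c: float = 1.0           # UCT exploration-exploitation coefficient (C)
    beta: float = 1.0            # softmax temperature parameter (β)
    lam: float = 0.5             # mixing probability (λ)

    def __post_init__(self):
        # Sanity checks for the values that are probabilities or discounts.
        for name in ("gamma", "dropout", "lam"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


base = SketchConfig()
# Build one variant per mixing probability without mutating the base config.
variants = [replace(base, lam=lam) for lam in (0.0, 0.5, 1.0)]
print([v.lam for v in variants])
```

Because the dataclass is frozen, each variant is a separate object, so a sweep over λ (or any other field) cannot accidentally change the base settings.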
