Running `GraphEnv` with `ray.tune`

In practice, reinforcement learning with `GraphEnv` typically uses the `ray.tune` infrastructure to scale up environment rollouts and policy model training. For the hallway example, a TensorFlow implementation looks like this:

import ray
from graphenv.examples.hallway.hallway_model import HallwayModel
from graphenv.examples.hallway.hallway_state import HallwayState
from graphenv.graph_env import GraphEnv
from ray import tune

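config = {
    "env": GraphEnv,
    "env_config": {
        "state": HallwayState(5),  # root state of the hallway problem
        "max_num_children": 2,  # upper bound on child states (actions) per state
    },
    "model": {
        "custom_model": HallwayModel,
        "custom_model_config": {"hidden_dim": 32},
    },
    "framework": "tf2",  # must match the framework HallwayModel is written in
    "eager_tracing": True,
    "num_workers": 1,  # number of rollout workers
}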

stop = {
    "training_iteration": 5,
}

if __name__ == "__main__":

    ray.init()

    tune.run(
        "PPO",
        config=config,
        stop=stop,
    )

Lines 7-20 of the script define the PPO configuration in the `config` dictionary. Note that the `framework` setting must match the framework in which the provided `HallwayModel` policy is written (here, `tf2`). The `stop` dictionary ends the run after 5 iterations of the PPO training algorithm, and the results can be monitored with TensorBoard.
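
To make the role of the `stop` dictionary concrete, the following minimal sketch mimics how a dictionary-based stopping criterion behaves: a trial ends once any listed metric reaches its threshold. The function `should_stop` and the simulated results are hypothetical stand-ins written for illustration, not Tune's own code.

```python
from typing import Dict


def should_stop(result: Dict[str, float], stop: Dict[str, float]) -> bool:
    """Return True once any metric named in `stop` has reached its threshold.

    Hypothetical helper: a simplified model of a dict-based stopping criterion.
    """
    return any(result[metric] >= threshold for metric, threshold in stop.items())


stop = {"training_iteration": 5}

# Simulated per-iteration results. A real training run reports many more
# metrics; only the one named in `stop` matters for this sketch.
iterations_run = 0
for iteration in range(1, 11):
    iterations_run = iteration
    if should_stop({"training_iteration": iteration}, stop):
        break

print(iterations_run)
```

Adding a second key to `stop` would end the trial as soon as either threshold is reached, whichever comes first.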

Running the same experiment with PyTorch requires a PyTorch-compatible policy model, demonstrated in `graphenv.examples.hallway.hallway_model_torch`. Beyond this, the only changes needed in the training script are the model import, the `custom_model` entry, and the `framework` setting (the TensorFlow-specific `eager_tracing` option is dropped). The full PyTorch script is shown below:

import ray
from graphenv.examples.hallway.hallway_model_torch import TorchHallwayModel
from graphenv.examples.hallway.hallway_state import HallwayState
from graphenv.graph_env import GraphEnv
from ray import tune

config = {
    "env": GraphEnv,
    "env_config": {
        "state": HallwayState(5),
        "max_num_children": 2,
    },
    "model": {
        "custom_model": TorchHallwayModel,
        "custom_model_config": {"hidden_dim": 32},
    },
    "framework": "torch",
    "num_workers": 1,
}

stop = {
    "training_iteration": 5,
}

if __name__ == "__main__":

    ray.init()

    tune.run(
        "PPO",
        config=config,
        stop=stop,
    )
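
Since the two scripts differ only in a few configuration entries, the shared parts can be factored into a small helper. The sketch below is one possible way to do this; `make_ppo_config` and the placeholder classes are hypothetical names introduced here so the example runs without `ray` or `graphenv` installed. In a real script, `GraphEnv`, `HallwayState(5)`, and the appropriate model class would be passed instead.

```python
from typing import Any, Dict


def make_ppo_config(env: Any, state: Any, model: Any, framework: str,
                    hidden_dim: int = 32) -> Dict[str, Any]:
    """Build the config dictionary used above for either framework.

    Hypothetical helper: it only assembles a plain dictionary.
    """
    config: Dict[str, Any] = {
        "env": env,
        "env_config": {"state": state, "max_num_children": 2},
        "model": {
            "custom_model": model,
            "custom_model_config": {"hidden_dim": hidden_dim},
        },
        "framework": framework,
        "num_workers": 1,
    }
    if framework == "tf2":
        # Only the TensorFlow script above sets this option.
        config["eager_tracing"] = True
    return config


# Placeholder classes standing in for GraphEnv, HallwayModel and
# TorchHallwayModel (assumptions for this sketch only).
class PlaceholderEnv: ...
class PlaceholderTFModel: ...
class PlaceholderTorchModel: ...


tf_config = make_ppo_config(PlaceholderEnv, "root-state", PlaceholderTFModel, "tf2")
torch_config = make_ppo_config(PlaceholderEnv, "root-state", PlaceholderTorchModel, "torch")

# List the top-level keys whose values differ between the two configs.
differing_keys = sorted(
    key for key in set(tf_config) | set(torch_config)
    if tf_config.get(key) != torch_config.get(key)
)
print(differing_keys)
```

The printed keys are the only places where the two training scripts diverge; everything else, including the `stop` dictionary and the `tune.run` call, is shared.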
