graphenv.graph_env.GraphEnv

class GraphEnv(env_config)[source]

Bases: gymnasium.core.Env

Defines an OpenAI Gym Env for traversing a graph using the current vertex as the state, and the successor vertices as actions.

GraphEnv uses composition to supply the per-vertex model of type Vertex, which defines the graph via its _get_children() method.

The env_config dictionary should contain the following keys:

state (N): current vertex
max_num_children (int): maximum number of children considered at a time

Parameters

env_config (dict) – A dictionary of parameters, required to conform with rllib’s environment initialization.

Return type

None

Methods

close

After the user has finished using the environment, close contains the code necessary to "clean up" the environment.

make_observation

Makes an observation for this state which includes observations of each possible action, and the current state.

render

Delegates to Vertex.render()

reset

Reset this state to the root vertex.

step

Steps the environment to a new state by taking an action.

Attributes

metadata

np_random

Returns the environment's internal _np_random, which is initialised with a random seed if it has not been set.

render_mode

reward_range

spec

unwrapped

Returns the base non-wrapped environment (i.e., removes all wrappers).

state

current vertex

max_num_children

maximum number of actions considered at a time

observation_space

the observation space of the graph environment

action_space

the action space, a Discrete space over max_num_children

action_space: gymnasium.spaces.space.Space

the action space, a Discrete space over max_num_children

make_observation()[source]

Makes an observation for this state which includes observations of each possible action, and the current state.

Expects the action observations to all be Dicts with the same keys.

Returns a column-oriented representation, a Dict with keys matching the action observation keys, and values that are the current state and every action’s values for that key concatenated into numpy arrays.

The current state is the 0th entry in these arrays, and the children are offset by one index to accommodate that.

Returns

A list of next state observations.

Return type

List[any]
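
The column-oriented layout described for make_observation() can be sketched without the library. This is only an illustration, not GraphEnv's implementation: stack_observations is a hypothetical helper, the observation keys are made up, and plain Python lists stand in for the numpy arrays the real method returns.

```python
from typing import Any, Dict, List


def stack_observations(
    state_obs: Dict[str, Any], child_obs: List[Dict[str, Any]]
) -> Dict[str, List[Any]]:
    """Hypothetical helper: merge per-vertex observations column by column."""
    keys = set(state_obs)
    for obs in child_obs:
        # Every action observation must expose the same keys as the state.
        if set(obs) != keys:
            raise ValueError("all observations must share the same keys")
    # Index 0 holds the current state; child i lands at index i + 1.
    return {
        key: [state_obs[key]] + [obs[key] for obs in child_obs]
        for key in state_obs
    }


current = {"depth": 0, "value": 1.5}
children = [{"depth": 1, "value": 0.2}, {"depth": 1, "value": 0.9}]
stacked = stack_observations(current, children)
print(stacked)  # {'depth': [0, 1, 1], 'value': [1.5, 0.2, 0.9]}
```

Because the current state occupies index 0, the observation for action i is found at index i + 1 of each array.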

max_num_children: int

maximum number of actions considered at a time

Type

int

observation_space: gymnasium.spaces.space.Space

the observation space of the graph environment

render(mode='human')[source]

Delegates to Vertex.render()

Parameters

mode (str) –

Return type

Any

reset(**kwargs)[source]

Reset this state to the root vertex. It is possible for state.root to return different root vertices on each call.

Returns

Observation of the root vertex.

Return type

Dict[str, np.ndarray]
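
The note that state.root may return a different vertex on each call can be pictured with a small stand-in class. RandomRootVertex below is hypothetical and does not subclass the library's Vertex; it only assumes that the root is exposed as a property, as the text above suggests.

```python
import random
from typing import Tuple


class RandomRootVertex:
    """Hypothetical vertex whose root is re-sampled on every access."""

    def __init__(self, name: str, starts: Tuple[str, ...], rng: random.Random):
        self.name = name
        self._starts = starts
        self._rng = rng

    @property
    def root(self) -> "RandomRootVertex":
        # Each access may pick a different starting vertex.
        return RandomRootVertex(
            self._rng.choice(self._starts), self._starts, self._rng
        )


starts = ("a", "b", "c")
state = RandomRootVertex("a", starts, random.Random(0))
# One root lookup per simulated reset; the names need not all be equal.
episode_roots = [state.root.name for _ in range(5)]
print(episode_roots)
```

An environment that resets to state.root would, under this assumption, start each episode from whichever vertex the property returns at that moment.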

state: graphenv.vertex.V

current vertex

Type

graphenv.vertex.Vertex

step(action)[source]

Steps the environment to a new state by taking an action. In the case of GraphEnv, the action specifies which next vertex to move to and this method advances the environment to that vertex.

Parameters

action (int) – The index of the child vertex of self.state to move to.

Raises

RuntimeError – When action is an invalid index.

Returns

Tuple of:

a dictionary of the new state’s observation,
the reward received by moving to the new state’s vertex,
a bool which is true iff the new state is a terminal vertex,
a bool which is true if the search is truncated,
a dictionary of debugging information related to this call

Return type

Tuple[Dict[str, np.ndarray], float, bool, bool, dict]
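
The step() contract can be illustrated with a toy stand-in that mirrors the documented behaviour: the action is an index into the current vertex's children, an invalid index raises RuntimeError, and five values come back. ToyVertex and ToyGraphEnv are hypothetical names used only for this sketch; they do not import or reproduce the library's code, and the observation contents are invented.

```python
from typing import Any, Dict, List, Optional, Tuple


class ToyVertex:
    """Hypothetical vertex: a name, a reward, and a list of children."""

    def __init__(
        self,
        name: str,
        reward: float = 0.0,
        children: Optional[List["ToyVertex"]] = None,
    ) -> None:
        self.name = name
        self.reward = reward
        self.children = children or []

    @property
    def terminal(self) -> bool:
        # In this toy graph, a vertex without children ends the episode.
        return not self.children


class ToyGraphEnv:
    """Toy stand-in for GraphEnv; not the library's implementation."""

    def __init__(self, env_config: Dict[str, Any]) -> None:
        self.root = env_config["state"]
        self.state = self.root
        self.max_num_children = env_config["max_num_children"]

    def _observe(self) -> Dict[str, Any]:
        return {"name": self.state.name, "num_children": len(self.state.children)}

    def reset(self) -> Dict[str, Any]:
        self.state = self.root
        return self._observe()

    def step(self, action: int) -> Tuple[Dict[str, Any], float, bool, bool, dict]:
        if not 0 <= action < len(self.state.children):
            raise RuntimeError(f"invalid action {action} at vertex {self.state.name}")
        self.state = self.state.children[action]
        truncated = False  # this toy never cuts an episode short
        return self._observe(), self.state.reward, self.state.terminal, truncated, {}


leaf_a = ToyVertex("a", reward=1.0)
leaf_b = ToyVertex("b", reward=-1.0)
root = ToyVertex("root", children=[leaf_a, leaf_b])
env = ToyGraphEnv({"state": root, "max_num_children": 2})

env.reset()
obs, reward, terminated, truncated, info = env.step(0)
print(obs, reward, terminated, truncated, info)
```

Taking action 0 from the root moves to the first child; calling step again from that leaf would raise RuntimeError in this sketch, since the leaf has no children to index.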