graphenv.graph_env.GraphEnv

class GraphEnv(env_config)[source]

Bases: gymnasium.core.Env

Defines a Gymnasium Env for traversing a graph, using the current vertex as the state and its successor vertices as the actions.

GraphEnv uses composition to supply the per-vertex model of type Vertex, which defines the graph via its _get_children() method.

The env_config dictionary should contain the following keys:

state (N): Current vertex
max_num_children (int): maximum number of children considered at a time.

Parameters: env_config (dict) – A dictionary of parameters, as required by RLlib’s environment initialization.
Return type: None
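As a rough illustration of the two keys listed above, the following sketch builds an env_config dictionary around a stand-in vertex. `ToyVertex` and `REQUIRED_KEYS` are hypothetical names introduced only for this example; a real vertex would subclass graphenv's Vertex, whose full interface is not shown here.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class ToyVertex:
    """Hypothetical stand-in for a Vertex subclass (illustration only)."""

    name: str
    successors: List["ToyVertex"] = field(default_factory=list)

    def _get_children(self) -> Iterator["ToyVertex"]:
        # The graph is defined implicitly by what each vertex yields here.
        return iter(self.successors)


leaf = ToyVertex("leaf")
root = ToyVertex("root", [leaf])

# The two keys the env_config dictionary is documented to contain.
env_config = {"state": root, "max_num_children": 2}

# Hypothetical sanity check, not part of graphenv.
REQUIRED_KEYS = {"state", "max_num_children"}
missing = REQUIRED_KEYS - env_config.keys()

# With graphenv installed and a real Vertex subclass in place of ToyVertex,
# this dictionary would be passed to the constructor as GraphEnv(env_config).
```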

Methods

`close`	After the user has finished using the environment, close contains the code necessary to "clean up" the environment.
`make_observation`	Makes an observation for this state which includes observations of each possible action, and the current state.
`render`	Delegates to Vertex.render()
`reset`	Reset this state to the root vertex.
`step`	Steps the environment to a new state by taking an action.

Attributes

`metadata`
`np_random`	Returns the environment's internal `_np_random`, which is initialised with a random seed if it has not been set.
`render_mode`
`reward_range`
`spec`
`unwrapped`	Returns the base non-wrapped environment (i.e., removes all wrappers).
`state`	current vertex
`max_num_children`	maximum number of actions considered at a time
`observation_space`	the observation space of the graph environment
`action_space`	the action space, a Discrete space over max_num_children

action_space: gymnasium.spaces.space.Space

the action space, a Discrete space over max_num_children

make_observation()[source]

Makes an observation for this state which includes observations of each possible action, and the current state.

Expects the action observations to all be Dicts with the same keys.

Returns a column-oriented representation, a Dict with keys matching the action observation keys, and values that are the current state and every action’s values for that key concatenated into numpy arrays.

The current state is the 0th entry in these arrays, and the children are offset by one index to accommodate that.

Returns: A list of next state observations.
Return type: List[any]
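The column-oriented layout described above can be sketched with plain NumPy. This is a minimal illustration, not graphenv's implementation: `stack_observations` and the observation keys `degree` and `depth` are hypothetical, and `np.stack` is used here as one way to combine the per-key values.

```python
import numpy as np


def stack_observations(state_obs, child_obs):
    """Combine the state's and each child's observation into one array per key."""
    return {
        # Index 0 holds the current state; the children follow in order.
        key: np.stack([state_obs[key]] + [obs[key] for obs in child_obs])
        for key in state_obs
    }


state_obs = {"degree": np.array([2]), "depth": np.array([0])}
child_obs = [
    {"degree": np.array([1]), "depth": np.array([1])},
    {"degree": np.array([0]), "depth": np.array([1])},
]

stacked = stack_observations(state_obs, child_obs)
# stacked["degree"] has shape (3, 1): row 0 is the state, rows 1 and 2 are the children.
```

Under this layout, the child selected by action index i sits at row i + 1 of every array.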

max_num_children: int

maximum number of actions considered at a time

Type: int

observation_space: gymnasium.spaces.space.Space

the observation space of the graph environment

render(mode='human')[source]

Delegates to Vertex.render()

Parameters: mode (str) –
Return type: Any

reset(**kwargs)[source]

Reset this state to the root vertex. It is possible for state.root to return different root vertices on each call.

Returns: Observation of the root vertex.
Return type: Dict[str, np.ndarray]

state: graphenv.vertex.V

current vertex

Type: graphenv.vertex.Vertex

step(action)[source]

Steps the environment to a new state by taking an action. In the case of GraphEnv, the action specifies which next vertex to move to and this method advances the environment to that vertex.

Parameters

action (int) – The index of the child vertex of self.state to move to.

Raises

RuntimeError – When action is an invalid index.

Returns

Tuple of:
a dictionary of the new state’s observation,
the reward received by moving to the new state’s vertex,
a bool which is true iff the new state is a terminal vertex,
a bool which is true if the search is truncated,
a dictionary of debugging information related to this call

Return type

Tuple[Dict[str, np.ndarray], float, bool, bool, dict]
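The step contract described above (an integer child index in, five values out, RuntimeError on an invalid index) can be sketched with a small stand-in class. `ToyGraphEnv`, its adjacency-dictionary constructor, the `max_steps` truncation rule, and the reward of 1.0 at terminal vertices are all assumptions made for this example and do not reflect how GraphEnv is implemented.

```python
from typing import Dict, List


class ToyGraphEnv:
    """Hypothetical stand-in that mirrors the documented step() contract."""

    def __init__(self, children: Dict[str, List[str]], root: str, max_steps: int = 10):
        self.children = children
        self.root = root
        self.state = root
        self.max_steps = max_steps
        self.num_steps = 0

    def reset(self):
        # Return to the root vertex and hand back its observation.
        self.state = self.root
        self.num_steps = 0
        return {"vertex": self.state}

    def step(self, action: int):
        successors = self.children.get(self.state, [])
        if not 0 <= action < len(successors):
            raise RuntimeError(f"invalid action index {action} at {self.state!r}")
        self.state = successors[action]
        self.num_steps += 1
        # A vertex with no children is treated as terminal in this sketch.
        terminated = not self.children.get(self.state)
        truncated = (not terminated) and self.num_steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return {"vertex": self.state}, reward, terminated, truncated, {}


env = ToyGraphEnv({"a": ["b", "c"], "b": ["d"], "c": [], "d": []}, root="a")
obs = env.reset()
obs, reward, terminated, truncated, info = env.step(0)  # moves from "a" to "b"
```

Callers would typically loop until either `terminated` or `truncated` is true, then call `reset` before starting a new episode.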