graphenv.graph_env.GraphEnv
- class GraphEnv(env_config)[source]
Bases:
gymnasium.core.Env
Defines an OpenAI Gym Env for traversing a graph using the current vertex as the state, and the successor vertices as actions.
GraphEnv uses composition to supply the per-vertex model of type Vertex, which defines the graph via its _get_children() method.
The env_config dictionary should contain the following keys:
state (N): current vertex
max_num_children (int): maximum number of children considered at a time.
- Parameters
env_config (dict) – A dictionary of parameters, required to conform with rllib’s environment initialization.
- Return type
None
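The composition described above can be pictured with a small stand-in. The following is a minimal sketch, not graphenv code: `CountVertex` is a hypothetical class that does not subclass graphenv's Vertex, and its `children` property only plays the role that `_get_children()` plays for a real vertex. The `env_config` dictionary at the end uses the two keys listed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CountVertex:
    """Hypothetical vertex: counts upward by 1 or 2 until it reaches `limit`."""

    value: int = 0
    limit: int = 3

    @property
    def root(self) -> "CountVertex":
        # The vertex the environment would return to on reset.
        return CountVertex(0, self.limit)

    @property
    def children(self) -> List["CountVertex"]:
        # Stand-in for _get_children(): the successors define the graph.
        return [
            CountVertex(self.value + delta, self.limit)
            for delta in (1, 2)
            if self.value + delta <= self.limit
        ]


# The two keys named in the class description (values are illustrative).
env_config = {"state": CountVertex(), "max_num_children": 2}
```

Here no vertex ever has more than two successors, so `max_num_children` is set to 2.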
Methods
close
After the user has finished using the environment, close contains the code necessary to "clean up" the environment.
make_observation
Makes an observation for this state which includes observations of each possible action, and the current state.
render
Delegates to Vertex.render()
reset
Reset this state to the root vertex.
step
Steps the environment to a new state by taking an action.
Attributes
metadata
np_random
Returns the environment's internal _np_random, which is initialised with a random seed if it has not been set.
render_mode
reward_range
spec
unwrapped
Returns the base non-wrapped environment (i.e., removes all wrappers).
state
current vertex
max_num_children
maximum number of actions considered at a time
observation_space
the observation space of the graph environment
action_space
the action space, a Discrete space over max_num_children
- action_space: gymnasium.spaces.space.Space
the action space, a Discrete space over max_num_children
- make_observation()[source]
Makes an observation for this state which includes observations of each possible action, and the current state.
Expects the action observations to all be Dicts with the same keys.
Returns a column-oriented representation, a Dict with keys matching the action observation keys, and values that are the current state and every action’s values for that key concatenated into numpy arrays.
The current state is the 0th entry in these arrays, and the children are offset by one index to accommodate that.
- Returns
A list of next state observations.
- Return type
List[any]
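To illustrate the column-oriented layout described above, here is a minimal sketch under stated assumptions. The helper `to_columns` and the keys `depth` and `score` are hypothetical, and plain Python lists stand in for the numpy arrays the description mentions, so this shows the index layout only, not the library's implementation.

```python
from typing import Dict, List


def to_columns(
    state_obs: Dict[str, float], action_obs: List[Dict[str, float]]
) -> Dict[str, List[float]]:
    """Turn row-oriented observations into one column per key."""
    rows = [state_obs] + action_obs  # current state first, then each child
    keys = state_obs.keys()
    for row in rows:
        if row.keys() != keys:
            raise ValueError("all observations must share the same keys")
    return {key: [row[key] for row in rows] for key in keys}


state_obs = {"depth": 0.0, "score": 0.5}
action_obs = [
    {"depth": 1.0, "score": 0.2},  # child 0
    {"depth": 1.0, "score": 0.9},  # child 1
]
columns = to_columns(state_obs, action_obs)
```

In each column, index 0 holds the current state's value and child `i` sits at index `i + 1`.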
- max_num_children: int
maximum number of actions considered at a time
- Type
int
- observation_space: gymnasium.spaces.space.Space
the observation space of the graph environment
- reset(**kwargs)[source]
Reset this state to the root vertex. It is possible for state.root to return different root vertices on each call.
- Returns
Observation of the root vertex.
- Return type
Dict[str, np.ndarray]
- state: graphenv.vertex.V
current vertex
- step(action)[source]
Steps the environment to a new state by taking an action. In the case of GraphEnv, the action specifies which next vertex to move to and this method advances the environment to that vertex.
- Parameters
action (int) – The index of the child vertex of self.state to move to.
- Raises
RuntimeError – When action is an invalid index.
- Returns
- Tuple of:
a dictionary of the new state’s observation,
the reward received by moving to the new state’s vertex,
a bool which is true iff the new state is a terminal vertex,
a bool which is true if the search is truncated,
a dictionary of debugging information related to this call
- Return type
Tuple[Dict[str, np.ndarray], float, bool, dict]
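The step contract above (an integer action selects a child, an invalid index raises RuntimeError, and five values come back) can be sketched with a toy stand-in. `ToyGraphEnv`, the `GRAPH` table, and the reward rule are all hypothetical and only mimic the documented behaviour; they are not graphenv's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical graph: each vertex name maps to its ordered list of children.
GRAPH: Dict[str, List[str]] = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}


class ToyGraphEnv:
    """Toy stand-in that mimics the documented reset/step behaviour."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.state = root

    def reset(self) -> Dict[str, str]:
        self.state = self.root
        return {"vertex": self.state}

    def step(self, action: int) -> Tuple[Dict[str, str], float, bool, bool, dict]:
        children = GRAPH[self.state]
        if not 0 <= action < len(children):
            raise RuntimeError(f"invalid action {action} for vertex {self.state!r}")
        self.state = children[action]  # move to the chosen child
        terminated = not GRAPH[self.state]  # a vertex without children is terminal
        reward = 1.0 if self.state == "d" else 0.0  # made-up reward rule
        return {"vertex": self.state}, reward, terminated, False, {}


env = ToyGraphEnv("a")
obs = env.reset()
trajectory = [obs["vertex"]]
total_reward = 0.0
terminated = False
while not terminated:
    # Always pick the first child; a real agent would choose among the actions.
    obs, reward, terminated, truncated, info = env.step(0)
    trajectory.append(obs["vertex"])
    total_reward += reward
```

Always choosing child 0 walks from `a` to `b` to `d`, where the episode ends because `d` has no children.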