wrappers module
Environment wrappers for collecting rollouts. Code adapted from https://github.com/HumanCompatibleAI/imitation.git
- class wrappers.BufferingWrapper(venv: stable_baselines3.common.vec_env.base_vec_env.VecEnv, error_on_premature_reset: bool = True)
Bases: stable_baselines3.common.vec_env.base_vec_env.VecEnvWrapper
Saves transitions of the underlying VecEnv.
Retrieve saved transitions using pop_transitions().
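A minimal usage sketch, assuming stable-baselines3's DummyVecEnv and Gym's CartPole-v1 (both illustrative choices, not requirements):

    import gym
    import numpy as np
    from stable_baselines3.common.vec_env import DummyVecEnv

    from wrappers import BufferingWrapper

    # Wrap any VecEnv; the wrapper records every transition as the env is stepped.
    venv = BufferingWrapper(DummyVecEnv([lambda: gym.make("CartPole-v1")]))

    obs = venv.reset()
    for _ in range(100):
        # Random actions stand in for a real policy here.
        actions = np.array([venv.action_space.sample()])
        obs, rewards, dones, infos = venv.step(actions)

    # Retrieve everything recorded since the last pop.
    transitions = venv.pop_transitions()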
- pop_finished_trajectories() → Tuple[Sequence[types_unique.TrajectoryWithRew], Sequence[int]]
Pops recorded complete trajectories trajs and episode lengths ep_lens.
- Returns:
A tuple (trajs, ep_lens) where trajs is a sequence of trajectories including the terminal state (but possibly missing initial states, if pop_trajectories was previously called) and ep_lens is a sequence of episode lengths. Note the episode length will be longer than the trajectory length when the trajectory misses initial states.
- pop_trajectories() → Tuple[Sequence[types_unique.TrajectoryWithRew], Sequence[int]]
Pops recorded trajectories trajs and episode lengths ep_lens.
- Returns:
A tuple (trajs, ep_lens). trajs is a sequence of trajectory fragments, consisting of data collected after the last call to pop_trajectories. They may miss initial states (if pop_trajectories previously returned a fragment for that episode) and terminal states (if the episode has yet to complete). ep_lens is a sequence of the lengths of the completed episodes.
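The difference between the two pop methods, sketched under the same illustrative setup as above (exact buffering behavior follows the description in this page; CartPole-v1 in a DummyVecEnv is an assumption, not a requirement):

    import gym
    import numpy as np
    from stable_baselines3.common.vec_env import DummyVecEnv

    from wrappers import BufferingWrapper

    venv = BufferingWrapper(DummyVecEnv([lambda: gym.make("CartPole-v1")]))
    venv.reset()
    for _ in range(50):
        venv.step(np.array([venv.action_space.sample()]))

    # Fragments of everything recorded so far; the last fragment may lack a
    # terminal state because its episode is still running.
    frags, _ = venv.pop_trajectories()

    for _ in range(200):
        venv.step(np.array([venv.action_space.sample()]))

    # Only episodes that have finished; a trajectory may be shorter than its
    # episode length if an earlier pop_trajectories() call took its prefix.
    trajs, ep_lens = venv.pop_finished_trajectories()
    for traj, ep_len in zip(trajs, ep_lens):
        assert ep_len >= len(traj)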
- pop_transitions() → types_unique.TransitionsWithRew
Pops recorded transitions, returning them as an instance of TransitionsWithRew.
- Returns:
All transitions recorded since the last call.
- Raises:
RuntimeError: if no transitions have been recorded since the last pop.
- reset(**kwargs)
Reset all the environments and return an array of observations, or a tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- Returns:
observation
- step_async(actions)
Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.
You should not call this if a step_async run is already pending.
- step_wait()
Wait for the step taken with step_async().
- Returns:
observation, reward, done, information
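A sketch of the async protocol: in stable-baselines3's VecEnv design, venv.step(actions) is shorthand for these two calls (setup as in the earlier illustrative examples):

    import gym
    import numpy as np
    from stable_baselines3.common.vec_env import DummyVecEnv

    from wrappers import BufferingWrapper

    venv = BufferingWrapper(DummyVecEnv([lambda: gym.make("CartPole-v1")]))
    venv.reset()

    actions = np.array([venv.action_space.sample()])
    venv.step_async(actions)                       # start the step
    obs, rewards, dones, infos = venv.step_wait()  # block for the results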
- class wrappers.RolloutInfoWrapper(env: gym.core.Env)
Bases: gym.core.Wrapper
Add the entire episode’s rewards and observations to info at episode end.
Whenever done=True, info["rollouts"] is a dict with keys "obs" and "rews", whose values hold the NumPy arrays of raw observations and rewards seen during this episode.
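A minimal sketch of reading the rollout at episode end, assuming the classic Gym step API with a done flag (CartPole-v1 is illustrative):

    import gym

    from wrappers import RolloutInfoWrapper

    env = RolloutInfoWrapper(gym.make("CartPole-v1"))

    obs = env.reset()
    done = False
    while not done:
        obs, rew, done, info = env.step(env.action_space.sample())

    # At episode end, the full episode is available in info.
    rollout = info["rollouts"]
    print(rollout["obs"].shape)   # all observations seen this episode
    print(rollout["rews"].shape)  # all rewards seen this episode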
- reset(**kwargs)
Resets the environment to an initial state and returns an initial observation.
Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.
- Returns:
observation (object): the initial observation.
- step(action)
Run one timestep of the environment’s dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment’s state.
Accepts an action and returns a tuple (observation, reward, done, info).
- Args:
action (object): an action provided by the agent
- Returns:
observation (object): agent's observation of the current environment
reward (float): amount of reward returned after previous action
done (bool): whether the episode has ended, in which case further step() calls will return undefined results
info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
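Since the caller is responsible for resetting, a typical multi-episode loop looks like this sketch (plain CartPole-v1, no wrappers, for illustration):

    import gym

    env = gym.make("CartPole-v1")

    obs = env.reset()
    for _ in range(500):
        obs, reward, done, info = env.step(env.action_space.sample())
        if done:
            # Further step() calls would be undefined; start a new episode.
            obs = env.reset()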