wrappers module

Environment wrappers for collecting rollouts. Code adapted from https://github.com/HumanCompatibleAI/imitation.git

class wrappers.BufferingWrapper(venv: stable_baselines3.common.vec_env.base_vec_env.VecEnv, error_on_premature_reset: bool = True)

Bases: stable_baselines3.common.vec_env.base_vec_env.VecEnvWrapper

Saves transitions from the underlying VecEnv.

Retrieve saved transitions using pop_transitions().
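A minimal usage sketch, assuming this module is importable as wrappers, a CartPole-v1 Gym environment, and random actions in place of a real policy (the environment ID and policy are illustrative, not part of this module):

    import gym
    import numpy as np
    from stable_baselines3.common.vec_env import DummyVecEnv

    import wrappers  # this module

    # Buffer every transition taken by the vectorized environment.
    venv = wrappers.BufferingWrapper(DummyVecEnv([lambda: gym.make("CartPole-v1")]))

    obs = venv.reset()
    for _ in range(100):
        # Random actions stand in for a real policy here.
        actions = np.array([venv.action_space.sample() for _ in range(venv.num_envs)])
        obs, rews, dones, infos = venv.step(actions)

    transitions = venv.pop_transitions()  # everything recorded so far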

pop_finished_trajectories() → Tuple[Sequence[types_unique.TrajectoryWithRew], Sequence[int]]

Pops recorded complete trajectories trajs and episode lengths ep_lens.

Returns:

A tuple (trajs, ep_lens) where trajs is a sequence of trajectories including the terminal state (but possibly missing initial states, if pop_trajectories was previously called) and ep_lens is a sequence of episode lengths. Note the episode length will be longer than the trajectory length when the trajectory misses initial states.
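A sketch of the return value (continuing the example above; the field names on types_unique.TrajectoryWithRew, such as rews, follow the upstream imitation library and are an assumption here):

    # Only episodes that reached a terminal state are returned; data from
    # still-running episodes stays in the buffer.
    trajs, ep_lens = venv.pop_finished_trajectories()
    for traj, ep_len in zip(trajs, ep_lens):
        # ep_len can exceed the fragment length if an earlier pop already
        # consumed the initial part of the episode.
        assert ep_len >= len(traj.rews)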

pop_trajectories() → Tuple[Sequence[types_unique.TrajectoryWithRew], Sequence[int]]

Pops recorded trajectories trajs and episode lengths ep_lens.

Returns:

A tuple (trajs, ep_lens). trajs is a sequence of trajectory fragments consisting of all data collected since the last call to pop_trajectories. Fragments may be missing initial states (if pop_trajectories previously returned a fragment for that episode) and terminal states (if the episode has yet to complete). ep_lens is a sequence of the total lengths of the completed episodes.
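A sketch of the fragment semantics (same assumptions as above):

    # Pop all buffered data, including fragments of unfinished episodes.
    trajs, ep_lens = venv.pop_trajectories()
    # ep_lens covers only completed episodes, so trajs may hold more
    # fragments than ep_lens has entries. A later pop_trajectories()
    # call returns new fragments that pick up where these left off.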

pop_transitions() → types_unique.TransitionsWithRew

Pops recorded transitions, returning them as an instance of TransitionsWithRew.

Returns:

All transitions recorded since the last call.

Raises:

RuntimeError: if the buffer is empty (no transitions have been recorded since the last pop).
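A sketch of the flattened view (field names such as obs and rews follow the upstream imitation library's TransitionsWithRew and are assumptions here):

    transitions = venv.pop_transitions()
    # One entry per recorded environment step, concatenated across episodes.
    print(transitions.obs.shape, transitions.rews.shape)

    # A second pop with nothing new recorded raises RuntimeError.
    try:
        venv.pop_transitions()
    except RuntimeError:
        pass  # no transitions recorded since the last pop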

reset(**kwargs)

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

Returns:

observation

step_async(actions)

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

step_wait()

Wait for the step taken with step_async().

Returns:

observation, reward, done, information
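Together, the two calls are equivalent to a single step(); a sketch with random actions (illustrative):

    actions = np.array([venv.action_space.sample() for _ in range(venv.num_envs)])
    venv.step_async(actions)  # start the step in all environments
    obs, rews, dones, infos = venv.step_wait()  # block until results arrive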

class wrappers.RolloutInfoWrapper(env: gym.core.Env)

Bases: gym.core.Wrapper

Add the entire episode’s rewards and observations to info at episode end.

Whenever done=True, info["rollouts"] is a dict with keys "obs" and "rews", whose corresponding values hold the NumPy arrays containing the raw observations and rewards seen during this episode.
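A minimal sketch, assuming a CartPole-v1 environment (illustrative) and that, as in the upstream imitation implementation, the buffered observations include the initial observation from reset():

    import gym

    import wrappers  # this module

    env = wrappers.RolloutInfoWrapper(gym.make("CartPole-v1"))
    obs = env.reset()
    done = False
    while not done:
        obs, rew, done, info = env.step(env.action_space.sample())

    rollout = info["rollouts"]
    # "obs" holds one more entry than "rews" because it includes the
    # observation returned by reset().
    assert len(rollout["obs"]) == len(rollout["rews"]) + 1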

reset(**kwargs)

Resets the environment to an initial state and returns an initial observation.

Note that this function should not reset the environment’s random number generator(s); random variables in the environment’s state should be sampled independently between multiple calls to reset(). In other words, each call of reset() should yield an environment suitable for a new episode, independent of previous episodes.

Returns:

observation (object): the initial observation.

step(action)

Run one timestep of the environment’s dynamics. When the end of the episode is reached, you are responsible for calling reset() to reset this environment’s state.

Accepts an action and returns a tuple (observation, reward, done, info).

Args:

action (object): an action provided by the agent

Returns:

observation (object): agent’s observation of the current environment

reward (float): amount of reward returned after previous action

done (bool): whether the episode has ended, in which case further step() calls will return undefined results

info (dict): contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)