reward_wrapper module

Common wrapper for adding custom reward values to an environment. Code adapted from https://github.com/HumanCompatibleAI/imitation.git

class reward_wrapper.RewardVecEnvWrapper(venv: stable_baselines3.common.vec_env.base_vec_env.VecEnv, reward_fn: Callable[[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray], numpy.ndarray], ep_history: int = 100)

Bases: stable_baselines3.common.vec_env.base_vec_env.VecEnvWrapper

Uses a provided reward_fn to replace the rewards returned by step().

Automatically resets the inner VecEnv upon initialization. A tricky part of this class is keeping track of the most recent observation from each environment, so that reward_fn can be called with both the observation before and after each step.

The reward returned by the inner VecEnv is also included in each step's info dict under the original_env_rew key.
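
For example, a custom reward_fn can be supplied at construction time. The following is a minimal sketch, assuming the reward_fn arguments follow an (obs, actions, next_obs, dones) ordering inferred from the type hint; the environment choice and stable-baselines3's make_vec_env helper are used purely for illustration:

    import numpy as np
    from stable_baselines3.common.env_util import make_vec_env

    from reward_wrapper import RewardVecEnvWrapper


    def action_penalty_reward(obs: np.ndarray, actions: np.ndarray,
                              next_obs: np.ndarray, dones: np.ndarray) -> np.ndarray:
        # Hypothetical custom reward: penalize the magnitude of each action.
        return -np.linalg.norm(actions.reshape(len(actions), -1), axis=1)


    venv = make_vec_env("Pendulum-v1", n_envs=4)  # illustrative environment
    wrapped_venv = RewardVecEnvWrapper(venv, reward_fn=action_penalty_reward)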

property envs
make_log_callback() → reward_wrapper.WrappedRewardCallback

Creates a WrappedRewardCallback connected to this RewardVecEnvWrapper.
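
Continuing the sketch above, the returned callback can be passed to a stable-baselines3 model's learn() call so that the mean wrapped reward is logged during training; the choice of PPO and the timestep budget are illustrative assumptions:

    from stable_baselines3 import PPO

    model = PPO("MlpPolicy", wrapped_venv, verbose=1)
    model.learn(total_timesteps=10_000,
                callback=wrapped_venv.make_log_callback())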

reset()

Reset all the environments and return an array of observations, or a tuple of observation arrays.

If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.

Returns

observation

step_async(actions)

Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.

You should not call this if a step_async run is already pending.

step_wait()

Wait for the step taken with step_async().

Returns

observation, reward, done, information
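
The step_async()/step_wait() pair can also be driven by hand. A minimal sketch, continuing the example above; it assumes the original_env_rew info key described earlier and samples random actions purely for illustration:

    import numpy as np

    obs = wrapped_venv.reset()
    actions = np.stack([wrapped_venv.action_space.sample()
                        for _ in range(wrapped_venv.num_envs)])
    wrapped_venv.step_async(actions)
    next_obs, rewards, dones, infos = wrapped_venv.step_wait()

    # rewards come from reward_fn; the inner VecEnv's own rewards
    # are available in each environment's info dict.
    original_rewards = [info["original_env_rew"] for info in infos]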

class reward_wrapper.WrappedRewardCallback(episode_rewards: Deque[float], *args, **kwargs)

Bases: stable_baselines3.common.callbacks.BaseCallback

Logs mean wrapped reward as part of RL (or other) training.
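
The callback's internals are not reproduced in this documentation; the following is only a hedged sketch of how a BaseCallback subclass might log the mean of a deque of wrapped episode rewards. The class name, the logging key, and the decision to log at rollout start are assumptions for illustration, not the documented implementation:

    from collections import deque
    from typing import Deque

    import numpy as np
    from stable_baselines3.common.callbacks import BaseCallback


    class MeanWrappedRewardLogger(BaseCallback):
        """Illustrative stand-in for WrappedRewardCallback (not the real one)."""

        def __init__(self, episode_rewards: Deque[float], *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.episode_rewards = episode_rewards

        def _on_step(self) -> bool:
            # BaseCallback requires _on_step(); no per-step work is needed here.
            return True

        def _on_rollout_start(self) -> None:
            # Log the running mean of wrapped episode rewards, if any.
            if self.episode_rewards:
                self.logger.record("rollout/ep_rew_wrapped_mean",
                                   float(np.mean(self.episode_rewards)))


    # Example: the rewards buffer would normally be shared with the wrapper.
    rewards_buffer: Deque[float] = deque(maxlen=100)
    logging_callback = MeanWrappedRewardLogger(rewards_buffer)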