reward_wrapper module
Common wrapper for adding custom reward values to an environment. Code adapted from https://github.com/HumanCompatibleAI/imitation.git
- class reward_wrapper.RewardVecEnvWrapper(venv: stable_baselines3.common.vec_env.base_vec_env.VecEnv, reward_fn: Callable[[numpy.ndarray, numpy.ndarray, numpy.ndarray, numpy.ndarray], numpy.ndarray], ep_history: int = 100)
Bases: stable_baselines3.common.vec_env.base_vec_env.VecEnvWrapper
Uses a provided reward_fn to replace the rewards returned by step().
Automatically resets the inner VecEnv upon initialization. A tricky part about this class is keeping track of the most recent observation from each environment.
Will also include the previous reward given by the inner VecEnv in the returned info dict under the original_env_rew key.
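A minimal usage sketch, assuming the four arrays given to reward_fn are batched observations, actions, next observations, and episode-end flags (matching the Callable signature above); the environment id and the shaping function below are purely illustrative:

```python
import numpy as np
from stable_baselines3.common.env_util import make_vec_env

from reward_wrapper import RewardVecEnvWrapper


def action_penalty_reward(obs: np.ndarray,
                          acts: np.ndarray,
                          next_obs: np.ndarray,
                          dones: np.ndarray) -> np.ndarray:
    # One reward per parallel environment: penalise large actions.
    return -np.linalg.norm(acts.reshape(len(acts), -1), axis=1)


venv = make_vec_env("Pendulum-v1", n_envs=4)
# Constructing the wrapper resets the inner VecEnv, so no separate reset()
# call is required before the first step.
wrapped_venv = RewardVecEnvWrapper(venv, reward_fn=action_penalty_reward)
```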
- property envs
- make_log_callback() → reward_wrapper.WrappedRewardCallback
Creates WrappedRewardCallback connected to this RewardVecEnvWrapper.
- reset()
Reset all the environments and return an array of observations, or a tuple of observation arrays.
If step_async is still doing work, that work will be cancelled and step_wait() should not be called until step_async() is invoked again.
- Returns
observation
- step_async(actions)
Tell all the environments to start taking a step with the given actions. Call step_wait() to get the results of the step.
You should not call this if a step_async run is already pending.
- step_wait()
Wait for the step taken with step_async().
- Returns
observation, reward, done, information
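A sketch of the stepping API under the same assumptions: the reward returned by step_wait() comes from reward_fn, while the inner VecEnv's reward stays available in each info dict under the original_env_rew key (the trivial lambda reward is a placeholder):

```python
import numpy as np
from stable_baselines3.common.env_util import make_vec_env

from reward_wrapper import RewardVecEnvWrapper

venv = make_vec_env("Pendulum-v1", n_envs=4)
wrapped_venv = RewardVecEnvWrapper(
    venv, reward_fn=lambda obs, acts, next_obs, dones: np.zeros(len(obs)))

# One sampled action per parallel environment.
actions = np.stack([wrapped_venv.action_space.sample()
                    for _ in range(wrapped_venv.num_envs)])
wrapped_venv.step_async(actions)
obs, rewards, dones, infos = wrapped_venv.step_wait()

for i, info in enumerate(infos):
    # rewards[i] is produced by reward_fn; the inner VecEnv's reward is
    # preserved in the info dict.
    print(i, rewards[i], info["original_env_rew"])
```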
- class reward_wrapper.WrappedRewardCallback(episode_rewards: Deque[float], *args, **kwargs)
Bases: stable_baselines3.common.callbacks.BaseCallback
Logs mean wrapped reward as part of RL (or other) training.
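A hedged end-to-end sketch tying the two classes together: make_log_callback() produces the WrappedRewardCallback, which can be passed to a stable_baselines3 learner so the mean wrapped reward appears in the training logs (the PPO configuration, environment, and reward function here are placeholders only):

```python
import numpy as np
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

from reward_wrapper import RewardVecEnvWrapper

venv = make_vec_env("Pendulum-v1", n_envs=4)
wrapped_venv = RewardVecEnvWrapper(
    venv, reward_fn=lambda obs, acts, next_obs, dones: -np.abs(acts).sum(axis=1))

# The callback logs the mean of the wrapped episode rewards seen so far.
log_callback = wrapped_venv.make_log_callback()

model = PPO("MlpPolicy", wrapped_venv, verbose=1)
model.learn(total_timesteps=10_000, callback=log_callback)
```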