
ochre_gym.ochre_env

OchreEnv

ochre_gym.ochre_env.OchreEnv

Bases: gym.Env

The OCHRE Gym Environment.

This is a wrapper for an OCHRE Dwelling simulator, which is a building energy simulation tool. The environment is designed to be used with the Gymnasium interface.

__init__(env_name: str, dwelling_args: Dict[str, Any], actions: Dict[str, List[str]], vectorize_actions: bool, lookahead: str, reward_args: Dict[str, Any], disable_uncontrollable_loads: bool, vectorize_observations: bool, use_all_ochre_observations: bool, override_ochre_observations_with_keys: Optional[List[str]], observation_space_config: Optional[OchreObservationSpaceBaseConfig] = None, logger: logging.Logger = None)

Initialize the OCHRE Gym Environment.

Parameters:

    env_name (str, required): Name of the environment.
    dwelling_args (Dict, required): Dictionary of OCHRE Dwelling arguments for the OCHRE simulator. See https://ochre-docs-final.readthedocs.io/en/latest/InputsAndArguments.html#dwelling-arguments.
    actions (Dict, required): Dictionary with keys given by equipment types and values given by the equipment control types. Sets the actions for the environment.
    vectorize_actions (bool, required): Vectorize the action space. If True, actions are passed to step() as a numpy array of shape (n,); otherwise as a dictionary.
    lookahead (str, required): Length of the price lookahead provided as part of the observation, in "hour:minute" format.
    reward_args (Dict, required): Reward configuration. See ochre_env.Reward for more info.
    disable_uncontrollable_loads (bool, required): Disable the load due to uncontrolled appliances.
    vectorize_observations (bool, required): Vectorize the observation space. If False, the observation space is a composite spaces.Dict. If True, it is a spaces.Box.
    use_all_ochre_observations (bool, required): Whether to use all OCHRE observations or a reduced set of defaults. Default: True.
    override_ochre_observations_with_keys (Optional[List[str]], required): Only take these observations from OCHRE.
    observation_space_config (Optional[OchreObservationSpaceBaseConfig], default None): Observation space configuration. Optionally override the default observation space configuration/args by directly passing a subclass of OchreObservationSpaceBaseConfig.
    logger (logging.Logger, default None): Logger object.
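
A minimal construction sketch is shown below. The dwelling_args, actions, and reward_args values are illustrative placeholders, not recommended settings; the actual Dwelling argument names and equipment/control type strings are documented in the OCHRE documentation linked above.

import datetime as dt
import logging
from ochre_gym.ochre_env import OchreEnv

env = OchreEnv(
    env_name="example-dwelling",
    dwelling_args={
        # Placeholder OCHRE Dwelling arguments; see the OCHRE documentation
        # for the required file paths and the full argument list.
        "start_time": dt.datetime(2018, 6, 1),
        "time_res": dt.timedelta(minutes=15),
        "duration": dt.timedelta(days=7),
    },
    actions={"HVAC Cooling": ["Setpoint"]},  # equipment type -> control types (illustrative)
    vectorize_actions=True,
    lookahead="01:00",
    reward_args={"dr_type": "TOU"},  # see ochre_env.Reward below for the full key list
    disable_uncontrollable_loads=False,
    vectorize_observations=True,
    use_all_ochre_observations=True,
    override_ochre_observations_with_keys=None,
    logger=logging.getLogger("ochre_gym"),
)
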
reset(seed = None, options = None)

Reset the environment.

Rolls back the OCHRE Dwelling to the state immediately after the initialization period using copy.deepcopy. A decorator redirects OCHRE's print statements to the logger.

Parameters:

    seed (int, default None): Seed for the random number generator.
    options (Dict, default None): Options for the OCHRE Dwelling.

Returns:

    obs (np.array): A numpy array of observations.
    control_result (Dict): A dictionary of OCHRE control results.
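
A minimal call, assuming env was constructed as in the sketch above with vectorize_observations=True:

obs, control_results = env.reset(seed=0)
print(obs.shape)  # flattened observation vector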

step(action: Union[Dict, np.array])

Take a step in the environment.

A decorator redirects OCHRE's print statements to the logger. Currently, an action must be provided for every piece of controlled equipment at every time step.

Parameters:

    action (Union[Dict, np.array], required): A numpy array of actions with shape (n,) if vectorize_actions is True, otherwise a dictionary of actions.

Returns:

    obs (np.array): A numpy array of observations.
    rew (float): The single-step reward.
    terminated (bool): A flag indicating if the episode is terminated.
    truncated (bool): A flag indicating if the episode is truncated.
    info (Dict): Extra information about the step.

Raises:

    ValueError: If the dictionary action is malformed.
    ModelException: Internally, OCHRE may throw a ModelException if the Dwelling tries to do something "un-physical". Our current way of handling this is to stop the episode and return a large negative reward.
    AssertionError: Same as above, but for an assertion error.
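
A short rollout sketch, assuming the environment was constructed with vectorize_actions=True and vectorize_observations=True and that the standard Gymnasium action_space attribute is populated:

obs, control_results = env.reset()

terminated = truncated = False
episode_return = 0.0
while not (terminated or truncated):
    action = env.action_space.sample()  # random action with shape (n,)
    obs, rew, terminated, truncated, info = env.step(action)
    episode_return += rew

print(f"Episode return: {episode_return:.3f}")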

get_obs(control_results)

Obtain observation from the Dwelling control results.

Parameters:

    control_results (Dict, required): The control results from OCHRE.

Returns:

    obs (np.array): A numpy array for the flattened observation.

observation_vector_to_dict(observation_vector: np.array) -> OrderedDict

Convert the observation vector to a dictionary.

Parameters:

    observation_vector (np.array, required): A numpy array of observations with shape (n,).

Returns:

    observation_dict (OrderedDict): The observation dict.
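
A brief usage sketch, assuming env was built with vectorize_observations=True as above:

obs, _ = env.reset()
obs_dict = env.observation_vector_to_dict(obs)
for name, value in obs_dict.items():
    print(name, value)  # inspect each observation entry by name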

action_vector_to_dict(action_vector: np.array) -> OrderedDict

Convert the action vector to a dictionary.

Parameters:

    action_vector (np.array, required): A numpy array of actions with shape (n,).

Returns:

    action_dict (OrderedDict): The action dict.
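
This is useful for logging which entry of a vectorized action corresponds to which equipment control; a small sketch assuming vectorize_actions=True:

action = env.action_space.sample()
action_dict = env.action_vector_to_dict(action)
print(action_dict)  # the same action in a readable OrderedDict form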


Reward

The reward function at each time step is calculated as:

\[ r = -(\texttt{energy\_used} \cdot \texttt{energy\_price} + \texttt{discomfort\_penalty}). \]

We currently support three demand response (DR) programs with different energy prices:

  • TOU: Time-of-use pricing
  • RTP: Real-time pricing
  • PC: Power constraint

The discomfort penalty is calculated as:

deviation = max(max(0.0, indoor_temp - self.thermal_comfort_band_high),
                max(0.0, self.thermal_comfort_band_low - indoor_temp),
                0.0)
discomfort = self.thermal_discomfort_unit_cost * deviation ** 2
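
For concreteness, a single-step reward with made-up numbers (an energy price of $0.12/kWh, a comfort band of 20-23 degrees C, and a unit discomfort cost of 0.1):

energy_used = 1.5     # kWh consumed this step (illustrative)
energy_price = 0.12   # $/kWh from the active DR program (illustrative)
indoor_temp = 25.0    # degrees C, 2 degrees above the comfort band
band_low, band_high = 20.0, 23.0
unit_cost = 0.1

deviation = max(0.0, indoor_temp - band_high, band_low - indoor_temp)
discomfort = unit_cost * deviation ** 2              # 0.1 * 2**2 = 0.4
reward = -(energy_used * energy_price + discomfort)  # -(0.18 + 0.4) = -0.58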

Reward configuration CSV files

The time_of_use_price.csv and dr_power_limit.csv files only have entries for one day. Hence, every day in an episode, which may extend over months, will use the same TOU prices and power constraint. The real_time_price.csv file has entries every 5 minutes for 2 months. We will need to consider a general solution for obtaining the price for any time step in an episode, and whether we want these to be fixed or stochastic.


ochre_gym.ochre_env.Reward

Reward function for OCHRE Gym environment.

__init__(reward_args: Dict, simulation_steps: pd.core.indexes.datetimes.DatetimeIndex, time_resolution: datetime.timedelta)

Initialize the reward function from the given configuration.

Parameters:

    reward_args (Dict, required): Reward configuration. See below.
    simulation_steps (DatetimeIndex, required): Simulation steps in the control episode.
    time_resolution (datetime.timedelta, required): Control interval.

reward_args dictionary - key name (value type):

    thermal_comfort_band_high (float): Upper bound of the thermal comfort band.
    thermal_comfort_band_low (float): Lower bound of the thermal comfort band.
    thermal_discomfort_unit_cost (float): Unit cost of thermal discomfort.
    reward_scale (float): Reward scale. Default is 1.
    dr_type (string): Type of DR program: 'TOU', 'PC', or 'RTP'.
    dr_subfolder (string): The name of the subfolder in `ochre_gym/energy_price` containing the DR files.
    flat_energy_price (float): Energy price in $/kWh.
    tou_price_file (string): Name of the file in which the TOU daily series is stored.
    rtp_price_file (string): Name of the file in which the RTP historical series is stored.
    pc_power_file (string): Name of the file in which the DR power limit time series is stored.
    pc_unit_penalty (float): Unit cost of power limit violation.
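
An example reward_args dictionary for a TOU program; the comfort band and cost values are illustrative, the subfolder name is hypothetical, and the price file name follows the CSV files described above:

reward_args = {
    "thermal_comfort_band_low": 20.0,
    "thermal_comfort_band_high": 23.0,
    "thermal_discomfort_unit_cost": 0.1,
    "reward_scale": 1.0,
    "dr_type": "TOU",
    "dr_subfolder": "example_dr",  # hypothetical subfolder of ochre_gym/energy_price
    "tou_price_file": "time_of_use_price.csv",
}
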
__call__(control_results: Dict[str, Any], step_idx: int) -> float

Calculate the single-step control reward based on the control results.

Parameters:

    control_results (Dict, required): Control results from OCHRE.
    step_idx (int, required): Current step index.

Returns:

    reward (float): Reward for the current control step, scaled by reward_scale.