ochre_gym.ochre_env
OchreEnv
ochre_gym.ochre_env.OchreEnv
Bases: gym.Env
The OCHRE Gym Environment.
This is a wrapper for an OCHRE Dwelling simulator, which is a building energy simulation tool. The environment is designed to be used with the Gymnasium interface.
__init__(env_name: str, dwelling_args: Dict[str, Any], actions: Dict[str, List[str]], vectorize_actions: bool, lookahead: str, reward_args: Dict[str, Any], disable_uncontrollable_loads: bool, vectorize_observations: bool, use_all_ochre_observations: bool, override_ochre_observations_with_keys: Optional[List[str]], observation_space_config: Optional[OchreObservationSpaceBaseConfig] = None, logger: logging.Logger = None)
Initialize the OCHRE Gym Environment.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| env_name | str | Name of the environment. | required |
| dwelling_args | Dict | Dictionary of OCHRE Dwelling arguments for the OCHRE simulator. See https://ochre-docs-final.readthedocs.io/en/latest/InputsAndArguments.html#dwelling-arguments. | required |
| actions | Dict | Dictionary with keys given by equipment types and values given by the equipment control types. Sets the actions for the environment. | required |
| vectorize_actions | bool | Vectorize the action space. If False, the action space is a composite spaces.Dict. If True, it is a spaces.Box. | required |
| lookahead | str | Length of the price lookahead provided as part of the observation, in "hour:minute" format. | required |
| reward_args | Dict | Reward configuration. See ochre_env.Reward for more info. | required |
| disable_uncontrollable_loads | bool | Disable the load due to uncontrolled appliances. | required |
| vectorize_observations | bool | Vectorize the observation space. If False, the observation space is a composite spaces.Dict. If True, it is a spaces.Box. | required |
| use_all_ochre_observations | bool | Whether to use all OCHRE observations or a reduced set of defaults. Default: True. | required |
| override_ochre_observations_with_keys | List[str] | Only take these observations from OCHRE. | required |
| observation_space_config | Optional[OchreObservationSpaceBaseConfig] | Observation space configuration. Optionally override the default observation space configuration/args by directly passing a subclass of OchreObservationSpaceBaseConfig. | None |
| logger | logging.Logger | Logger object. Default: None. | None |
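A minimal construction sketch for orientation. The dwelling_args keys, the equipment/control names in actions, and all values below are placeholders and assumptions to check against the linked OCHRE documentation and the ochre_gym defaults; they are not validated settings.

```python
import datetime as dt
import logging

from ochre_gym.ochre_env import OchreEnv

# All values below are illustrative placeholders, not validated defaults.
dwelling_args = {
    "start_time": dt.datetime(2018, 6, 1),   # assumed OCHRE Dwelling argument
    "time_res": dt.timedelta(minutes=15),    # assumed OCHRE Dwelling argument
    "duration": dt.timedelta(days=7),        # assumed OCHRE Dwelling argument
    "hpxml_file": "path/to/home.xml",        # assumed; see the OCHRE docs for required inputs
    "weather_file": "path/to/weather.epw",   # assumed; see the OCHRE docs for required inputs
}

reward_args = {
    "dr_type": "TOU",
    "thermal_comfort_band_low": 20.0,
    "thermal_comfort_band_high": 24.0,
    # See the Reward section below for the full key list.
}

env = OchreEnv(
    env_name="basic-dwelling",
    dwelling_args=dwelling_args,
    actions={"HVAC Cooling": ["Setpoint"]},  # placeholder equipment/control names
    vectorize_actions=True,
    lookahead="01:00",
    reward_args=reward_args,
    disable_uncontrollable_loads=False,
    vectorize_observations=True,
    use_all_ochre_observations=True,
    override_ochre_observations_with_keys=None,
    logger=logging.getLogger("ochre_gym"),
)
```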
reset(seed = None, options = None)
Reset the environment.
Rolls back the OCHRE Dwelling to the state after the initialization period using copy.deepcopy. This method is decorated to redirect OCHRE's print statements to the logger.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| seed | int | Seed for the random number generator. | None |
| options | Dict | Options for the OCHRE Dwelling. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| obs | np.array | A numpy array of observations. |
| control_result | Dict | A dictionary of OCHRE control results. |
step(action: Union[Dict, np.array])
Take a step in the environment.
This method is decorated to redirect OCHRE's print statements to the logger. Currently, an action must be provided for every equipment type at every time step.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| action | Union[Dict, np.array] | A numpy array of actions with shape (n,) if vectorize_actions is True, otherwise a dictionary of actions. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| obs | np.array | A numpy array of observations. |
| rew | float | The single step reward. |
| terminated | bool | A flag indicating if the episode is terminated. |
| truncated | bool | A flag indicating if the episode is truncated. |
| info | Dict | Extra information about the step. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the dictionary action is malformed. |
| ModelException | Internally, OCHRE may throw a ModelException if the Dwelling tries to do something "un-physical". Our current way of handling this is to stop the episode and return a large negative reward. |
| AssertionError | Same as above, but for an assertion error. |
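Putting reset() and step() together: a minimal random rollout sketch, assuming the env object from the construction example above with vectorized observations and actions. action_space.sample() is the standard Gymnasium way to draw a random action and stands in for a real policy.

```python
# Random-action rollout; assumes `env` was built with vectorize_actions=True.
obs, control_result = env.reset()

terminated, truncated = False, False
episode_return = 0.0
while not (terminated or truncated):
    action = env.action_space.sample()   # stand-in for a real controller/policy
    obs, rew, terminated, truncated, info = env.step(action)
    episode_return += rew

print(f"Episode return: {episode_return:.3f}")
```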
get_obs(control_results)
Obtain observation from the Dwelling control results.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| control_results | Dict | The control results from OCHRE. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| obs | np.array | A numpy array for the flattened observation. |
observation_vector_to_dict(observation_vector: np.array) -> OrderedDict
Convert the observation vector to a dictionary.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| observation_vector | np.array | A numpy array of observations, with shape (n,). | required |

Returns:

| Name | Type | Description |
|---|---|---|
| observation_dict | OrderedDict | The observation dict. |
action_vector_to_dict(action_vector: np.array) -> OrderedDict
Convert the action vector to a dictionary.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| action_vector | np.array | A numpy array of actions, with shape (n,). | required |

Returns:

| Name | Type | Description |
|---|---|---|
| action_dict | OrderedDict | The action dict. |
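When the spaces are vectorized, these two helpers make the flat vectors readable. A small sketch, reusing obs and env from the rollout example above:

```python
# Map each entry of a flat observation vector back to its OCHRE name.
obs_dict = env.observation_vector_to_dict(obs)
for name, value in obs_dict.items():
    print(name, value)

# Likewise, map a flat action vector back to equipment/control names.
action_dict = env.action_vector_to_dict(env.action_space.sample())
print(action_dict)
```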
Reward
The reward at each time step penalizes the energy cost (determined by the active demand response program), thermal discomfort, and, under the PC program, power limit violations; the result is scaled by reward_scale.

We currently support three demand response programs with different energy prices:

- TOU: Time-of-use pricing
- RTP: Real-time pricing
- PC: Power constraint
The discomfort penalty is calculated as:

```python
deviation = max(max(0.0, indoor_temp - self.thermal_comfort_band_high),
                max(0.0, self.thermal_comfort_band_low - indoor_temp),
                0.0)
discomfort = self.thermal_discomfort_unit_cost * deviation ** 2
```
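A worked instance of this penalty with arbitrary illustrative values (not package defaults): a 1.5 degree excursion above a 20-24 degree band with a unit cost of 0.1 yields a penalty of 0.225.

```python
# Illustrative values only; not ochre_gym defaults.
thermal_comfort_band_low, thermal_comfort_band_high = 20.0, 24.0
thermal_discomfort_unit_cost = 0.1
indoor_temp = 25.5

deviation = max(max(0.0, indoor_temp - thermal_comfort_band_high),
                max(0.0, thermal_comfort_band_low - indoor_temp),
                0.0)
discomfort = thermal_discomfort_unit_cost * deviation ** 2
print(deviation, round(discomfort, 3))  # 1.5 0.225
```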
Reward configuration CSV files
The time_of_use_price.csv and dr_power_limit.csv files only have entries for one day. Hence, every day in an episode, which may extend over months, uses the same TOU prices and PC power limits. The real_time_price.csv file has entries at 5-minute resolution for 2 months. We will need to consider a general solution for obtaining the price for any time step in an episode, and whether we want these to be fixed or stochastic.
ochre_gym.ochre_env.Reward
Reward function for OCHRE Gym environment.
__init__(reward_args: Dict, simulation_steps: pd.core.indexes.datetimes.DatetimeIndex, time_resolution: datetime.timedelta)
Initialize the reward function from the given configuration.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| reward_args | Dict | Reward configuration. See below. | required |
| simulation_steps | DatetimeIndex | Simulation steps in the control episode. | required |
| time_resolution | datetime.timedelta | Control interval. | required |
reward_args dictionary - key name (value type):

- thermal_comfort_band_high (float): upper bound of the thermal comfort band.
- thermal_comfort_band_low (float): lower bound of the thermal comfort band.
- thermal_discomfort_unit_cost (float): unit cost of thermal discomfort.
- reward_scale (float): reward scale (default is 1).
- dr_type (string): type of DR program: one of 'TOU', 'PC', or 'RTP'.
- dr_subfolder (string): the name of the subfolder in `ochre_gym/energy_price` containing the DR files.
- flat_energy_price (float): energy price in $/kWh.
- tou_price_file (string): name of the file in which the TOU daily series is stored.
- rtp_price_file (string): name of the file in which the RTP historical series is stored.
- pc_power_file (string): name of the file in which the DR power limit time series is stored.
- pc_unit_penalty (float): unit cost of power limit violation.
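A sketch of a reward_args dictionary assembled from the keys above, here for a TOU program. The numeric values are illustrative, and which price/power files are actually required for a given dr_type is an assumption to verify against the files shipped in ochre_gym/energy_price.

```python
# Illustrative reward_args; values are placeholders, not recommended settings.
reward_args = {
    "thermal_comfort_band_high": 24.0,     # deg C
    "thermal_comfort_band_low": 20.0,      # deg C
    "thermal_discomfort_unit_cost": 0.1,   # assumed units: $ per squared degree of deviation
    "reward_scale": 1.0,
    "dr_type": "TOU",                      # one of 'TOU', 'RTP', 'PC'
    "dr_subfolder": "default",             # placeholder subfolder name
    "flat_energy_price": 0.05,             # $/kWh
    "tou_price_file": "time_of_use_price.csv",
    "rtp_price_file": "real_time_price.csv",
    "pc_power_file": "dr_power_limit.csv",
    "pc_unit_penalty": 10.0,
}
```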
__call__(control_results: Dict[str, Any], step_idx: int) -> float
Calculate single step control reward based on the control results.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| control_results | Dict | Control results from OCHRE. | required |
| step_idx | int | Current step index. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| reward | float | Reward for the current control step, scaled by reward_scale. |