ochre_gym.ochre_env
OchreEnv
ochre_gym.ochre_env.OchreEnv
Bases: gym.Env
The OCHRE Gym Environment.
This is a wrapper for an OCHRE Dwelling simulator, which is a building energy simulation tool. The environment is designed to be used with the Gymnasium interface.
__init__(env_name: str, dwelling_args: Dict[str, Any], actions: Dict[str, List[str]], vectorize_actions: bool, lookahead: str, reward_args: Dict[str, Any], disable_uncontrollable_loads: bool, vectorize_observations: bool, use_all_ochre_observations: bool, override_ochre_observations_with_keys: Optional[List[str]], observation_space_config: Optional[OchreObservationSpaceBaseConfig] = None, logger: logging.Logger = None)
Initialize the OCHRE Gym Environment.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| env_name | str | Name of the environment. | required |
| dwelling_args | Dict | Dictionary of OCHRE Dwelling arguments for the OCHRE simulator. See https://ochre-docs-final.readthedocs.io/en/latest/InputsAndArguments.html#dwelling-arguments. | required |
| actions | Dict | Dictionary with keys given by equipment types and values given by the equipment control types. Sets the actions for the environment. | required |
| vectorize_actions | bool | Vectorize the action space. If False, the action space is a composite spaces.Dict. If True, it is a spaces.Box. | required |
| lookahead | str | Length of the price lookahead provided as part of the observation, in "hour:minute" format. | required |
| reward_args | Dict | Reward configuration. See ochre_env.Reward for more info. | required |
| disable_uncontrollable_loads | bool | Disable the load due to uncontrolled appliances. | required |
| vectorize_observations | bool | Vectorize the observation space. If False, the observation space is a composite spaces.Dict. If True, it is a spaces.Box. | required |
| use_all_ochre_observations | bool | Whether to use all OCHRE observations or a reduced set of defaults. Default: True. | required |
| override_ochre_observations_with_keys | List[str] | Only take these observations from OCHRE. | required |
| observation_space_config | Optional[OchreObservationSpaceBaseConfig] | Observation space configuration. Optionally override the default observation space configuration/args by directly passing a subclass of OchreObservationSpaceBaseConfig. | None |
| logger | logging.Logger | Logger object. Default: None. | None |
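A minimal construction sketch for orientation. The dwelling_args keys, the equipment/control names in actions, and all values below are placeholders and assumptions to check against the linked OCHRE documentation and the ochre_gym defaults; they are not validated settings.

```python
import datetime as dt
import logging

from ochre_gym.ochre_env import OchreEnv

# All values below are illustrative placeholders, not validated defaults.
dwelling_args = {
    "start_time": dt.datetime(2018, 6, 1),   # assumed OCHRE Dwelling argument
    "time_res": dt.timedelta(minutes=15),    # assumed OCHRE Dwelling argument
    "duration": dt.timedelta(days=7),        # assumed OCHRE Dwelling argument
    "hpxml_file": "path/to/home.xml",        # assumed; see the OCHRE docs for required inputs
    "weather_file": "path/to/weather.epw",   # assumed; see the OCHRE docs for required inputs
}

reward_args = {
    "dr_type": "TOU",
    "thermal_comfort_band_low": 20.0,
    "thermal_comfort_band_high": 24.0,
    # See the Reward section below for the full key list.
}

env = OchreEnv(
    env_name="basic-dwelling",
    dwelling_args=dwelling_args,
    actions={"HVAC Cooling": ["Setpoint"]},  # placeholder equipment/control names
    vectorize_actions=True,
    lookahead="01:00",
    reward_args=reward_args,
    disable_uncontrollable_loads=False,
    vectorize_observations=True,
    use_all_ochre_observations=True,
    override_ochre_observations_with_keys=None,
    logger=logging.getLogger("ochre_gym"),
)
```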
reset(seed = None, options = None)
Reset the environment.
Rolls back the OCHRE Dwelling to the state after the initialization period using copy.deepcopy. This method is decorated to redirect OCHRE's print statements to the logger.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| seed | int | Seed for the random number generator. | None |
| options | Dict | Options for the OCHRE Dwelling. | None |

Returns:

| Name | Type | Description |
|---|---|---|
| obs | np.array | A numpy array of observations. |
| control_result | Dict | A dictionary of OCHRE control results. |
step(action: Union[Dict, np.array])
Take a step in the environment.
This method is decorated to redirect OCHRE's print statements to the logger. Currently, an action must be provided for every equipment type at every time step.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| action | Union[Dict, np.array] | A numpy array of actions with shape (n,) if vectorize_actions is True, otherwise a dictionary of actions. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| obs | np.array | A numpy array of observations. |
| rew | float | The single step reward. |
| terminated | bool | A flag indicating if the episode is terminated. |
| truncated | bool | A flag indicating if the episode is truncated. |
| info | Dict | Extra information about the step. |

Raises:

| Type | Description |
|---|---|
| ValueError | If the dictionary action is malformed. |
| ModelException | Internally, OCHRE may throw a ModelException if the Dwelling tries to do something "un-physical". Our current way of handling this is to stop the episode and return a large negative reward. |
| AssertionError | Same as above, but for an assertion error. |
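Putting reset() and step() together: a minimal random rollout sketch, assuming the env object from the construction example above with vectorized observations and actions. action_space.sample() is the standard Gymnasium way to draw a random action and stands in for a real policy.

```python
# Random-action rollout; assumes `env` was built with vectorize_actions=True.
obs, control_result = env.reset()

terminated, truncated = False, False
episode_return = 0.0
while not (terminated or truncated):
    action = env.action_space.sample()   # stand-in for a real controller/policy
    obs, rew, terminated, truncated, info = env.step(action)
    episode_return += rew

print(f"Episode return: {episode_return:.3f}")
```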
get_obs(control_results)
Obtain observation from the Dwelling control results.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| control_results | Dict | The control results from OCHRE. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| obs | np.array | A numpy array for the flattened observation. |
observation_vector_to_dict(observation_vector: np.array) -> OrderedDict
Convert the observation vector to a dictionary.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| observation_vector | np.array | A numpy array of observations, with shape (n,). | required |

Returns:

| Name | Type | Description |
|---|---|---|
| observation_dict | OrderedDict | The observation dict. |
action_vector_to_dict(action_vector: np.array) -> OrderedDict
Convert the action vector to a dictionary.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| action_vector | np.array | A numpy array of actions, with shape (n,). | required |

Returns:

| Name | Type | Description |
|---|---|---|
| action_dict | OrderedDict | The action dict. |
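When the spaces are vectorized, these two helpers make the flat vectors readable. A small sketch, reusing obs and env from the rollout example above:

```python
# Map each entry of a flat observation vector back to its OCHRE name.
obs_dict = env.observation_vector_to_dict(obs)
for name, value in obs_dict.items():
    print(name, value)

# Likewise, map a flat action vector back to equipment/control names.
action_dict = env.action_vector_to_dict(env.action_space.sample())
print(action_dict)
```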
Reward
The reward at each time step penalizes the energy cost (determined by the active demand response program), thermal discomfort, and, under the PC program, power limit violations; the result is scaled by reward_scale.

We currently support three demand response programs with different energy prices:

- TOU: Time-of-use pricing
- RTP: Real-time pricing
- PC: Power constraint
The discomfort penalty is calculated as:

```python
deviation = max(max(0.0, indoor_temp - self.thermal_comfort_band_high),
                max(0.0, self.thermal_comfort_band_low - indoor_temp),
                0.0)
discomfort = self.thermal_discomfort_unit_cost * deviation ** 2
```
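A worked instance of this penalty with arbitrary illustrative values (not package defaults): a 1.5 degree excursion above a 20-24 degree band with a unit cost of 0.1 yields a penalty of 0.225.

```python
# Illustrative values only; not ochre_gym defaults.
thermal_comfort_band_low, thermal_comfort_band_high = 20.0, 24.0
thermal_discomfort_unit_cost = 0.1
indoor_temp = 25.5

deviation = max(max(0.0, indoor_temp - thermal_comfort_band_high),
                max(0.0, thermal_comfort_band_low - indoor_temp),
                0.0)
discomfort = thermal_discomfort_unit_cost * deviation ** 2
print(deviation, round(discomfort, 3))  # 1.5 0.225
```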
Reward configuration CSV files
The time_of_use_price.csv and dr_power_limit.csv files only have entries for one day. Hence, every day in an episode, which may extend over months, uses the same TOU prices and PC power limits. The real_time_price.csv file has entries at 5-minute resolution for 2 months. We will need to consider a general solution for obtaining the price for any time step in an episode, and whether we want these to be fixed or stochastic.
ochre_gym.ochre_env.Reward
Reward function for OCHRE Gym environment.
__init__(reward_args: Dict, simulation_steps: pd.core.indexes.datetimes.DatetimeIndex, time_resolution: datetime.timedelta)
Initialize the reward function from the given configuration.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| reward_args | Dict | Reward configuration. See below. | required |
| simulation_steps | DatetimeIndex | Simulation steps in the control episode. | required |
| time_resolution | datetime.timedelta | Control interval. | required |
reward_args dictionary - key name (value type):

- thermal_comfort_band_high (float): upper bound of the thermal comfort band.
- thermal_comfort_band_low (float): lower bound of the thermal comfort band.
- thermal_discomfort_unit_cost (float): unit cost of thermal discomfort.
- reward_scale (float): reward scale (default is 1).
- dr_type (string): type of DR program: one of 'TOU', 'PC', or 'RTP'.
- dr_subfolder (string): the name of the subfolder in `ochre_gym/energy_price` containing the DR files.
- flat_energy_price (float): energy price in $/kWh.
- tou_price_file (string): name of the file in which the TOU daily series is stored.
- rtp_price_file (string): name of the file in which the RTP historical series is stored.
- pc_power_file (string): name of the file in which the DR power limit time series is stored.
- pc_unit_penalty (float): unit cost of power limit violation.
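A sketch of a reward_args dictionary assembled from the keys above, here for a TOU program. The numeric values are illustrative, and which price/power files are actually required for a given dr_type is an assumption to verify against the files shipped in ochre_gym/energy_price.

```python
# Illustrative reward_args; values are placeholders, not recommended settings.
reward_args = {
    "thermal_comfort_band_high": 24.0,     # deg C
    "thermal_comfort_band_low": 20.0,      # deg C
    "thermal_discomfort_unit_cost": 0.1,   # assumed units: $ per squared degree of deviation
    "reward_scale": 1.0,
    "dr_type": "TOU",                      # one of 'TOU', 'RTP', 'PC'
    "dr_subfolder": "default",             # placeholder subfolder name
    "flat_energy_price": 0.05,             # $/kWh
    "tou_price_file": "time_of_use_price.csv",
    "rtp_price_file": "real_time_price.csv",
    "pc_power_file": "dr_power_limit.csv",
    "pc_unit_penalty": 10.0,
}
```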
__call__(control_results: Dict[str, Any], step_idx: int) -> float
Calculate single step control reward based on the control results.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| control_results | Dict | Control results from OCHRE. | required |
| step_idx | int | Current step index. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| reward | float | Reward for the current control step, scaled by reward_scale. |