rex.outputs.Outputs
- class Outputs(h5_file, mode='r', unscale=True, str_decode=True, group=None)[source]
Bases: BaseResource
Base class to handle output data in .h5 format
Examples
The Outputs handler can be used to initialize h5 files in the standard reV/rex resource data format.
>>> from rex import Outputs
>>> import pandas as pd
>>> import numpy as np
>>>
>>> meta = pd.DataFrame({'latitude': np.ones(100),
>>>                      'longitude': np.ones(100)})
>>>
>>> time_index = pd.date_range('20210101', '20220101', freq='1h',
>>>                            closed='right')
>>>
>>> with Outputs('test.h5', 'w') as f:
>>>     f.meta = meta
>>>     f.time_index = time_index
You can also use the Outputs handler to read output h5 files from disk. The Outputs handler will automatically parse the meta data and time index into the expected pandas objects (DataFrame and DatetimeIndex, respectively).
>>> with Outputs('test.h5') as f:
>>>     print(f.meta.head())
     latitude  longitude
gid
0         1.0        1.0
1         1.0        1.0
2         1.0        1.0
3         1.0        1.0
4         1.0        1.0
>>> with Outputs('test.h5') as f:
>>>     print(f.time_index)
DatetimeIndex(['2021-01-01 01:00:00+00:00', '2021-01-01 02:00:00+00:00',
               '2021-01-01 03:00:00+00:00', '2021-01-01 04:00:00+00:00',
               '2021-01-01 05:00:00+00:00', '2021-01-01 06:00:00+00:00',
               '2021-01-01 07:00:00+00:00', '2021-01-01 08:00:00+00:00',
               '2021-01-01 09:00:00+00:00', '2021-01-01 10:00:00+00:00',
               ...
               '2021-12-31 15:00:00+00:00', '2021-12-31 16:00:00+00:00',
               '2021-12-31 17:00:00+00:00', '2021-12-31 18:00:00+00:00',
               '2021-12-31 19:00:00+00:00', '2021-12-31 20:00:00+00:00',
               '2021-12-31 21:00:00+00:00', '2021-12-31 22:00:00+00:00',
               '2021-12-31 23:00:00+00:00', '2022-01-01 00:00:00+00:00'],
              dtype='datetime64[ns, UTC]', length=8760, freq=None)
There are a few ways to use the Outputs handler to write data to a file. Here is one example using the pre-initialized file we created earlier. Note that the Outputs handler will automatically scale float data using the “scale_factor” attribute, and will unscale the data on read unless the unscale kwarg is explicitly set to False. This behavior is intended to reduce disk storage requirements for big data and can be disabled by setting dtype=np.float32 or dtype=np.float64 when writing data.
>>> Outputs.add_dataset(h5_file='test.h5', dset_name='dset1',
>>>                     dset_data=np.ones((8760, 100)) * 42.42,
>>>                     attrs={'scale_factor': 100}, dtype=np.int32)
>>> with Outputs('test.h5') as f:
>>>     print(f['dset1'])
>>>     print(f['dset1'].dtype)
[[42.42 42.42 42.42 ... 42.42 42.42 42.42]
 [42.42 42.42 42.42 ... 42.42 42.42 42.42]
 [42.42 42.42 42.42 ... 42.42 42.42 42.42]
 ...
 [42.42 42.42 42.42 ... 42.42 42.42 42.42]
 [42.42 42.42 42.42 ... 42.42 42.42 42.42]
 [42.42 42.42 42.42 ... 42.42 42.42 42.42]]
float32
>>> with Outputs('test.h5', unscale=False) as f:
>>>     print(f['dset1'])
>>>     print(f['dset1'].dtype)
[[4242 4242 4242 ... 4242 4242 4242]
 [4242 4242 4242 ... 4242 4242 4242]
 [4242 4242 4242 ... 4242 4242 4242]
 ...
 [4242 4242 4242 ... 4242 4242 4242]
 [4242 4242 4242 ... 4242 4242 4242]
 [4242 4242 4242 ... 4242 4242 4242]]
int32
Note that the Outputs handler is specifically designed to read and write spatiotemporal data. It is therefore important to initialize the meta data and time index objects even if your data is only spatial or only temporal. Furthermore, the Outputs handler will always assume that 1D datasets represent scalar (non-timeseries) data whose shape corresponds to the meta data, and that 2D datasets represent spatiotemporal data whose shape corresponds to (len(time_index), len(meta)). You can see these constraints here:
>>> Outputs.add_dataset(h5_file='test.h5', dset_name='bad_shape',
>>>                     dset_data=np.ones((1, 100)) * 42.42,
>>>                     attrs={'scale_factor': 100}, dtype=np.int32)
HandlerValueError: 2D data with shape (1, 100) is not of the proper
spatiotemporal shape: (8760, 100)
>>> Outputs.add_dataset(h5_file='test.h5', dset_name='bad_shape',
>>>                     dset_data=np.ones((8760,)) * 42.42,
>>>                     attrs={'scale_factor': 100}, dtype=np.int32)
HandlerValueError: 1D data with shape (8760,) is not of the proper
spatial shape: (100,)
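For reference, a correct spatial-only write matches the meta data length. A minimal sketch under the constraints above, using the hypothetical dataset name 'dset_1d':

>>> Outputs.add_dataset(h5_file='test.h5', dset_name='dset_1d',
>>>                     dset_data=np.ones((100,)) * 42.42,
>>>                     attrs={'scale_factor': 100}, dtype=np.int32)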
- Parameters:
h5_file (str) – Path to .h5 resource file
mode (str, optional) – Mode to instantiate h5py.File instance, by default ‘r’
unscale (bool, optional) – Boolean flag to automatically unscale variables on extraction, by default True
str_decode (bool, optional) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read, by default True
group (str, optional) – Group within .h5 resource file to open, by default None
Methods
add_dataset(h5_file, dset_name, dset_data, dtype) – Add dataset to h5_file
close() – Close h5 instance
df_str_decode(df) – Decode a dataframe with byte string columns into ordinary str cols.
get_SAM_df(site) – Placeholder for get_SAM_df method that is resource specific
get_attrs([dset]) – Get h5 attributes either from file or dataset
get_config(config_name) – Get SAM config
get_dset_properties(dset) – Get dataset properties (shape, dtype, chunks)
get_meta_arr(rec_name[, rows]) – Get a meta array by name (faster than DataFrame extraction).
get_scale_factor(dset) – Get dataset scale factor
get_units(dset) – Get dataset units
init_h5(h5_file, dsets, shapes, attrs, ...) – Init a full output file with the final intended shape without data.
open_dataset(ds_name) – Open resource dataset
preload_SAM(h5_file, sites, tech[, unscale, ...]) – Pre-load project_points for SAM
set_configs(SAM_configs) – Set SAM configuration JSONs as attributes of 'meta'
set_version_attr() – Set the version attribute to the h5 file.
update_dset(dset, dset_array[, dset_slice]) – Check to see if dset needs to be updated on disk; if so, write dset_array to disk
write_dataset(dset_name, data, dtype[, ...]) – Write dataset to disk.
write_means(h5_file, meta, dset_name, means, ...) – Write means array to disk
write_profiles(h5_file, meta, time_index, ...) – Write profiles to disk
Attributes
ADD_ATTR
SAM_configs – SAM configuration JSONs used to create CF profiles
SCALE_ATTR
UNIT_ATTR
adders – Dictionary of all dataset add offset factors
attrs – Dictionary of all dataset attributes
chunks – Dictionary of all dataset chunk sizes
coordinates – (lat, lon) pairs
data_version – Get the version attribute of the data.
datasets – Datasets available
dsets – Datasets available
dtypes – Dictionary of all dataset dtypes
full_version_record – Get record of versions for dependencies
global_attrs – Global (file) attributes
groups – Groups available
h5 – Open h5py File instance.
lat_lon – Extract (latitude, longitude) pairs
meta – Resource meta data DataFrame
package – Package used to create file
res_dsets – Available resource datasets
resource_datasets – Available resource datasets
run_attrs – Runtime attributes stored at the global (file) level
scale_factors – Dictionary of all dataset scale factors
shape – Variable array shape from time_index and meta
shapes – Dictionary of all dataset shapes
source – Package and version used to create file
time_index – Resource DatetimeIndex
units – Dictionary of all dataset units
version – Version of package used to create file
writable – Check to see if h5py.File instance is writable
- property full_version_record
Get record of versions for dependencies
- Returns:
dict – Dictionary of package versions for dependencies
- property version
Version of package used to create file
- Returns:
str
- property package
Package used to create file
- Returns:
str
- property source
Package and version used to create file
- Returns:
str
- property shape
Variable array shape from time_index and meta
- Returns:
tuple – Shape of variable arrays == (time, locations)
- property writable
Check to see if h5py.File instance is writable
- Returns:
is_writable (bool) – Flag if mode is writable
- property meta
Resource meta data DataFrame
- Returns:
meta (pandas.DataFrame)
- property time_index
Resource DatetimeIndex
- Returns:
time_index (pandas.DatetimeIndex)
- property SAM_configs
SAM configuration JSONs used to create CF profiles
- Returns:
configs (dict) – Dictionary of SAM configuration JSONs
- property run_attrs
Runtime attributes stored at the global (file) level
- Returns:
global_attrs (dict)
- get_config(config_name)[source]
Get SAM config
- Parameters:
config_name (str) – Name of config
- Returns:
config (dict) – SAM config JSON as a dictionary
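For example, a minimal sketch of reading back a stored config (assuming a config named 'default' was previously set via set_configs; the name is hypothetical):

>>> with Outputs('test.h5') as f:
>>>     config = f.get_config('default')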
- set_configs(SAM_configs)[source]
Set SAM configuration JSONs as attributes of ‘meta’
- Parameters:
SAM_configs (dict) – Dictionary of SAM configuration JSONs
- update_dset(dset, dset_array, dset_slice=None)[source]
Check to see if dset needs to be updated on disk; if so, write dset_array to disk (see the sketch after the parameter list).
- Parameters:
dset (str) – dataset to update
dset_array (ndarray) – dataset array
dset_slice (tuple) – Slice of dataset to update; if None, update all
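A minimal usage sketch, assuming the 'dset1' dataset from the Examples section and append mode for write access:

>>> new_data = np.ones((8760, 100)) * 43.43
>>> with Outputs('test.h5', mode='a') as f:
>>>     f.update_dset('dset1', new_data)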
- write_dataset(dset_name, data, dtype, chunks=None, attrs=None)[source]
Write dataset to disk. The dataset is created in the .h5 file and the data is scaled if needed (see the sketch after the parameter list).
- Parameters:
dset_name (str) – Name of dataset to be added to h5 file.
data (ndarray) – Data to be added to h5 file.
dtype (str) – Intended dataset datatype after scaling.
chunks (tuple) – Chunk size for capacity factor means dataset.
attrs (dict) – Attributes to be set. May include ‘scale_factor’.
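A minimal sketch, assuming the file already has meta and time_index initialized; 'dset2' is a hypothetical dataset name:

>>> data = np.ones((8760, 100)) * 9.99
>>> with Outputs('test.h5', mode='a') as f:
>>>     f.write_dataset('dset2', data, 'int32',
>>>                     attrs={'scale_factor': 100})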
- classmethod write_profiles(h5_file, meta, time_index, dset_name, profiles, dtype, attrs=None, SAM_configs=None, chunks=(None, 100), unscale=True, mode='w-', str_decode=True, group=None)[source]
Write profiles to disk
- Parameters:
h5_file (str) – Path to .h5 resource file
meta (pandas.Dataframe) – Locational meta data
time_index (pandas.DatetimeIndex) – Temporal timesteps
dset_name (str) – Name of the target dataset (should identify the profiles).
profiles (ndarray) – output result timeseries profiles
dtype (str) – Intended dataset datatype after scaling.
attrs (dict, optional) – Attributes to be set. May include ‘scale_factor’, by default None
SAM_configs (dict, optional) – Dictionary of SAM configuration JSONs used to compute cf means, by default None
chunks (tuple, optional) – Chunk size for capacity factor means dataset, by default (None, 100)
unscale (bool, optional) – Boolean flag to automatically unscale variables on extraction, by default True
mode (str, optional) – Mode to instantiate h5py.File instance, by default ‘w-’
str_decode (bool, optional) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read, by default True
group (str, optional) – Group within .h5 resource file to open, by default None
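A minimal sketch, reusing the meta and time_index objects from the Examples section; 'profiles.h5' and 'cf_profile' are hypothetical names, and the default mode 'w-' will raise an error if the file already exists:

>>> profiles = np.random.rand(8760, 100).astype(np.float32)
>>> Outputs.write_profiles('profiles.h5', meta, time_index,
>>>                        'cf_profile', profiles, 'float32')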
- classmethod write_means(h5_file, meta, dset_name, means, dtype, attrs=None, SAM_configs=None, chunks=None, unscale=True, mode='w-', str_decode=True, group=None)[source]
Write means array to disk
- Parameters:
h5_file (str) – Path to .h5 resource file
meta (pandas.Dataframe) – Locational meta data
dset_name (str) – Name of the target dataset (should identify the means).
means (ndarray) – output means array.
dtype (str) – Intended dataset datatype after scaling.
attrs (dict, optional) – Attributes to be set. May include ‘scale_factor’, by default None
SAM_configs (dict, optional) – Dictionary of SAM configuration JSONs used to compute cf means, by default None
chunks (tuple, optional) – Chunk size for capacity factor means dataset, by default None
unscale (bool, optional) – Boolean flag to automatically unscale variables on extraction, by default True
mode (str, optional) – Mode to instantiate h5py.File instance, by default ‘w-’
str_decode (bool, optional) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read, by default True
group (str, optional) – Group within .h5 resource file to open, by default None
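A minimal sketch, reusing the meta object from the Examples section; 'means.h5' and 'cf_mean' are hypothetical names:

>>> means = np.random.rand(100).astype(np.float32)
>>> Outputs.write_means('means.h5', meta, 'cf_mean', means, 'float32')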
- classmethod add_dataset(h5_file, dset_name, dset_data, dtype, attrs=None, chunks=None, unscale=True, mode='a', str_decode=True, group=None)[source]
Add dataset to h5_file
- Parameters:
h5_file (str) – Path to .h5 resource file
dset_name (str) – Name of dataset to be added to h5 file
dset_data (ndarray) – Data to be added to h5 file
dtype (str) – Intended dataset datatype after scaling.
attrs (dict, optional) – Attributes to be set. May include ‘scale_factor’, by default None
chunks (tuple, optional) – Chunk size for dataset, by default None
unscale (bool, optional) – Boolean flag to automatically unscale variables on extraction, by default True
mode (str, optional) – Mode to instantiate h5py.File instance, by default ‘a’
str_decode (bool, optional) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read, by default True
group (str, optional) – Group within .h5 resource file to open, by default None
- property adders
Dictionary of all dataset add offset factors
- Returns:
adders (dict)
- property attrs
Dictionary of all dataset attributes
- Returns:
attrs (dict)
- property chunks
Dictionary of all dataset chunk sizes
- Returns:
chunks (dict)
- close()
Close h5 instance
- property coordinates
(lat, lon) pairs
- Returns:
lat_lon (ndarray)
- Type:
Coordinates
- property data_version
Get the version attribute of the data. None if not available.
- Returns:
version (str | None)
- property datasets
Datasets available
- Returns:
list
- static df_str_decode(df)
Decode a dataframe with byte string columns into ordinary str cols.
- Parameters:
df (pd.DataFrame) – Dataframe with some columns being byte strings.
- Returns:
df (pd.DataFrame) – DataFrame with str columns instead of byte str columns.
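For example, a toy sketch with a hypothetical byte-string column:

>>> df = pd.DataFrame({'country': [b'USA', b'Canada']})
>>> df = Outputs.df_str_decode(df)  # 'country' is now an ordinary str column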
- property dsets
Datasets available
- Returns:
list
- property dtypes
Dictionary of all dataset dtypes
- Returns:
dtypes (dict)
- get_SAM_df(site)
Placeholder for get_SAM_df method that is resource specific
- Parameters:
site (int) – Site to extract SAM DataFrame for
- get_attrs(dset=None)
Get h5 attributes either from file or dataset
- Parameters:
dset (str) – Dataset to get attributes for, if None get file (global) attributes
- Returns:
attrs (dict) – Dataset or file attributes
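A minimal sketch, assuming the 'dset1' dataset from the Examples section:

>>> with Outputs('test.h5') as f:
>>>     global_attrs = f.get_attrs()
>>>     dset_attrs = f.get_attrs(dset='dset1')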
- get_dset_properties(dset)
Get dataset properties (shape, dtype, chunks)
- Parameters:
dset (str) – Dataset to get properties for
- Returns:
shape (tuple) – Dataset array shape
dtype (str) – Dataset array dtype
chunks (tuple) – Dataset chunk size
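For example, again assuming the 'dset1' dataset from the Examples section:

>>> with Outputs('test.h5') as f:
>>>     shape, dtype, chunks = f.get_dset_properties('dset1')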
- get_meta_arr(rec_name, rows=slice(None, None, None))
Get a meta array by name (faster than DataFrame extraction).
- Parameters:
rec_name (str) – Named record from the meta data to retrieve.
rows (slice) – Rows of the record to extract.
- Returns:
meta_arr (np.ndarray) – Extracted array from the meta data record name.
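A minimal sketch pulling the first ten latitudes from the meta data:

>>> with Outputs('test.h5') as f:
>>>     lats = f.get_meta_arr('latitude', rows=slice(0, 10))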
- get_scale_factor(dset)
Get dataset scale factor
- Parameters:
dset (str) – Dataset to get scale factor for
- Returns:
float – Dataset scale factor, used to unscale int values to floats
- get_units(dset)
Get dataset units
- Parameters:
dset (str) – Dataset to get units for
- Returns:
str – Dataset units, None if not defined
- property global_attrs
Global (file) attributes
- Returns:
global_attrs (dict)
- property groups
Groups available
- Returns:
groups (list) – List of groups
- property h5
Open h5py File instance. If _group is not None, the open Group is returned.
- Returns:
h5 (h5py.File | h5py.Group)
- classmethod init_h5(h5_file, dsets, shapes, attrs, chunks, dtypes, meta, time_index=None, configs=None, unscale=True, mode='w', str_decode=True, group=None, run_attrs=None)[source]
Init a full output file with the final intended shape without data.
- Parameters:
h5_file (str) – Full h5 output filepath.
dsets (list) – List of strings of dataset names to initialize (does not include meta or time_index).
shapes (dict) – Dictionary of dataset shapes (keys correspond to dsets).
attrs (dict) – Dictionary of dataset attributes (keys correspond to dsets).
chunks (dict) – Dictionary of chunk tuples (keys correspond to dsets).
dtypes (dict) – Dictionary of numpy datatypes (keys correspond to dsets).
meta (pd.DataFrame) – Full meta data.
time_index (pd.DatetimeIndex | None) – Full pandas datetime index. None implies that only 1D results (no site profiles) are being written.
configs (dict | None) – Optional input configs to set as attr on meta.
unscale (bool) – Boolean flag to automatically unscale variables on extraction
mode (str) – Mode to instantiate h5py.File instance
str_decode (bool) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read.
group (str) – Group within .h5 resource file to open
run_attrs (dict | NoneType) – Runtime attributes (args, kwargs) to add as global (file) attributes
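A minimal sketch pre-initializing a file for a single 1D dataset, reusing the meta object from the Examples section; 'init.h5', 'cf_mean', and the attribute values are hypothetical, and time_index is omitted because only 1D results are written:

>>> Outputs.init_h5('init.h5', ['cf_mean'],
>>>                 shapes={'cf_mean': (100,)},
>>>                 attrs={'cf_mean': {'scale_factor': 1000}},
>>>                 chunks={'cf_mean': None},
>>>                 dtypes={'cf_mean': 'int16'},
>>>                 meta=meta)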
- property lat_lon
Extract (latitude, longitude) pairs
- Returns:
lat_lon (ndarray)
- open_dataset(ds_name)
Open resource dataset
- Parameters:
ds_name (str) – Dataset name to open
- Returns:
ds (ResourceDataset) – Open resource dataset
- classmethod preload_SAM(h5_file, sites, tech, unscale=True, str_decode=True, group=None, hsds=False, hsds_kwargs=None, time_index_step=None, means=False)
Pre-load project_points for SAM
- Parameters:
h5_file (str) – h5_file to extract resource from
sites (list) – List of sites to be provided to SAM (sites is synonymous with gids aka spatial indices)
tech (str) – Technology to be run by SAM
unscale (bool) – Boolean flag to automatically unscale variables on extraction
str_decode (bool) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read.
group (str) – Group within .h5 resource file to open
hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False
hsds_kwargs (dict, optional) – Dictionary of optional kwargs for h5pyd, e.g., bucket, username, password, by default None
time_index_step (int, optional) – Step size for time_index, used to reduce temporal resolution, by default None
means (bool, optional) – Boolean flag to compute mean resource when res_array is set, by default False
- Returns:
SAM_res (SAMResource) – Instance of SAMResource pre-loaded with Solar resource for sites in project_points
- property res_dsets
Available resource datasets
- Returns:
list
- property resource_datasets
Available resource datasets
- Returns:
list
- property scale_factors
Dictionary of all dataset scale factors
- Returns:
scale_factors (dict)
- property shapes
Dictionary of all dataset shapes
- Returns:
shapes (dict)
- property units
Dictionary of all dataset units
- Returns:
units (dict)