rex.outputs.Outputs

class Outputs(h5_file, mode='r', unscale=True, str_decode=True, group=None)[source]

Bases: BaseResource

Base class to handle output data in .h5 format

Examples

The Outputs handler can be used to initialize h5 files in the standard reV/rex resource data format.

>>> from rex import Outputs
>>> import pandas as pd
>>> import numpy as np
>>>
>>> meta = pd.DataFrame({'latitude': np.ones(100),
>>>                      'longitude': np.ones(100)})
>>>
>>> time_index = pd.date_range('20210101', '20220101', freq='1h',
>>>                            inclusive='right')
>>>
>>> with Outputs('test.h5', 'w') as f:
>>>     f.meta = meta
>>>     f.time_index = time_index

You can also use the Outputs handler to read output h5 files from disk. The Outputs handler will automatically parse the meta data and time index into the expected pandas objects (DataFrame and DatetimeIndex, respectively).

>>> with Outputs('test.h5') as f:
>>>     print(f.meta.head())
>>>
     latitude  longitude
gid
0         1.0        1.0
1         1.0        1.0
2         1.0        1.0
3         1.0        1.0
4         1.0        1.0

>>> with Outputs('test.h5') as f:
>>>     print(f.time_index)
DatetimeIndex(['2021-01-01 01:00:00+00:00', '2021-01-01 02:00:00+00:00',
               '2021-01-01 03:00:00+00:00', '2021-01-01 04:00:00+00:00',
               '2021-01-01 05:00:00+00:00', '2021-01-01 06:00:00+00:00',
               '2021-01-01 07:00:00+00:00', '2021-01-01 08:00:00+00:00',
               '2021-01-01 09:00:00+00:00', '2021-01-01 10:00:00+00:00',
               ...
               '2021-12-31 15:00:00+00:00', '2021-12-31 16:00:00+00:00',
               '2021-12-31 17:00:00+00:00', '2021-12-31 18:00:00+00:00',
               '2021-12-31 19:00:00+00:00', '2021-12-31 20:00:00+00:00',
               '2021-12-31 21:00:00+00:00', '2021-12-31 22:00:00+00:00',
               '2021-12-31 23:00:00+00:00', '2022-01-01 00:00:00+00:00'],
              dtype='datetime64[ns, UTC]', length=8760, freq=None)
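
The length of 8760 and the first timestamp of 01:00 both follow from the hourly range being closed on the right only. The following is a minimal standard-library sketch of that arithmetic. It does not use rex or pandas, and the variable names are illustrative only.

```python
from datetime import datetime, timedelta

start = datetime(2021, 1, 1)
end = datetime(2022, 1, 1)
step = timedelta(hours=1)

# A right-closed range excludes `start` and includes `end`.
n_steps = (end - start) // step
stamps = [start + step * (i + 1) for i in range(n_steps)]

print(n_steps)     # number of hourly steps in a non-leap year
print(stamps[0])   # first timestamp is one step after `start`
print(stamps[-1])  # last timestamp is `end`
```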

There are a few ways to use the Outputs handler to write data to a file. Here is one example using the pre-initialized file we created earlier. Note that the Outputs handler will automatically scale float data using the “scale_factor” attribute. The Outputs handler will unscale the data when it is read unless the unscale kwarg is explicitly set to False. This behavior is intended to reduce disk storage requirements for big data and can be disabled by setting dtype=np.float32 or dtype=np.float64 when writing data.

>>> Outputs.add_dataset(h5_file='test.h5', dset_name='dset1',
>>>                     dset_data=np.ones((8760, 100)) * 42.42,
>>>                     attrs={'scale_factor': 100}, dtype=np.int32)

>>> with Outputs('test.h5') as f:
>>>     print(f['dset1'])
>>>     print(f['dset1'].dtype)
[[42.42 42.42 42.42 ... 42.42 42.42 42.42]
 [42.42 42.42 42.42 ... 42.42 42.42 42.42]
 [42.42 42.42 42.42 ... 42.42 42.42 42.42]
 ...
 [42.42 42.42 42.42 ... 42.42 42.42 42.42]
 [42.42 42.42 42.42 ... 42.42 42.42 42.42]
 [42.42 42.42 42.42 ... 42.42 42.42 42.42]]
float32

>>> with Outputs('test.h5', unscale=False) as f:
>>>     print(f['dset1'])
>>>     print(f['dset1'].dtype)
[[4242 4242 4242 ... 4242 4242 4242]
 [4242 4242 4242 ... 4242 4242 4242]
 [4242 4242 4242 ... 4242 4242 4242]
 ...
 [4242 4242 4242 ... 4242 4242 4242]
 [4242 4242 4242 ... 4242 4242 4242]
 [4242 4242 4242 ... 4242 4242 4242]]
int32
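
The relationship between the two printed arrays (42.42 versus 4242) is plain scale-factor arithmetic. The sketch below reproduces it for a single value in pure Python. It assumes the handler rounds to the nearest integer on write, which this page does not state explicitly, and it is not the rex implementation.

```python
scale_factor = 100
value = 42.42

# On write: multiply by the scale factor, round, and store as an integer.
stored = int(round(value * scale_factor))

# On read with unscale=True: divide the stored integer by the scale factor.
unscaled = stored / scale_factor

print(stored)
print(unscaled)
```

Reading with unscale=False corresponds to returning `stored` as-is.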

Note that the Outputs handler is specifically designed to read and write spatiotemporal data. It is therefore important to initialize the meta data and time index objects even if your data is only spatial or only temporal. Furthermore, the Outputs handler will always assume that 1D datasets represent scalar data (non-timeseries) that corresponds to the meta data shape, and that 2D datasets represent spatiotemporal data whose shape corresponds to (len(time_index), len(meta)). You can see these constraints here:

>>> Outputs.add_dataset(h5_file='test.h5', dset_name='bad_shape',
>>>                     dset_data=np.ones((1, 100)) * 42.42,
>>>                     attrs={'scale_factor': 100}, dtype=np.int32)
HandlerValueError: 2D data with shape (1, 100) is not of the proper
spatiotemporal shape: (8760, 100)

>>> Outputs.add_dataset(h5_file='test.h5', dset_name='bad_shape',
>>>                     dset_data=np.ones((8760,)) * 42.42,
>>>                     attrs={'scale_factor': 100}, dtype=np.int32)
HandlerValueError: 1D data with shape (8760,) is not of the proper
spatial shape: (100,)
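
The two shape rules described above can be written down as a small check. This is a hedged sketch in plain Python: `check_shape` is a hypothetical helper, not part of rex, and it only handles the 1D and 2D cases discussed here.

```python
def check_shape(data_shape, n_time, n_sites):
    """Hypothetical helper mirroring the 1D / 2D shape rules above."""
    data_shape = tuple(data_shape)
    if len(data_shape) == 1:
        # 1D data is treated as one scalar value per site.
        expected = (n_sites,)
    elif len(data_shape) == 2:
        # 2D data is treated as (time, site).
        expected = (n_time, n_sites)
    else:
        # Assumption of this sketch: other ranks are out of scope.
        raise ValueError(f"This sketch only handles 1D or 2D data, "
                         f"got {len(data_shape)}D")
    if data_shape != expected:
        raise ValueError(f"{len(data_shape)}D data with shape {data_shape} "
                         f"does not match the expected shape {expected}")
    return expected


print(check_shape((8760, 100), n_time=8760, n_sites=100))
print(check_shape((100,), n_time=8760, n_sites=100))
```

Passing `(1, 100)` or `(8760,)` with the same `n_time` and `n_sites` raises `ValueError`, matching the two failing calls shown above.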

Parameters:

h5_file (str) – Path to .h5 resource file
mode (str, optional) – Mode to instantiate h5py.File instance, by default ‘r’
unscale (bool, optional) – Boolean flag to automatically unscale variables on extraction, by default True
str_decode (bool, optional) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read, by default True
group (str, optional) – Group within .h5 resource file to open, by default None

Methods

`add_dataset`(h5_file, dset_name, dset_data, dtype)	Add dataset to h5_file
`close`()	Close h5 instance
`df_str_decode`(df)	Decode a dataframe with byte string columns into ordinary str cols.
`get_SAM_df`(site)	Placeholder for get_SAM_df method that is resource specific
`get_attrs`([dset])	Get h5 attributes either from file or dataset
`get_config`(config_name)	Get SAM config
`get_dset_properties`(dset)	Get dataset properties (shape, dtype, chunks)
`get_meta_arr`(rec_name[, rows])	Get a meta array by name (faster than DataFrame extraction).
`get_scale_factor`(dset)	Get dataset scale factor
`get_units`(dset)	Get dataset units
`init_h5`(h5_file, dsets, shapes, attrs, ...)	Init a full output file with the final intended shape without data.
`is_hsds_file`(file_path)	Parse one or more file paths to determine if they are hsds
`is_s3_file`(file_path)	Parse one or more file paths to determine if they are s3
`open_dataset`(ds_name)	Open resource dataset
`open_file`(file_path[, mode, hsds, hsds_kwargs])	Open a filepath to an h5, s3, or hsds nrel resource file with the appropriate python object.
`preload_SAM`(h5_file, sites, tech[, unscale, ...])	Pre-load project_points for SAM
`set_configs`(SAM_configs)	Set SAM configuration JSONs as attributes of 'meta'
`set_version_attr`()	Set the version attribute to the h5 file.
`update_dset`(dset, dset_array[, dset_slice])	Check to see if dset needs to be updated on disk. If so, write dset_array to disk.
`write_dataset`(dset_name, data, dtype[, ...])	Write dataset to disk.
`write_means`(h5_file, meta, dset_name, means, ...)	Write means array to disk
`write_profiles`(h5_file, meta, time_index, ...)	Write profiles to disk

Attributes

`ADD_ATTR`
`SAM_configs`	SAM configuration JSONs used to create CF profiles
`SCALE_ATTR`
`UNIT_ATTR`
`adders`	Dictionary of all dataset add offset factors
`attrs`	Dictionary of all dataset attributes
`chunks`	Dictionary of all dataset chunk sizes
`coordinates`	(lat, lon) pairs
`data_version`	Get the version attribute of the data.
`datasets`	Datasets available
`dsets`	Datasets available
`dtypes`	Dictionary of all dataset dtypes
`full_version_record`	Get record of versions for dependencies
`global_attrs`	Global (file) attributes
`groups`	Groups available
`h5`	Open h5py File instance.
`lat_lon`	Extract (latitude, longitude) pairs
`meta`	Resource meta data DataFrame
`package`	Package used to create file
`res_dsets`	Available resource datasets
`resource_datasets`	Available resource datasets
`run_attrs`	Runtime attributes stored at the global (file) level
`scale_factors`	Dictionary of all dataset scale factors
`shape`	Variable array shape from time_index and meta
`shapes`	Dictionary of all dataset shapes
`source`	Package and version used to create file
`time_index`	Resource DatetimeIndex
`units`	Dictionary of all dataset units
`version`	Version of package used to create file
`writable`	Check to see if h5py.File instance is writable

property full_version_record

Get record of versions for dependencies

Returns:: dict – Dictionary of package versions for dependencies

set_version_attr()[source]: Set the version attribute to the h5 file.

property version

Version of package used to create file

Returns:: str

property package

Package used to create file

Returns:: str

property source

Package and version used to create file

Returns:: str

property shape

Variable array shape from time_index and meta

Returns:: tuple – shape of variables arrays == (time, locations)

property writable

Check to see if h5py.File instance is writable

Returns:: is_writable (bool) – Flag if mode is writable

property meta

Resource meta data DataFrame

Returns:: meta (pandas.DataFrame)

property time_index

Resource DatetimeIndex

Returns:: time_index (pandas.DatetimeIndex)

property SAM_configs

SAM configuration JSONs used to create CF profiles

Returns:: configs (dict) – Dictionary of SAM configuration JSONs

property run_attrs

Runtime attributes stored at the global (file) level

Returns:: global_attrs (dict)

get_config(config_name)[source]

Get SAM config

Parameters:: config_name (str) – Name of config
Returns:: config (dict) – SAM config JSON as a dictionary

set_configs(SAM_configs)[source]

Set SAM configuration JSONs as attributes of ‘meta’

Parameters:: SAM_configs (dict) – Dictionary of SAM configuration JSONs

update_dset(dset, dset_array, dset_slice=None)[source]

Check to see if dset needs to be updated on disk. If so, write dset_array to disk.

Parameters:

dset (str) – dataset to update
dset_array (ndarray) – dataset array
dset_slice (tuple) – slice of dataset to update, if None update all

write_dataset(dset_name, data, dtype, chunks=None, attrs=None)[source]

Write dataset to disk. Dataset is created in .h5 file and data is scaled if needed.

Parameters:

dset_name (str) – Name of dataset to be added to h5 file.
data (ndarray) – Data to be added to h5 file.
dtype (str) – Intended dataset datatype after scaling.
chunks (tuple) – Chunk size for capacity factor means dataset.
attrs (dict) – Attributes to be set. May include ‘scale_factor’.

classmethod write_profiles(h5_file, meta, time_index, dset_name, profiles, dtype, attrs=None, SAM_configs=None, chunks=(None, 100), unscale=True, mode='w-', str_decode=True, group=None)[source]

Write profiles to disk

Parameters:

h5_file (str) – Path to .h5 resource file
meta (pandas.DataFrame) – Locational meta data
time_index (pandas.DatetimeIndex) – Temporal timesteps
dset_name (str) – Name of the target dataset (should identify the profiles).
profiles (ndarray) – output result timeseries profiles
dtype (str) – Intended dataset datatype after scaling.
attrs (dict, optional) – Attributes to be set. May include ‘scale_factor’, by default None
SAM_configs (dict, optional) – Dictionary of SAM configuration JSONs used to compute cf means, by default None
chunks (tuple, optional) – Chunk size for capacity factor means dataset, by default (None, 100)
unscale (bool, optional) – Boolean flag to automatically unscale variables on extraction, by default True
mode (str, optional) – Mode to instantiate h5py.File instance, by default ‘w-’
str_decode (bool, optional) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read, by default True
group (str, optional) – Group within .h5 resource file to open, by default None

classmethod write_means(h5_file, meta, dset_name, means, dtype, attrs=None, SAM_configs=None, chunks=None, unscale=True, mode='w-', str_decode=True, group=None)[source]

Write means array to disk

Parameters:

h5_file (str) – Path to .h5 resource file
meta (pandas.DataFrame) – Locational meta data
dset_name (str) – Name of the target dataset (should identify the means).
means (ndarray) – output means array.
dtype (str) – Intended dataset datatype after scaling.
attrs (dict, optional) – Attributes to be set. May include ‘scale_factor’, by default None
SAM_configs (dict, optional) – Dictionary of SAM configuration JSONs used to compute cf means, by default None
chunks (tuple, optional) – Chunk size for capacity factor means dataset, by default None
unscale (bool, optional) – Boolean flag to automatically unscale variables on extraction, by default True
mode (str, optional) – Mode to instantiate h5py.File instance, by default ‘w-’
str_decode (bool, optional) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read, by default True
group (str, optional) – Group within .h5 resource file to open, by default None

classmethod add_dataset(h5_file, dset_name, dset_data, dtype, attrs=None, chunks=None, unscale=True, mode='a', str_decode=True, group=None)[source]

Add dataset to h5_file

Parameters:

h5_file (str) – Path to .h5 resource file
dset_name (str) – Name of dataset to be added to h5 file
dset_data (ndarray) – Data to be added to h5 file
dtype (str) – Intended dataset datatype after scaling.
attrs (dict, optional) – Attributes to be set. May include ‘scale_factor’, by default None
unscale (bool, optional) – Boolean flag to automatically unscale variables on extraction, by default True
mode (str, optional) – Mode to instantiate h5py.File instance, by default ‘a’
str_decode (bool, optional) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read, by default True
group (str, optional) – Group within .h5 resource file to open, by default None

property adders

Dictionary of all dataset add offset factors

Returns:: adders (dict)

property attrs

Dictionary of all dataset attributes

Returns:: attrs (dict)

property chunks

Dictionary of all dataset chunk sizes

Returns:: chunks (dict)

close(): Close h5 instance

property coordinates

(lat, lon) pairs

Returns:: lat_lon (ndarray)
Type:: Coordinates

property data_version

Get the version attribute of the data. None if not available.

Returns:: version (str | None)

property datasets

Datasets available

Returns:: list

static df_str_decode(df)

Decode a dataframe with byte string columns into ordinary str cols.

Parameters:: df (pd.DataFrame) – Dataframe with some columns being byte strings.
Returns:: df (pd.DataFrame) – DataFrame with str columns instead of byte str columns.

property dsets

Datasets available

Returns:: list

property dtypes

Dictionary of all dataset dtypes

Returns:: dtypes (dict)

get_SAM_df(site)

Placeholder for get_SAM_df method that is resource specific

Parameters:: site (int) – Site to extract SAM DataFrame for

get_attrs(dset=None)

Get h5 attributes either from file or dataset

Parameters:: dset (str) – Dataset to get attributes for, if None get file (global) attributes
Returns:: attrs (dict) – Dataset or file attributes

get_dset_properties(dset)

Get dataset properties (shape, dtype, chunks)

Parameters:

dset (str) – Dataset to get properties for

Returns:

shape (tuple) – Dataset array shape
dtype (str) – Dataset array dtype
chunks (tuple) – Dataset chunk size

get_meta_arr(rec_name, rows=slice(None, None, None))

Get a meta array by name (faster than DataFrame extraction).

Parameters:

rec_name (str) – Named record from the meta data to retrieve.
rows (slice) – Rows of the record to extract.

Returns:

meta_arr (np.ndarray) – Extracted array from the meta data record name.

get_scale_factor(dset)

Get dataset scale factor

Parameters:: dset (str) – Dataset to get scale factor for
Returns:: float – Dataset scale factor, used to unscale int values to floats

get_units(dset)

Get dataset units

Parameters:: dset (str) – Dataset to get units for
Returns:: str – Dataset units, None if not defined

property global_attrs

Global (file) attributes

Returns:: global_attrs (dict)

property groups

Groups available

Returns:: groups (list) – List of groups

property h5

Open h5py File instance. If _group is not None return open Group

Returns:: h5 (h5py.File | h5py.Group)

classmethod init_h5(h5_file, dsets, shapes, attrs, chunks, dtypes, meta, time_index=None, configs=None, unscale=True, mode='w', str_decode=True, group=None, run_attrs=None)[source]

Init a full output file with the final intended shape without data.

Parameters:

h5_file (str) – Full h5 output filepath.
dsets (list) – List of strings of dataset names to initialize (does not include meta or time_index).
shapes (dict) – Dictionary of dataset shapes (keys correspond to dsets).
attrs (dict) – Dictionary of dataset attributes (keys correspond to dsets).
chunks (dict) – Dictionary of chunk tuples (keys correspond to dsets).
dtypes (dict) – dictionary of numpy datatypes (keys correspond to dsets).
meta (pd.DataFrame) – Full meta data.
time_index (pd.DatetimeIndex | None) – Full pandas datetime index. None implies that only 1D results (no site profiles) are being written.
configs (dict | None) – Optional input configs to set as attr on meta.
unscale (bool) – Boolean flag to automatically unscale variables on extraction
mode (str) – Mode to instantiate h5py.File instance
str_decode (bool) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read.
group (str) – Group within .h5 resource file to open
run_attrs (dict | NoneType) – Runtime attributes (args, kwargs) to add as global (file) attributes
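
Because shapes, attrs, chunks, and dtypes are all keyed by the names in dsets, it can help to build them side by side and check the keys before calling init_h5. The sketch below only prepares and validates such inputs in plain Python. The dataset names, scale factor, chunk sizes, and dtype strings are illustrative assumptions, and the call to init_h5 itself is not shown.

```python
n_time, n_sites = 8760, 100

# Illustrative dataset names (assumptions, not required by rex).
dsets = ["cf_profile", "cf_mean"]

# One entry per dataset name in each dictionary.
shapes = {"cf_profile": (n_time, n_sites), "cf_mean": (n_sites,)}
attrs = {"cf_profile": {"scale_factor": 100},
         "cf_mean": {"scale_factor": 100}}
chunks = {"cf_profile": (None, 100), "cf_mean": None}
dtypes = {"cf_profile": "int32", "cf_mean": "int32"}

# Collect any dataset names that are missing from a per-dataset dictionary.
problems = {}
for name, mapping in [("shapes", shapes), ("attrs", attrs),
                      ("chunks", chunks), ("dtypes", dtypes)]:
    missing = sorted(set(dsets) - set(mapping))
    if missing:
        problems[name] = missing

print(problems)  # empty when every dictionary covers every dataset
```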

static is_hsds_file(file_path)

Parse one or more file paths to determine if they are hsds

Parameters:: file_path (str | list) – One or more file paths (only the first is parsed if multiple)
Returns:: is_hsds_file (bool) – True if hsds

static is_s3_file(file_path)

Parse one or more file paths to determine if they are s3

Parameters:: file_path (str | list) – One or more file paths (only the first is parsed if multiple)
Returns:: is_s3_file (bool) – True if s3
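
The open_file documentation on this page describes HSDS paths as starting with /nrel/ and S3 paths as starting with s3://. Under that assumption, the two checks can be approximated as simple prefix tests on the first path. The helpers below are hypothetical stand-ins written for illustration; the actual rex methods may apply additional logic, and the example paths are made up.

```python
def first_path(file_path):
    """Return the first path when given a list or tuple, else the path."""
    if isinstance(file_path, (list, tuple)):
        return file_path[0]
    return file_path


def looks_like_hsds(file_path):
    """Prefix test only; an approximation of is_hsds_file."""
    return first_path(file_path).startswith("/nrel/")


def looks_like_s3(file_path):
    """Prefix test only; an approximation of is_s3_file."""
    return first_path(file_path).startswith("s3://")


print(looks_like_hsds("/nrel/example/file.h5"))
print(looks_like_s3(["s3://example-bucket/a.h5", "/local/b.h5"]))
print(looks_like_s3("/local/b.h5"))
```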

property lat_lon

Extract (latitude, longitude) pairs

Returns:: lat_lon (ndarray)

open_dataset(ds_name)

Open resource dataset

Parameters:: ds_name (str) – Dataset name to open
Returns:: ds (ResourceDataset) – Resource for open resource dataset

classmethod open_file(file_path, mode='r', hsds=False, hsds_kwargs=None)

Open a filepath to an h5, s3, or hsds nrel resource file with the appropriate python object.

Parameters:

file_path (str) – String filepath to .h5 file to extract resource from. Can also be a path to an HSDS file (starts with /nrel/) or S3 file (starts with s3://)
mode (str, optional) – Mode to instantiate h5py.File instance, by default ‘r’
hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False. This is now redundant; file paths starting with /nrel/ will be treated as hsds=True by default
hsds_kwargs (dict, optional) – Dictionary of optional kwargs for h5pyd, e.g., bucket, username, password, by default None

Returns:

file (h5py.File | h5pyd.File) – H5 file handler either opening the local file using h5py, or the file on s3 using h5py and fsspec, or the file on HSDS using h5pyd.

classmethod preload_SAM(h5_file, sites, tech, unscale=True, str_decode=True, group=None, hsds=False, hsds_kwargs=None, time_index_step=None, means=False)

Pre-load project_points for SAM

Parameters:

h5_file (str) – String filepath to .h5 file to extract resource from. Can also be a path to an HSDS file (starts with /nrel/) or S3 file (starts with s3://)
sites (list) – List of sites to be provided to SAM (sites is synonymous with gids aka spatial indices)
tech (str) – Technology to be run by SAM
unscale (bool) – Boolean flag to automatically unscale variables on extraction
str_decode (bool) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read.
group (str) – Group within .h5 resource file to open
hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False. This is now redundant; file paths starting with /nrel/ will be treated as hsds=True by default
hsds_kwargs (dict, optional) – Dictionary of optional kwargs for h5pyd, e.g., bucket, username, password, by default None
time_index_step (int, optional) – Step size for time_index, used to reduce temporal resolution, by default None
means (bool, optional) – Boolean flag to compute mean resource when res_array is set, by default False

Returns:

SAM_res (SAMResource) – Instance of SAMResource pre-loaded with Solar resource for sites in project_points
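
The effect of time_index_step on temporal resolution can be pictured as a stride over the time axis. The sketch below uses a short list of standard-library datetimes in place of a real time_index, and it assumes the step is applied as a plain stride starting at the first timestep, which this page does not spell out.

```python
from datetime import datetime, timedelta

# Six hourly timestamps standing in for a time_index.
time_index = [datetime(2021, 1, 1) + timedelta(hours=h) for h in range(6)]

time_index_step = 2  # keep every second timestep

# Stride over the time axis to reduce the temporal resolution.
reduced = time_index[::time_index_step]

print(len(time_index), len(reduced))
print(reduced)
```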

property res_dsets

Available resource datasets

Returns:: list

property resource_datasets

Available resource datasets

Returns:: list

property scale_factors

Dictionary of all dataset scale factors

Returns:: scale_factors (dict)

property shapes

Dictionary of all dataset shapes

Returns:: shapes (dict)

property units

Dictionary of all dataset units

Returns:: units (dict)