rex.outputs.Outputs

class Outputs(h5_file, mode='r', unscale=True, str_decode=True, group=None)[source]

Bases: BaseResource

Base class to handle output data in .h5 format

Examples

The Outputs handler can be used to initialize h5 files in the standard reV/rex resource data format.

>>> from rex import Outputs
>>> import pandas as pd
>>> import numpy as np
>>>
>>> meta = pd.DataFrame({'latitude': np.ones(100),
>>>                      'longitude': np.ones(100)})
>>>
>>> time_index = pd.date_range('20210101', '20220101', freq='1h',
>>>                            inclusive='right')
>>>
>>> with Outputs('test.h5', 'w') as f:
>>>     f.meta = meta
>>>     f.time_index = time_index

You can also use the Outputs handler to read output h5 files from disk. The Outputs handler will automatically parse the meta data and time index into the expected pandas objects (DataFrame and DatetimeIndex, respectively).

>>> with Outputs('test.h5') as f:
>>>     print(f.meta.head())
>>>
     latitude  longitude
gid
0         1.0        1.0
1         1.0        1.0
2         1.0        1.0
3         1.0        1.0
4         1.0        1.0
>>> with Outputs('test.h5') as f:
>>>     print(f.time_index)
DatetimeIndex(['2021-01-01 01:00:00+00:00', '2021-01-01 02:00:00+00:00',
               '2021-01-01 03:00:00+00:00', '2021-01-01 04:00:00+00:00',
               '2021-01-01 05:00:00+00:00', '2021-01-01 06:00:00+00:00',
               '2021-01-01 07:00:00+00:00', '2021-01-01 08:00:00+00:00',
               '2021-01-01 09:00:00+00:00', '2021-01-01 10:00:00+00:00',
               ...
               '2021-12-31 15:00:00+00:00', '2021-12-31 16:00:00+00:00',
               '2021-12-31 17:00:00+00:00', '2021-12-31 18:00:00+00:00',
               '2021-12-31 19:00:00+00:00', '2021-12-31 20:00:00+00:00',
               '2021-12-31 21:00:00+00:00', '2021-12-31 22:00:00+00:00',
               '2021-12-31 23:00:00+00:00', '2022-01-01 00:00:00+00:00'],
              dtype='datetime64[ns, UTC]', length=8760, freq=None)

There are a few ways to use the Outputs handler to write data to a file. Here is one example using the pre-initialized file we created earlier. Note that the Outputs handler will automatically scale float data using the “scale_factor” attribute and will unscale the data when it is read unless the unscale kwarg is explicitly set to False. This scaling behavior is intended to reduce disk storage requirements for big data and can be disabled by setting dtype=np.float32 or dtype=np.float64 when writing data.

>>> Outputs.add_dataset(h5_file='test.h5', dset_name='dset1',
>>>                     dset_data=np.ones((8760, 100)) * 42.42,
>>>                     attrs={'scale_factor': 100}, dtype=np.int32)
>>> with Outputs('test.h5') as f:
>>>     print(f['dset1'])
>>>     print(f['dset1'].dtype)
[[42.42 42.42 42.42 ... 42.42 42.42 42.42]
 [42.42 42.42 42.42 ... 42.42 42.42 42.42]
 [42.42 42.42 42.42 ... 42.42 42.42 42.42]
 ...
 [42.42 42.42 42.42 ... 42.42 42.42 42.42]
 [42.42 42.42 42.42 ... 42.42 42.42 42.42]
 [42.42 42.42 42.42 ... 42.42 42.42 42.42]]
float32
>>> with Outputs('test.h5', unscale=False) as f:
>>>     print(f['dset1'])
>>>     print(f['dset1'].dtype)
[[4242 4242 4242 ... 4242 4242 4242]
 [4242 4242 4242 ... 4242 4242 4242]
 [4242 4242 4242 ... 4242 4242 4242]
 ...
 [4242 4242 4242 ... 4242 4242 4242]
 [4242 4242 4242 ... 4242 4242 4242]
 [4242 4242 4242 ... 4242 4242 4242]]
int32

Note that the Outputs handler is specifically designed to read and write spatiotemporal data. It is therefore important to initialize the meta data and time index objects even if your data is only spatial or only temporal. Furthermore, the Outputs handler will always assume that 1D datasets represent scalar (non-timeseries) data corresponding to the meta data shape, and that 2D datasets represent spatiotemporal data whose shape corresponds to (len(time_index), len(meta)). You can see these constraints here:

>>> Outputs.add_dataset(h5_file='test.h5', dset_name='bad_shape',
>>>                     dset_data=np.ones((1, 100)) * 42.42,
>>>                     attrs={'scale_factor': 100}, dtype=np.int32)
HandlerValueError: 2D data with shape (1, 100) is not of the proper
spatiotemporal shape: (8760, 100)
>>> Outputs.add_dataset(h5_file='test.h5', dset_name='bad_shape',
>>>                     dset_data=np.ones((8760,)) * 42.42,
>>>                     attrs={'scale_factor': 100}, dtype=np.int32)
HandlerValueError: 1D data with shape (8760,) is not of the proper
spatial shape: (100,)
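
For completeness, a 1D dataset whose length matches the meta data (100 sites here) is accepted as scalar (non-timeseries) data. A minimal sketch; 'scalar_dset' is an illustrative name:

>>> Outputs.add_dataset(h5_file='test.h5', dset_name='scalar_dset',
>>>                     dset_data=np.ones((100,)) * 42.42,
>>>                     attrs={'scale_factor': 100}, dtype=np.int32)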
Parameters:
  • h5_file (str) – Path to .h5 resource file

  • mode (str, optional) – Mode to instantiate h5py.File instance, by default ‘r’

  • unscale (bool, optional) – Boolean flag to automatically unscale variables on extraction, by default True

  • str_decode (bool, optional) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read, by default True

  • group (str, optional) – Group within .h5 resource file to open, by default None

Methods

add_dataset(h5_file, dset_name, dset_data, dtype)

Add dataset to h5_file

close()

Close h5 instance

df_str_decode(df)

Decode a dataframe with byte string columns into ordinary str cols.

get_SAM_df(site)

Placeholder for get_SAM_df method that is resource specific

get_attrs([dset])

Get h5 attributes either from file or dataset

get_config(config_name)

Get SAM config

get_dset_properties(dset)

Get dataset properties (shape, dtype, chunks)

get_meta_arr(rec_name[, rows])

Get a meta array by name (faster than DataFrame extraction).

get_scale_factor(dset)

Get dataset scale factor

get_units(dset)

Get dataset units

init_h5(h5_file, dsets, shapes, attrs, ...)

Init a full output file with the final intended shape without data.

open_dataset(ds_name)

Open resource dataset

preload_SAM(h5_file, sites, tech[, unscale, ...])

Pre-load project_points for SAM

set_configs(SAM_configs)

Set SAM configuration JSONs as attributes of 'meta'

set_version_attr()

Set the version attribute to the h5 file.

update_dset(dset, dset_array[, dset_slice])

Check to see if dset needs to be updated on disk; if so, write dset_array to disk

write_dataset(dset_name, data, dtype[, ...])

Write dataset to disk.

write_means(h5_file, meta, dset_name, means, ...)

Write means array to disk

write_profiles(h5_file, meta, time_index, ...)

Write profiles to disk

Attributes

ADD_ATTR

SAM_configs

SAM configuration JSONs used to create CF profiles

SCALE_ATTR

UNIT_ATTR

adders

Dictionary of all dataset add offset factors

attrs

Dictionary of all dataset attributes

chunks

Dictionary of all dataset chunk sizes

coordinates

(lat, lon) pairs

data_version

Get the version attribute of the data.

datasets

Datasets available

dsets

Datasets available

dtypes

Dictionary of all dataset dtypes

full_version_record

Get record of versions for dependencies

global_attrs

Global (file) attributes

groups

Groups available

h5

Open h5py File instance.

lat_lon

Extract (latitude, longitude) pairs

meta

Resource meta data DataFrame

package

Package used to create file

res_dsets

Available resource datasets

resource_datasets

Available resource datasets

run_attrs

Runtime attributes stored at the global (file) level

scale_factors

Dictionary of all dataset scale factors

shape

Variable array shape from time_index and meta

shapes

Dictionary of all dataset shapes

source

Package and version used to create file

time_index

Resource DatetimeIndex

units

Dictionary of all dataset units

version

Version of package used to create file

writable

Check to see if h5py.File instance is writable

property full_version_record

Get record of versions for dependencies

Returns:

dict – Dictionary of package versions for dependencies

set_version_attr()[source]

Set the version attribute to the h5 file.

property version

Version of package used to create file

Returns:

str

property package

Package used to create file

Returns:

str

property source

Package and version used to create file

Returns:

str

property shape

Variable array shape from time_index and meta

Returns:

tuple – shape of variables arrays == (time, locations)
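
For the file built in the class examples above, a quick check (a minimal sketch):

>>> with Outputs('test.h5') as f:
>>>     print(f.shape)
(8760, 100)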

property writable

Check to see if h5py.File instance is writable

Returns:

is_writable (bool) – Flag if mode is writable

property meta

Resource meta data DataFrame

Returns:

meta (pandas.DataFrame)

property time_index

Resource DatetimeIndex

Returns:

time_index (pandas.DatetimeIndex)

property SAM_configs

SAM configuration JSONs used to create CF profiles

Returns:

configs (dict) – Dictionary of SAM configuration JSONs

property run_attrs

Runtime attributes stored at the global (file) level

Returns:

global_attrs (dict)

get_config(config_name)[source]

Get SAM config

Parameters:

config_name (str) – Name of config

Returns:

config (dict) – SAM config JSON as a dictionary

set_configs(SAM_configs)[source]

Set SAM configuration JSONs as attributes of ‘meta’

Parameters:

SAM_configs (dict) – Dictionary of SAM configuration JSONs
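
A hedged sketch of storing and retrieving a config; following the “configuration JSONs” wording, the values are passed as JSON strings and get_config parses them back to dictionaries. The 'default' key and config contents are illustrative:

>>> import json
>>> with Outputs('test.h5', mode='a') as f:
>>>     f.set_configs({'default': json.dumps({'system_capacity': 5})})
>>>     print(f.get_config('default'))
{'system_capacity': 5}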

update_dset(dset, dset_array, dset_slice=None)[source]

Check to see if dset needs to be updated on disk; if so, write dset_array to disk

Parameters:
  • dset (str) – dataset to update

  • dset_array (ndarray) – dataset array

  • dset_slice (tuple) – Slice of dataset to update; if None, update all
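
A minimal sketch overwriting the first column of the 'dset1' dataset created in the class examples (assumes the dataset already exists; whether dset_array should be pre-scaled follows the handler's write path, so treat the values as illustrative):

>>> with Outputs('test.h5', mode='a') as f:
>>>     f.update_dset('dset1', np.ones((8760, 1)) * 13.13,
>>>                   dset_slice=(slice(None), slice(0, 1)))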

write_dataset(dset_name, data, dtype, chunks=None, attrs=None)[source]

Write dataset to disk. The dataset is created in the .h5 file and the data is scaled if needed.

Parameters:
  • dset_name (str) – Name of dataset to be added to h5 file.

  • data (ndarray) – Data to be added to h5 file.

  • dtype (str) – Intended dataset datatype after scaling.

  • chunks (tuple) – Chunk size for the dataset.

  • attrs (dict) – Attributes to be set. May include ‘scale_factor’.
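
A minimal sketch with an open, writable handler; 'dset2' is an illustrative name and the scaling mirrors the add_dataset example above:

>>> with Outputs('test.h5', mode='a') as f:
>>>     f.write_dataset('dset2', np.ones((8760, 100)) * 3.14,
>>>                     dtype=np.int32, attrs={'scale_factor': 100})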

classmethod write_profiles(h5_file, meta, time_index, dset_name, profiles, dtype, attrs=None, SAM_configs=None, chunks=(None, 100), unscale=True, mode='w-', str_decode=True, group=None)[source]

Write profiles to disk

Parameters:
  • h5_file (str) – Path to .h5 resource file

  • meta (pandas.DataFrame) – Locational meta data

  • time_index (pandas.DatetimeIndex) – Temporal timesteps

  • dset_name (str) – Name of the target dataset (should identify the profiles).

  • profiles (ndarray) – output result timeseries profiles

  • dtype (str) – Intended dataset datatype after scaling.

  • attrs (dict, optional) – Attributes to be set. May include ‘scale_factor’, by default None

  • SAM_configs (dict, optional) – Dictionary of SAM configuration JSONs used to compute cf means, by default None

  • chunks (tuple, optional) – Chunk size for the profiles dataset, by default (None, 100)

  • unscale (bool, optional) – Boolean flag to automatically unscale variables on extraction, by default True

  • mode (str, optional) – Mode to instantiate h5py.File instance, by default ‘w-’

  • str_decode (bool, optional) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read, by default True

  • group (str, optional) – Group within .h5 resource file to open, by default None
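
A hedged sketch reusing the meta and time_index built in the class examples; 'profiles.h5', 'cf_profile', and the synthetic values are illustrative:

>>> profiles = np.random.rand(8760, 100).astype(np.float32)
>>> Outputs.write_profiles('profiles.h5', meta, time_index, 'cf_profile',
>>>                        profiles, np.float32)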

classmethod write_means(h5_file, meta, dset_name, means, dtype, attrs=None, SAM_configs=None, chunks=None, unscale=True, mode='w-', str_decode=True, group=None)[source]

Write means array to disk

Parameters:
  • h5_file (str) – Path to .h5 resource file

  • meta (pandas.DataFrame) – Locational meta data

  • dset_name (str) – Name of the target dataset (should identify the means).

  • means (ndarray) – output means array.

  • dtype (str) – Intended dataset datatype after scaling.

  • attrs (dict, optional) – Attributes to be set. May include ‘scale_factor’, by default None

  • SAM_configs (dict, optional) – Dictionary of SAM configuration JSONs used to compute cf means, by default None

  • chunks (tuple, optional) – Chunk size for capacity factor means dataset, by default None

  • unscale (bool, optional) – Boolean flag to automatically unscale variables on extraction, by default True

  • mode (str, optional) – Mode to instantiate h5py.File instance, by default ‘w-’

  • str_decode (bool, optional) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read, by default True

  • group (str, optional) – Group within .h5 resource file to open, by default None
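
Likewise for 1D means data, one value per site in meta (a sketch continuing the write_profiles example above; 'means.h5' and 'cf_mean' are illustrative):

>>> means = profiles.mean(axis=0)
>>> Outputs.write_means('means.h5', meta, 'cf_mean', means, np.float32)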

classmethod add_dataset(h5_file, dset_name, dset_data, dtype, attrs=None, chunks=None, unscale=True, mode='a', str_decode=True, group=None)[source]

Add dataset to h5_file

Parameters:
  • h5_file (str) – Path to .h5 resource file

  • dset_name (str) – Name of dataset to be added to h5 file

  • dset_data (ndarray) – Data to be added to h5 file

  • dtype (str) – Intended dataset datatype after scaling.

  • attrs (dict, optional) – Attributes to be set. May include ‘scale_factor’, by default None

  • chunks (tuple, optional) – Chunk size for the dataset, by default None

  • unscale (bool, optional) – Boolean flag to automatically unscale variables on extraction, by default True

  • mode (str, optional) – Mode to instantiate h5py.File instance, by default ‘a’

  • str_decode (bool, optional) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read, by default True

  • group (str, optional) – Group within .h5 resource file to open, by default None

property adders

Dictionary of all dataset add offset factors

Returns:

adders (dict)

property attrs

Dictionary of all dataset attributes

Returns:

attrs (dict)

property chunks

Dictionary of all dataset chunk sizes

Returns:

chunks (dict)

close()

Close h5 instance

property coordinates

Coordinates: (lat, lon) pairs

Returns:

lat_lon (ndarray)

property data_version

Get the version attribute of the data. None if not available.

Returns:

version (str | None)

property datasets

Datasets available

Returns:

list

static df_str_decode(df)

Decode a dataframe with byte string columns into ordinary str cols.

Parameters:

df (pd.DataFrame) – Dataframe with some columns being byte strings.

Returns:

df (pd.DataFrame) – DataFrame with str columns instead of byte str columns.
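
A minimal sketch with byte-string columns like those produced when reading HDF5 string data:

>>> df = pd.DataFrame({'country': [b'USA', b'Canada']})
>>> Outputs.df_str_decode(df)['country'].tolist()
['USA', 'Canada']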

property dsets

Datasets available

Returns:

list

property dtypes

Dictionary of all dataset dtypes

Returns:

dtypes (dict)

get_SAM_df(site)

Placeholder for get_SAM_df method that is resource specific

Parameters:

site (int) – Site to extract SAM DataFrame for

get_attrs(dset=None)

Get h5 attributes either from file or dataset

Parameters:

dset (str) – Dataset to get attributes for, if None get file (global) attributes

Returns:

attrs (dict) – Dataset or file attributes
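
For example, inspecting the attributes written for 'dset1' in the class examples (the output may include additional attributes set by the handler):

>>> with Outputs('test.h5') as f:
>>>     print(f.get_attrs('dset1'))
{'scale_factor': 100}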

get_dset_properties(dset)

Get dataset properties (shape, dtype, chunks)

Parameters:

dset (str) – Dataset to get properties for

Returns:

  • shape (tuple) – Dataset array shape

  • dtype (str) – Dataset array dtype

  • chunks (tuple) – Dataset chunk size
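
A minimal sketch using the 'dset1' dataset from the class examples:

>>> with Outputs('test.h5') as f:
>>>     shape, dtype, chunks = f.get_dset_properties('dset1')
>>>     print(shape, dtype)
(8760, 100) int32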

get_meta_arr(rec_name, rows=slice(None, None, None))

Get a meta array by name (faster than DataFrame extraction).

Parameters:
  • rec_name (str) – Named record from the meta data to retrieve.

  • rows (slice) – Rows of the record to extract.

Returns:

meta_arr (np.ndarray) – Extracted array from the meta data record name.
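
A minimal sketch extracting the first ten latitudes from the meta data:

>>> with Outputs('test.h5') as f:
>>>     lats = f.get_meta_arr('latitude', rows=slice(0, 10))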

get_scale_factor(dset)

Get dataset scale factor

Parameters:

dset (str) – Dataset to get scale factor for

Returns:

float – Dataset scale factor, used to unscale int values to floats

get_units(dset)

Get dataset units

Parameters:

dset (str) – Dataset to get units for

Returns:

str – Dataset units, None if not defined

property global_attrs

Global (file) attributes

Returns:

global_attrs (dict)

property groups

Groups available

Returns:

groups (list) – List of groups

property h5

Open h5py File instance. If _group is not None, the open Group is returned.

Returns:

h5 (h5py.File | h5py.Group)

classmethod init_h5(h5_file, dsets, shapes, attrs, chunks, dtypes, meta, time_index=None, configs=None, unscale=True, mode='w', str_decode=True, group=None, run_attrs=None)[source]

Init a full output file with the final intended shape without data.

Parameters:
  • h5_file (str) – Full h5 output filepath.

  • dsets (list) – List of strings of dataset names to initialize (does not include meta or time_index).

  • shapes (dict) – Dictionary of dataset shapes (keys correspond to dsets).

  • attrs (dict) – Dictionary of dataset attributes (keys correspond to dsets).

  • chunks (dict) – Dictionary of chunk tuples (keys correspond to dsets).

  • dtypes (dict) – dictionary of numpy datatypes (keys correspond to dsets).

  • meta (pd.DataFrame) – Full meta data.

  • time_index (pd.DatetimeIndex | None) – Full pandas datetime index. None implies that only 1D results (no site profiles) are being written.

  • configs (dict | None) – Optional input configs to set as attr on meta.

  • unscale (bool) – Boolean flag to automatically unscale variables on extraction

  • mode (str) – Mode to instantiate h5py.File instance

  • str_decode (bool) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read.

  • group (str) – Group within .h5 resource file to open

  • run_attrs (dict | None) – Runtime attributes (args, kwargs) to add as global (file) attributes
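
A minimal sketch initializing a file with one spatiotemporal dataset, reusing the meta and time_index from the class examples; 'init.h5', 'cf_profile', and the attribute values are illustrative:

>>> dsets = ['cf_profile']
>>> shapes = {'cf_profile': (8760, 100)}
>>> attrs = {'cf_profile': {'scale_factor': 1000, 'units': 'unitless'}}
>>> chunks = {'cf_profile': (None, 100)}
>>> dtypes = {'cf_profile': 'uint16'}
>>> Outputs.init_h5('init.h5', dsets, shapes, attrs, chunks, dtypes,
>>>                 meta, time_index=time_index)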

property lat_lon

Extract (latitude, longitude) pairs

Returns:

lat_lon (ndarray)

open_dataset(ds_name)

Open resource dataset

Parameters:

ds_name (str) – Dataset name to open

Returns:

ds (ResourceDataset) – Resource for open resource dataset

classmethod preload_SAM(h5_file, sites, tech, unscale=True, str_decode=True, group=None, hsds=False, hsds_kwargs=None, time_index_step=None, means=False)

Pre-load project_points for SAM

Parameters:
  • h5_file (str) – h5_file to extract resource from

  • sites (list) – List of sites to be provided to SAM (sites is synonymous with gids aka spatial indices)

  • tech (str) – Technology to be run by SAM

  • unscale (bool) – Boolean flag to automatically unscale variables on extraction

  • str_decode (bool) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read.

  • group (str) – Group within .h5 resource file to open

  • hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False

  • hsds_kwargs (dict, optional) – Dictionary of optional kwargs for h5pyd, e.g., bucket, username, password, by default None

  • time_index_step (int, optional) – Step size for time_index, used to reduce temporal resolution, by default None

  • means (bool, optional) – Boolean flag to compute mean resource when res_array is set, by default False

Returns:

SAM_res (SAMResource) – Instance of SAMResource pre-loaded with resource data for sites in project_points
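
A hypothetical call; the target file must contain the resource variables required by the chosen technology (the 'test.h5' built above does not), so 'wind.h5' and the arguments are purely illustrative:

>>> SAM_res = Outputs.preload_SAM('wind.h5', sites=[0, 1, 2],
>>>                               tech='windpower')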

property res_dsets

Available resource datasets

Returns:

list

property resource_datasets

Available resource datasets

Returns:

list

property scale_factors

Dictionary of all dataset scale factors

Returns:

scale_factors (dict)

property shapes

Dictionary of all dataset shapes

Returns:

shapes (dict)

property units

Dictionary of all dataset units

Returns:

units (dict)