rex.multi_file_resource.MultiFileResource

class MultiFileResource(h5_source, unscale=True, str_decode=True, check_files=False, use_lapse_rate=True)

Bases: AbstractInterpolatedResource

Class to handle fine spatial resolution resource data stored in multiple .h5 files

See also

resource.Resource

Parent class

Examples

Due to the size of the 2018 NSRDB and 5min WTK, these datasets are stored in multiple files. MultiFileResource and its subclasses allow for interaction with all datasets as if they were in a single file. MultiFileResource can take a directory containing all files to source data from, or a filepath with a wildcard (*) indicating the filename format.

>>> file = '$TESTDATADIR/wtk/wtk_2010_*m.h5'
>>> with MultiFileResource(file) as res:
>>>     print(res._h5_files)
['$TESTDATADIR/wtk_2010_200m.h5',
 '$TESTDATADIR/wtk_2010_100m.h5']
>>> file_100m = '$TESTDATADIR/wtk_2010_100m.h5'
>>> with Resource(file_100m) as res:
>>>     print(res.datasets)
['coordinates', 'meta', 'pressure_100m', 'temperature_100m', 'time_index',
 'winddirection_100m', 'windspeed_100m']
>>> file_200m = '$TESTDATADIR/wtk_2010_200m.h5'
>>> with Resource(file_200m) as res:
>>>     print(res.datasets)
['coordinates', 'meta', 'pressure_200m', 'temperature_200m', 'time_index',
 'winddirection_200m', 'windspeed_200m']
>>> with MultiFileResource(file) as res:
>>>     print(res.datasets)
['coordinates', 'meta', 'pressure_100m', 'pressure_200m',
 'temperature_100m', 'temperature_200m', 'time_index',
 'winddirection_100m', 'winddirection_200m', 'windspeed_100m',
 'windspeed_200m']
>>> with MultiFileResource(file) as res:
>>>     wspd = res['windspeed_100m']
>>>
>>> wspd
[[15.13 15.17 15.21 ... 15.3  15.32 15.31]
 [15.09 15.13 15.16 ... 15.26 15.29 15.31]
 [15.09 15.12 15.15 ... 15.24 15.23 15.26]
 ...
 [10.29 11.08 11.51 ... 14.43 14.41 14.19]
 [11.   11.19 11.79 ... 13.27 11.93 11.8 ]
 [12.16 12.44 13.09 ... 11.94 10.88 11.12]]
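One way to picture how datasets from several files appear as a single resource is a lookup from dataset name to the file that holds it. The sketch below is an illustration only and is not taken from rex itself: `build_dset_map` is a hypothetical helper, and the hard-coded dataset listings stand in for what would be read from each .h5 file.

```python
def build_dset_map(file_dsets):
    """Map each dataset name to the first file that provides it."""
    dset_map = {}
    for path, dsets in file_dsets.items():
        for dset in dsets:
            # setdefault keeps the first file seen for datasets shared by
            # every file (coordinates, meta, time_index)
            dset_map.setdefault(dset, path)
    return dset_map


# Dataset listings copied from the single-file examples above
file_dsets = {
    "wtk_2010_100m.h5": [
        "coordinates", "meta", "pressure_100m", "temperature_100m",
        "time_index", "winddirection_100m", "windspeed_100m",
    ],
    "wtk_2010_200m.h5": [
        "coordinates", "meta", "pressure_200m", "temperature_200m",
        "time_index", "winddirection_200m", "windspeed_200m",
    ],
}

dset_map = build_dset_map(file_dsets)
all_dsets = sorted(dset_map)
print(all_dsets)
```

The sorted keys reproduce the combined dataset list printed by the MultiFileResource example, and a request for 'windspeed_200m' would be routed to the 200m file.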
Parameters:
  • h5_source (str | list) – Unix shell style pattern path with * wildcards to multi-file resource file sets. Files must have the same time index and coordinates but can have different datasets. Can also be an explicit list of complete filepaths.

  • unscale (bool) – Boolean flag to automatically unscale variables on extraction

  • str_decode (bool) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read.

  • check_files (bool) – Check to ensure files have the same coordinates and time_index

  • use_lapse_rate (bool) – If a dataset is only available at a single hub-height and this flag value is set to True, pressure / temperature values will be calculated using linear lapse rate adjustment from the available hub height to the requested one. If the flag value is set to False, the value of these variables at the single available hub-height will be returned for all requested heights. This option has no effect if data is available at multiple hub-heights.
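To illustrate what a Unix shell style pattern such as the one accepted by h5_source selects, the following sketch uses Python's standard glob module on a temporary directory of empty placeholder files. It only demonstrates the pattern semantics; whether rex resolves h5_source with glob internally is an assumption, and the file names are made up for the example.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Empty placeholder files; real inputs would be .h5 resource files
    for name in ("wtk_2010_100m.h5", "wtk_2010_200m.h5", "wtk_2011_100m.h5"):
        open(os.path.join(tmp_dir, name), "w").close()

    # The * wildcard matches any run of characters within the file name
    pattern = os.path.join(tmp_dir, "wtk_2010_*m.h5")
    matched = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(matched)
```

Only the two 2010 files match the pattern; the 2011 file is left out.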

Methods

close()

Close h5 instance

df_str_decode(df)

Decode a dataframe with byte string columns into ordinary str cols.

get_SAM_df(site)

Placeholder for get_SAM_df method that is resource specific

get_attrs([dset])

Get h5 attributes either from file or dataset

get_dset_properties(dset)

Get dataset properties (shape, dtype, chunks)

get_meta_arr(rec_name[, rows])

Get a meta array by name (faster than DataFrame extraction).

get_scale_factor(dset)

Get dataset scale factor

get_units(dset)

Get dataset units

is_hsds_file(file_path)

Parse one or more file paths to determine if they are hsds

is_s3_file(file_path)

Parse one or more file paths to determine if they are s3

open_dataset(ds_name)

Open resource dataset

open_file(file_path[, mode, hsds, hsds_kwargs])

Open a filepath to an h5, s3, or hsds NREL resource file with the appropriate Python object.

preload_SAM(h5_file, sites, tech[, unscale, ...])

Pre-load project_points for SAM

Attributes

ADD_ATTR

INTERPOLABLE_DSETS

LAPSE_RATES

Air Temperature and Pressure lapse rate in C/km and Pa/km

SCALE_ATTR

UNIT_ATTR

VARIABLE_NAME

VARIABLE_UNIT

adders

Dictionary of all dataset add offset factors

attrs

Dictionary of all dataset attributes

chunks

Dictionary of all dataset chunk sizes

coordinates

(lat, lon) pairs

data_version

Get the version attribute of the data.

datasets

Datasets available

dsets

Datasets available

dtypes

Dictionary of all dataset dtypes

global_attrs

Global (file) attributes

groups

Groups available

h5

Open h5py File instance.

lat_lon

Extract (latitude, longitude) pairs

meta

Resource meta data DataFrame

res_dsets

Available resource datasets

resource_datasets

Available resource datasets

scale_factors

Dictionary of all dataset scale factors

shape

Resource shape (timesteps, sites), i.e. shape = (len(time_index), len(meta))

shapes

Dictionary of all dataset shapes

time_index

Resource DatetimeIndex

units

Dictionary of all dataset units

LAPSE_RATES = {'pressure': 11109, 'temperature': 6.56}

Air Temperature and Pressure lapse rate in C/km and Pa/km
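As a rough illustration of a linear lapse rate adjustment using these constants, the sketch below shifts a value from an available hub height to a requested one. The helper `lapse_adjust`, the sign convention (values decrease with height), and the exact formula are assumptions made for this example; the calculation rex performs when use_lapse_rate=True may differ in detail.

```python
# Values copied from the LAPSE_RATES attribute:
# temperature in C/km, pressure in Pa/km
LAPSE_RATES = {"pressure": 11109, "temperature": 6.56}


def lapse_adjust(value, var, h_available, h_requested):
    """Linearly adjust a value between hub heights given in meters.

    Assumes the quantity decreases with height at LAPSE_RATES[var] per km.
    """
    delta_km = (h_requested - h_available) / 1000.0
    return value - LAPSE_RATES[var] * delta_km


# Hypothetical 100m values adjusted up to 200m
temp_200m = lapse_adjust(15.0, "temperature", 100, 200)
pres_200m = lapse_adjust(100000.0, "pressure", 100, 200)
print(round(temp_200m, 3), round(pres_200m, 1))
```

Under these assumptions a 100 m increase in height lowers temperature by 0.656 C and pressure by 1110.9 Pa.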

property adders

Dictionary of all dataset add offset factors

Returns:

adders (dict)

property attrs

Dictionary of all dataset attributes

Returns:

attrs (dict)

property chunks

Dictionary of all dataset chunk sizes

Returns:

chunks (dict)

close()

Close h5 instance

property coordinates

Coordinates as (lat, lon) pairs

Returns:

lat_lon (ndarray)

property data_version

Get the version attribute of the data. None if not available.

Returns:

version (str | None)

property datasets

Datasets available

Returns:

list

static df_str_decode(df)

Decode a dataframe with byte string columns into ordinary str cols.

Parameters:

df (pd.DataFrame) – Dataframe with some columns being byte strings.

Returns:

df (pd.DataFrame) – DataFrame with str columns instead of byte str columns.

property dsets

Datasets available

Returns:

list

property dtypes

Dictionary of all dataset dtypes

Returns:

dtypes (dict)

get_SAM_df(site)

Placeholder for get_SAM_df method that is resource specific

Parameters:

site (int) – Site to extract SAM DataFrame for

get_attrs(dset=None)

Get h5 attributes either from file or dataset

Parameters:

dset (str) – Dataset to get attributes for; if None, get file (global) attributes

Returns:

attrs (dict) – Dataset or file attributes

get_dset_properties(dset)

Get dataset properties (shape, dtype, chunks)

Parameters:

dset (str) – Dataset to get properties for

Returns:

  • shape (tuple) – Dataset array shape

  • dtype (str) – Dataset array dtype

  • chunks (tuple) – Dataset chunk size

get_meta_arr(rec_name, rows=slice(None, None, None))

Get a meta array by name (faster than DataFrame extraction).

Parameters:
  • rec_name (str) – Named record from the meta data to retrieve.

  • rows (slice) – Rows of the record to extract.

Returns:

meta_arr (np.ndarray) – Extracted array from the meta data record name.

get_scale_factor(dset)

Get dataset scale factor

Parameters:

dset (str) – Dataset to get scale factor for

Returns:

float – Dataset scale factor, used to unscale int values to floats
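A minimal sketch of what unscaling means, assuming the common convention that stored integers equal the physical value multiplied by the scale factor, so that unscaling divides by it. The `unscale` helper and the sample values are hypothetical, and rex's own handling (including any add offset, see adders) may differ.

```python
def unscale(stored_values, scale_factor):
    """Convert stored integer values to floats by dividing by scale_factor."""
    return [value / scale_factor for value in stored_values]


# Hypothetical integers as they might be stored with a scale factor of 100
stored = [1513, 1517, 1521]
wspd = unscale(stored, 100)
print(wspd)
```

Storing scaled integers instead of floats keeps files smaller; the scale factor records how to recover the physical values.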

get_units(dset)

Get dataset units

Parameters:

dset (str) – Dataset to get units for

Returns:

str – Dataset units, None if not defined

property global_attrs

Global (file) attributes

Returns:

global_attrs (dict)

property groups

Groups available

Returns:

groups (list) – List of groups

property h5

Open h5py File instance. If _group is not None return open Group

Returns:

h5 (h5py.File | h5py.Group)

static is_hsds_file(file_path)

Parse one or more file paths to determine if they are hsds

Parameters:

file_path (str | list) – One or more file paths (only the first is parsed if multiple)

Returns:

is_hsds_file (bool) – True if hsds

static is_s3_file(file_path)

Parse one or more file paths to determine if they are s3

Parameters:

file_path (str | list) – One or more file paths (only the first is parsed if multiple)

Returns:

is_s3_file (bool) – True if s3
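The prefix rules stated in the open_file parameter description (/nrel/ for HSDS, s3:// for S3), together with the note that only the first path is parsed, can be sketched as follows. The function names and example paths are hypothetical, and the actual static methods may apply additional checks.

```python
def first_path(file_path):
    """Return the first entry of a list or tuple, else the path itself."""
    if isinstance(file_path, (list, tuple)):
        return file_path[0]
    return file_path


def looks_like_hsds(file_path):
    """True if the (first) path starts with the HSDS prefix /nrel/."""
    return str(first_path(file_path)).startswith("/nrel/")


def looks_like_s3(file_path):
    """True if the (first) path starts with the S3 prefix s3://."""
    return str(first_path(file_path)).startswith("s3://")


# Made-up paths used only to exercise the checks
print(looks_like_hsds("/nrel/example/file.h5"))
print(looks_like_s3(["s3://example-bucket/a.h5", "/local/b.h5"]))
print(looks_like_hsds("/local/data/file.h5"))
```

Because only the first entry is inspected, a mixed list is classified by its first path alone.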

property lat_lon

Extract (latitude, longitude) pairs

Returns:

lat_lon (ndarray)

property meta

Resource meta data DataFrame

Returns:

meta (pandas.DataFrame)

open_dataset(ds_name)

Open resource dataset

Parameters:

ds_name (str) – Dataset name to open

Returns:

ds (ResourceDataset) – Handler for the open resource dataset

classmethod open_file(file_path, mode='r', hsds=False, hsds_kwargs=None)

Open a filepath to an h5, s3, or hsds NREL resource file with the appropriate Python object.

Parameters:
  • file_path (str) – String filepath to .h5 file to extract resource from. Can also be a path to an HSDS file (starts with /nrel/) or S3 file (starts with s3://)

  • mode (str, optional) – Mode to instantiate h5py.File instance, by default 'r'

  • hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False. This is now redundant; file paths starting with /nrel/ will be treated as hsds=True by default

  • hsds_kwargs (dict, optional) – Dictionary of optional kwargs for h5pyd, e.g., bucket, username, password, by default None

Returns:

file (h5py.File | h5pyd.File) – H5 file handler either opening the local file using h5py, or the file on s3 using h5py and fsspec, or the file on HSDS using h5pyd.

classmethod preload_SAM(h5_file, sites, tech, unscale=True, str_decode=True, group=None, hsds=False, hsds_kwargs=None, time_index_step=None, means=False)

Pre-load project_points for SAM

Parameters:
  • h5_file (str) – String filepath to .h5 file to extract resource from. Can also be a path to an HSDS file (starts with /nrel/) or S3 file (starts with s3://)

  • sites (list) – List of sites to be provided to SAM (sites is synonymous with gids aka spatial indices)

  • tech (str) – Technology to be run by SAM

  • unscale (bool) – Boolean flag to automatically unscale variables on extraction

  • str_decode (bool) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read.

  • group (str) – Group within .h5 resource file to open

  • hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False. This is now redundant; file paths starting with /nrel/ will be treated as hsds=True by default

  • hsds_kwargs (dict, optional) – Dictionary of optional kwargs for h5pyd, e.g., bucket, username, password, by default None

  • time_index_step (int, optional) – Step size for time_index, used to reduce temporal resolution, by default None

  • means (bool, optional) – Boolean flag to compute mean resource when res_array is set, by default False

Returns:

SAM_res (SAMResource) – Instance of SAMResource pre-loaded with Solar resource for sites in project_points

property res_dsets

Available resource datasets

Returns:

list

property resource_datasets

Available resource datasets

Returns:

list

property scale_factors

Dictionary of all dataset scale factors

Returns:

scale_factors (dict)

property shape

Resource shape (timesteps, sites), i.e. shape = (len(time_index), len(meta))

Returns:

shape (tuple)

property shapes

Dictionary of all dataset shapes

Returns:

shapes (dict)

property time_index

Resource DatetimeIndex

Returns:

time_index (pandas.DatetimeIndex)

property units

Dictionary of all dataset units

Returns:

units (dict)