rex.multi_year_resource.MultiYearResource

class MultiYearResource(h5_path, years=None, unscale=True, str_decode=True, res_cls=<class 'rex.resource.Resource'>, hsds=False, hsds_kwargs=None)

Bases: MultiTimeResource

Class to handle multiple years of resource data stored across multiple .h5 files. This also works if each year is split into multiple files, each containing different datasets (e.g., for Sup3rCC and hi-res WTK+NSRDB). Data split by time into chunks of less than a year can be opened by the MultiTimeResource class, but not by this class.

Note that files across years must have the same meta data, and files within the same year must have the same meta and time_index.
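As a rough illustration of how a set of per-year files could be organized, the sketch below groups file names by a four-digit year parsed from each name and checks that all files within one year share an identical time index. The helpers `group_by_year` and `check_same_time_index`, the regular expression, and the assumption that the year appears in the file name are all hypothetical; this is not the rex implementation.

```python
import re
from collections import defaultdict

# Hypothetical: assumes each file name contains one 4-digit year (19xx or 20xx)
# that is not part of a longer run of digits.
YEAR_RE = re.compile(r"(?<!\d)((?:19|20)\d{2})(?!\d)")


def group_by_year(paths):
    """Group file paths into {year: [paths]} using the year found in each file name."""
    groups = defaultdict(list)
    for path in paths:
        name = path.rsplit("/", 1)[-1]  # only search the file name, not the directory
        match = YEAR_RE.search(name)
        if match is None:
            raise ValueError(f"No year found in file name: {path}")
        groups[int(match.group(1))].append(path)
    return {year: sorted(files) for year, files in sorted(groups.items())}


def check_same_time_index(time_indexes):
    """Raise ValueError if the files of a single year do not share one time index.

    ``time_indexes`` maps file path -> sequence of timestamps (hypothetical input).
    """
    indexes = [list(ti) for ti in time_indexes.values()]
    for other in indexes[1:]:
        if other != indexes[0]:
            raise ValueError("Files within the same year must share a time_index")


files = [
    "nsrdb/ri_100_nsrdb_2013.h5",
    "nsrdb/ri_100_nsrdb_2012.h5",
]
print(group_by_year(files))
# {2012: ['nsrdb/ri_100_nsrdb_2012.h5'], 2013: ['nsrdb/ri_100_nsrdb_2013.h5']}
```

A year split across several files (one per dataset) simply produces more than one path under the same key, which is where a same-time-index check like the one above would apply.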

Examples

Extracting the resource’s DatetimeIndex

>>> path = '$TESTDATADIR/nsrdb/ri_100_nsrdb_*.h5'
>>> with MultiYearResource(path) as res:
...     ti = res.time_index
...
>>> ti
DatetimeIndex(['2012-01-01 00:00:00', '2012-01-01 00:30:00',
               '2012-01-01 01:00:00', '2012-01-01 01:30:00',
               '2012-01-01 02:00:00', '2012-01-01 02:30:00',
               '2012-01-01 03:00:00', '2012-01-01 03:30:00',
               '2012-01-01 04:00:00', '2012-01-01 04:30:00',
               ...
               '2013-12-31 19:00:00', '2013-12-31 19:30:00',
               '2013-12-31 20:00:00', '2013-12-31 20:30:00',
               '2013-12-31 21:00:00', '2013-12-31 21:30:00',
               '2013-12-31 22:00:00', '2013-12-31 22:30:00',
               '2013-12-31 23:00:00', '2013-12-31 23:30:00'],
              dtype='datetime64[ns]', length=35088, freq=None)

NOTE: time_index covers data from 2012 and 2013
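The length reported above can be checked by hand: two consecutive calendar years at 30-minute resolution, with 2012 being a leap year, give (366 + 365) × 48 = 35,088 timesteps. The snippet below reproduces that count using only the standard library; it assumes a complete, gap-free 30-minute index and does not open any files.

```python
from datetime import datetime, timedelta

start = datetime(2012, 1, 1, 0, 0)
end = datetime(2014, 1, 1, 0, 0)  # exclusive: first step after 2013-12-31 23:30
step = timedelta(minutes=30)

n_steps = (end - start) // step  # timedelta // timedelta -> int
print(n_steps)  # 35088
```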

>>> with MultiYearResource(path) as res:
...     print(res.h5_files)
...
['/Users/mrossol/Git_Repos/rex/tests/data/nsrdb/ri_100_nsrdb_2012.h5',
 '/Users/mrossol/Git_Repos/rex/tests/data/nsrdb/ri_100_nsrdb_2013.h5']

Data slicing works the same as with the Resource class, except that axis 0 (time) now spans both 2012 and 2013.

>>> with MultiYearResource(path) as res:
...     temperature = res['air_temperature']
...
>>> temperature
[[ 4.  5.  5. ...  4.  3.  4.]
 [ 4.  4.  5. ...  4.  3.  4.]
 [ 4.  4.  5. ...  4.  3.  4.]
 ...
 [-1. -1.  0. ... -2. -3. -2.]
 [-1. -1.  0. ... -2. -3. -2.]
 [-1. -1.  0. ... -2. -3. -2.]]
>>> temperature.shape
(35088, 100)
>>> with MultiYearResource(path) as res:
...     temperature = res['air_temperature', ::100] # every 100th timestep
...
>>> temperature
[[ 4.  5.  5. ...  4.  3.  4.]
 [ 1.  1.  2. ...  0.  0.  1.]
 [-2. -1. -1. ... -2. -4. -2.]
 ...
 [-3. -2. -2. ... -3. -4. -3.]
 [ 0.  0.  1. ...  0. -1.  0.]
 [ 3.  3.  3. ...  2.  2.  3.]]
>>> temperature.shape
(351, 100)
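Because axis 0 concatenates both years, a step slice is applied to the combined time axis rather than to each year separately. Assuming standard Python slice semantics, `::100` keeps indices 0, 100, ..., 35000, i.e. ceil(35088 / 100) = 351 rows. The plain-Python check below uses only the time-axis length and site count from the examples above; it does not call rex.

```python
import math

n_timesteps = 35088  # len(res.time_index) in the example above
n_sites = 100        # number of sites in the example data

# Standard Python step slicing over the combined (2012 + 2013) time axis
rows = range(n_timesteps)[::100]
print((len(rows), n_sites))          # (351, 100)
print(math.ceil(n_timesteps / 100))  # 351
```

Note that the last selected index, 35000, falls in 2013, so the stride crosses the year boundary without restarting.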

You can also request a specific year of data by passing the year of interest as a string. NOTE: you can also request multiple years by passing a list of year strings.

>>> with MultiYearResource(path) as res:
...     temperature = res['air_temperature', '2012'] # all timesteps in 2012
...
>>> temperature
[[4. 5. 5. ... 4. 3. 4.]
 [4. 4. 5. ... 4. 3. 4.]
 [4. 4. 5. ... 4. 3. 4.]
 ...
 [1. 1. 2. ... 0. 0. 0.]
 [1. 1. 2. ... 0. 0. 1.]
 [1. 1. 2. ... 0. 0. 1.]]
>>> temperature.shape
(17520, 100)
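
Conceptually, selecting '2012' keeps only the timesteps whose year matches, and a list such as ['2012', '2013'] keeps their union. The sketch below expresses that idea as a boolean mask over a list of timestamps. The helper `year_mask` and the four-step index are hypothetical and only illustrate the selection logic, not how rex implements it.

```python
from datetime import datetime, timedelta


def year_mask(timestamps, years):
    """Return booleans marking timestamps whose year is in ``years``.

    ``years`` may be a single string (e.g. '2012') or a list of strings.
    """
    if isinstance(years, str):
        years = [years]
    wanted = {int(year) for year in years}
    return [ts.year in wanted for ts in timestamps]


# Hypothetical index: four half-hourly steps spanning the 2012/2013 boundary
start = datetime(2012, 12, 31, 23, 0)
index = [start + i * timedelta(minutes=30) for i in range(4)]
mask = year_mask(index, "2012")
print(mask)  # [True, True, False, False]
```
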
Parameters:
  • h5_path (str | list) – Unix shell-style path pattern with * wildcards matching a multi-file resource file set. Files must have the same coordinates but can have different datasets or time indexes. Can also be an explicit list of multi-time files, which themselves can contain * wildcards.

  • years (list, optional) – List of years to access, by default None

  • unscale (bool) – Boolean flag to automatically unscale variables on extraction

  • str_decode (bool) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read.

  • res_cls (obj) – Resource handler class used to open individual .h5 files

  • hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False

  • hsds_kwargs (dict, optional) – Dictionary of optional kwargs for h5pyd, e.g., bucket, username, password, by default None
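
h5_path uses Unix shell-style wildcards. The standard-library `fnmatch` module follows the same pattern rules, so the sketch below shows which file names a pattern like ri_100_nsrdb_*.h5 would select. The file names are hypothetical, and the match runs on names in memory only; it does not touch the file system and is not rex's own path-resolution logic.

```python
from fnmatch import fnmatch

pattern = "ri_100_nsrdb_*.h5"
names = [  # hypothetical file names
    "ri_100_nsrdb_2012.h5",
    "ri_100_nsrdb_2013.h5",
    "ri_100_wtk_2012.h5",
    "ri_100_nsrdb_2012.csv",
]

# Keep only the names that the shell-style pattern matches
matches = sorted(name for name in names if fnmatch(name, pattern))
print(matches)  # ['ri_100_nsrdb_2012.h5', 'ri_100_nsrdb_2013.h5']
```
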

Methods

close()

Close h5 instance

get_attrs([dset])

Get h5 attributes either from file or dataset

get_dset_properties(dset)

Get dataset properties (shape, dtype, chunks)

get_meta_arr(rec_name[, rows])

Get a meta array by name (faster than DataFrame extraction).

get_scale_factor(dset)

Get dataset scale factor

get_units(dset)

Get dataset units

Attributes

attrs

Dictionary of all dataset attributes

chunks

Dictionary of all dataset chunk sizes

coordinates

(lat, lon) pairs

datasets

Datasets available

dsets

Datasets available

dtypes

Dictionary of all dataset dtypes

global_attrs

Global (file) attributes

h5

Open class instance that handles all .h5 files that data is to be extracted from

lat_lon

Extract (latitude, longitude) pairs

meta

Resource meta data DataFrame

res_dsets

Available resource datasets

resource_datasets

Available resource datasets

scale_factors

Dictionary of all dataset scale factors

shape

Resource shape (timesteps, sites): shape = (len(time_index), len(meta))

shapes

Dictionary of all dataset shapes

time_index

Resource DatetimeIndex

units

Dictionary of all dataset units

years

Available years

property years

Available years

Returns:

list – List of years present in the .h5 files

property attrs

Dictionary of all dataset attributes

Returns:

attrs (dict)

property chunks

Dictionary of all dataset chunk sizes

Returns:

chunks (dict)

close()

Close h5 instance

property coordinates

(lat, lon) pairs

Returns:

lat_lon (ndarray)

Type:

Coordinates

property datasets

Datasets available

Returns:

list

property dsets

Datasets available

Returns:

list

property dtypes

Dictionary of all dataset dtypes

Returns:

dtypes (dict)

get_attrs(dset=None)

Get h5 attributes either from file or dataset

Parameters:

dset (str) – Dataset to get attributes for, if None get file (global) attributes

Returns:

attrs (dict) – Dataset or file attributes

get_dset_properties(dset)

Get dataset properties (shape, dtype, chunks)

Parameters:

dset (str) – Dataset to get properties for

Returns:

  • shape (tuple) – Dataset array shape

  • dtype (str) – Dataset array dtype

  • chunks (tuple) – Dataset chunk size

get_meta_arr(rec_name, rows=slice(None, None, None))

Get a meta array by name (faster than DataFrame extraction).

Parameters:
  • rec_name (str) – Named record from the meta data to retrieve.

  • rows (slice) – Rows of the record to extract.

Returns:

meta_arr (np.ndarray) – Extracted array from the meta data record name.

get_scale_factor(dset)

Get dataset scale factor

Parameters:

dset (str) – Dataset to get scale factor for

Returns:

float – Dataset scale factor, used to unscale int values to floats
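
With unscale=True, integer-stored values are converted back to physical units using the dataset's scale factor. The sketch below assumes the common convention that stored values equal physical values multiplied by the scale factor, so unscaling divides by it; the helper `unscale`, the scale factor of 10, and the sample values are hypothetical.

```python
def unscale(stored_values, scale_factor):
    """Convert integer-stored values to floats by dividing by the scale factor.

    Assumed convention (not confirmed here): stored = physical * scale_factor.
    """
    return [value / scale_factor for value in stored_values]


stored = [45, 50, -12]      # hypothetical integer values as read from a file
print(unscale(stored, 10))  # [4.5, 5.0, -1.2]
```
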

get_units(dset)

Get dataset units

Parameters:

dset (str) – Dataset to get units for

Returns:

str – Dataset units, None if not defined

property global_attrs

Global (file) attributes

Returns:

global_attrs (dict)

property h5

Open class instance that handles all .h5 files that data is to be extracted from

Returns:

h5 (MultiTimeH5 | MultiYearH5)

property lat_lon

Extract (latitude, longitude) pairs

Returns:

lat_lon (ndarray)

property meta

Resource meta data DataFrame

Returns:

meta (pandas.DataFrame)

property res_dsets

Available resource datasets

Returns:

list

property resource_datasets

Available resource datasets

Returns:

list

property scale_factors

Dictionary of all dataset scale factors

Returns:

scale_factors (dict)

property shape

Resource shape (timesteps, sites): shape = (len(time_index), len(meta))

Returns:

shape (tuple)

property shapes

Dictionary of all dataset shapes

Returns:

shapes (dict)

property time_index

Resource DatetimeIndex

Returns:

time_index (pandas.DatetimeIndex)

property units

Dictionary of all dataset units

Returns:

units (dict)