rex.multi_year_resource.MultiYearResource
- class MultiYearResource(h5_path, years=None, unscale=True, str_decode=True, res_cls=<class 'rex.resource.Resource'>, hsds=False, hsds_kwargs=None)
Bases: MultiTimeResource
Class to handle multiple years of resource data stored across multiple .h5 files. This also works if each year is split into multiple files, each containing different datasets (e.g. for Sup3rCC and hi-res WTK+NSRDB). Data split by time in chunks of less than a year can be opened by the MultiTimeResource class, but not by this class.
Note that files across years must have the same meta data, and files within the same year must have the same meta and time_index.
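To make the file-layout requirement concrete, the sketch below groups a set of file names by year using only the standard library. It is not how rex resolves h5_path internally; the helper group_files_by_year, the sample file names, and the assumption that each name ends in "_<year>.h5" are illustrative only.

```python
import fnmatch
import re
from collections import defaultdict


def group_files_by_year(file_names, pattern):
    """Group file names matching a shell-style pattern by 4-digit year.

    Hypothetical helper for illustration only; assumes each file name
    ends in '_<year>.h5'.
    """
    by_year = defaultdict(list)
    for name in sorted(fnmatch.filter(file_names, pattern)):
        match = re.search(r"_(\d{4})\.h5$", name)
        if match:
            by_year[int(match.group(1))].append(name)
    return dict(by_year)


# Illustrative file names; only the first two match the pattern
files = [
    "ri_100_nsrdb_2012.h5",
    "ri_100_nsrdb_2013.h5",
    "ri_100_wtk_2012.h5",
]
grouped = group_files_by_year(files, "ri_100_nsrdb_*.h5")
print(grouped)
```

Each key in the result corresponds to one year; under the constraint above, every file under a key would need to share the same meta and time_index.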
Examples
Extracting the resource’s Datetime Index
>>> from rex.multi_year_resource import MultiYearResource
>>> path = '$TESTDATADIR/nsrdb/ri_100_nsrdb_*.h5'
>>> with MultiYearResource(path) as res:
...     ti = res.time_index
>>>
>>> ti
DatetimeIndex(['2012-01-01 00:00:00', '2012-01-01 00:30:00',
               '2012-01-01 01:00:00', '2012-01-01 01:30:00',
               '2012-01-01 02:00:00', '2012-01-01 02:30:00',
               '2012-01-01 03:00:00', '2012-01-01 03:30:00',
               '2012-01-01 04:00:00', '2012-01-01 04:30:00',
               ...
               '2013-12-31 19:00:00', '2013-12-31 19:30:00',
               '2013-12-31 20:00:00', '2013-12-31 20:30:00',
               '2013-12-31 21:00:00', '2013-12-31 21:30:00',
               '2013-12-31 22:00:00', '2013-12-31 22:30:00',
               '2013-12-31 23:00:00', '2013-12-31 23:30:00'],
              dtype='datetime64[ns]', length=35088, freq=None)
NOTE: time_index covers data from 2012 and 2013
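As a quick sanity check on the reported length, the arithmetic can be reproduced with the standard library. This sketch assumes both years are complete at 30-minute resolution and that the 2012 leap day is kept; count_timesteps is a hypothetical helper, not part of rex.

```python
from datetime import datetime, timedelta


def count_timesteps(start_year, end_year, step_minutes):
    """Count fixed-interval timesteps from Jan 1 of start_year up to,
    but not including, Jan 1 of the year after end_year."""
    start = datetime(start_year, 1, 1)
    end = datetime(end_year + 1, 1, 1)
    return (end - start) // timedelta(minutes=step_minutes)


n_steps = count_timesteps(2012, 2013, 30)
print(n_steps)  # 35088
```

Under those assumptions the count is (366 + 365) days x 48 half-hour steps, which agrees with the length=35088 reported for the combined time_index.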
>>> with MultiYearResource(path) as res:
...     print(res.h5_files)
['/Users/mrossol/Git_Repos/rex/tests/data/nsrdb/ri_100_nsrdb_2012.h5',
 '/Users/mrossol/Git_Repos/rex/tests/data/nsrdb/ri_100_nsrdb_2013.h5']
Data slicing works the same as with “Resource” except axis 0 now covers 2012 and 2013
>>> with MultiYearResource(path) as res:
...     temperature = res['air_temperature']
>>>
>>> temperature
[[ 4. 5. 5. ... 4. 3. 4.]
 [ 4. 4. 5. ... 4. 3. 4.]
 [ 4. 4. 5. ... 4. 3. 4.]
 ...
 [-1. -1. 0. ... -2. -3. -2.]
 [-1. -1. 0. ... -2. -3. -2.]
 [-1. -1. 0. ... -2. -3. -2.]]
>>> temperature.shape
(35088, 100)
>>> with MultiYearResource(path) as res:
...     temperature = res['air_temperature', ::100]  # every 100th timestep
>>>
>>> temperature
[[ 4. 5. 5. ... 4. 3. 4.]
 [ 1. 1. 2. ... 0. 0. 1.]
 [-2. -1. -1. ... -2. -4. -2.]
 ...
 [-3. -2. -2. ... -3. -4. -3.]
 [ 0. 0. 1. ... 0. -1. 0.]
 [ 3. 3. 3. ... 2. 2. 3.]]
>>> temperature.shape
(351, 100)
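The row count of a strided slice follows from ordinary Python slicing rules. The short check below uses a plain range as a stand-in for axis 0; it does not touch any .h5 file and only illustrates why a step of 100 over 35088 timesteps yields 351 rows.

```python
n_timesteps = 35088  # length of the combined 2012-2013 time axis

# A range behaves like axis 0 for the purpose of counting selected rows
rows = range(n_timesteps)[::100]

n_rows = len(rows)
print(n_rows)    # 351
print(rows[-1])  # 35000, the last selected row position
```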
You can also request a specific year of data using a string representation of the year of interest.
NOTE: you can also request a list of years using strings.
>>> with MultiYearResource(path) as res:
...     temperature = res['air_temperature', '2012']  # only data from 2012
>>>
>>> temperature
[[4. 5. 5. ... 4. 3. 4.]
 [4. 4. 5. ... 4. 3. 4.]
 [4. 4. 5. ... 4. 3. 4.]
 ...
 [1. 1. 2. ... 0. 0. 0.]
 [1. 1. 2. ... 0. 0. 1.]
 [1. 1. 2. ... 0. 0. 1.]]
>>> temperature.shape
(17520, 100)
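Conceptually, a year string selects the rows of axis 0 whose timestamps fall in that year. The toy sketch below shows that mapping on a daily index built with the standard library; rows_for_years is a hypothetical helper and is not how rex implements year-based slicing.

```python
from datetime import date, timedelta

# Toy daily "time index" spanning 2012 (366 days) and 2013 (365 days)
start = date(2012, 1, 1)
time_index = [start + timedelta(days=i) for i in range(731)]


def rows_for_years(index, years):
    """Return row positions whose timestamp falls in the requested year(s).

    `years` may be a single year string or a list of year strings.
    Hypothetical helper for illustration only.
    """
    if isinstance(years, str):
        years = [years]
    wanted = {int(year) for year in years}
    return [i for i, stamp in enumerate(index) if stamp.year in wanted]


rows_2013 = rows_for_years(time_index, "2013")
rows_both = rows_for_years(time_index, ["2012", "2013"])
print(len(rows_2013), rows_2013[0])  # 365 366
print(len(rows_both))                # 731
```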
- Parameters:
h5_path (str | list) – Unix shell style pattern path with * wildcards to multi-file resource file sets. Files must have the same coordinates but can have different datasets or time indexes. Can also be an explicit list of multi time files, which themselves can contain * wildcards.
years (list, optional) – List of years to access, by default None
unscale (bool) – Boolean flag to automatically unscale variables on extraction
str_decode (bool) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read.
res_cls (obj) – Resource handler to use to open individual .h5 files
hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False
hsds_kwargs (dict, optional) – Dictionary of optional kwargs for h5pyd, e.g., bucket, username, password, by default None
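The str_decode flag concerns meta data fields that are stored as byte strings. As a rough illustration of what decoding involves (and why skipping it saves work), the sketch below decodes the bytes values of a single record. The record contents and the decode_record helper are made up for this example and do not reflect rex internals.

```python
def decode_record(record, encoding="utf-8"):
    """Return a copy of `record` with bytes values decoded to str.

    Hypothetical helper; non-bytes values are passed through unchanged.
    """
    return {
        key: value.decode(encoding) if isinstance(value, bytes) else value
        for key, value in record.items()
    }


# Illustrative meta record with one byte-string field
raw = {"state": b"Rhode Island", "elevation": 12.0}
decoded = decode_record(raw)
print(decoded)  # {'state': 'Rhode Island', 'elevation': 12.0}
```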
Methods
close() – Close h5 instance
get_attrs([dset]) – Get h5 attributes either from file or dataset
get_dset_properties(dset) – Get dataset properties (shape, dtype, chunks)
get_meta_arr(rec_name[, rows]) – Get a meta array by name (faster than DataFrame extraction).
get_scale_factor(dset) – Get dataset scale factor
get_units(dset) – Get dataset units
Attributes
attrs – Dictionary of all dataset attributes
chunks – Dictionary of all dataset chunk sizes
coordinates – (lat, lon) pairs
datasets – Datasets available
dsets – Datasets available
dtypes – Dictionary of all dataset dtypes
global_attrs – Global (file) attributes
h5 – Open class instance that handles all .h5 files that data is to be extracted from
lat_lon – Extract (latitude, longitude) pairs
meta – Resource meta data DataFrame
res_dsets – Available resource datasets
resource_datasets – Available resource datasets
scale_factors – Dictionary of all dataset scale factors
shape – Resource shape (timesteps, sites), i.e. shape = (len(time_index), len(meta))
shapes – Dictionary of all dataset shapes
time_index – Resource DatetimeIndex
units – Dictionary of all dataset units
years – Available years
- property years
Available years
- Returns:
list – List of years present in the .h5 files
- property attrs
Dictionary of all dataset attributes
- Returns:
attrs (dict)
- property chunks
Dictionary of all dataset chunk sizes
- Returns:
chunks (dict)
- close()
Close h5 instance
- property coordinates
(lat, lon) coordinate pairs
- Returns:
lat_lon (ndarray)
- property datasets
Datasets available
- Returns:
list
- property dsets
Datasets available
- Returns:
list
- property dtypes
Dictionary of all dataset dtypes
- Returns:
dtypes (dict)
- get_attrs(dset=None)
Get h5 attributes either from file or dataset
- Parameters:
dset (str) – Dataset to get attributes for; if None, get file (global) attributes
- Returns:
attrs (dict) – Dataset or file attributes
- get_dset_properties(dset)
Get dataset properties (shape, dtype, chunks)
- Parameters:
dset (str) – Dataset to get properties for
- Returns:
shape (tuple) – Dataset array shape
dtype (str) – Dataset array dtype
chunks (tuple) – Dataset chunk size
- get_meta_arr(rec_name, rows=slice(None, None, None))
Get a meta array by name (faster than DataFrame extraction).
- Parameters:
rec_name (str) – Named record from the meta data to retrieve.
rows (slice) – Rows of the record to extract.
- Returns:
meta_arr (np.ndarray) – Extracted array from the meta data record name.
- get_scale_factor(dset)
Get dataset scale factor
- Parameters:
dset (str) – Dataset to get scale factor for
- Returns:
float – Dataset scale factor, used to unscale int values to floats
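To illustrate what "unscale int values to floats" means, here is a minimal sketch assuming the common convention that stored integers are divided by the scale factor to recover physical values. The unscale helper and the sample numbers are hypothetical; consult the rex source for the exact behavior, including any offset handling.

```python
def unscale(stored_values, scale_factor):
    """Convert stored integer values to floats by dividing by scale_factor.

    Assumes physical_value = stored_value / scale_factor (an assumption
    made for this sketch, not a statement of the rex implementation).
    """
    return [value / scale_factor for value in stored_values]


stored = [40, 50, -10]  # e.g. temperatures stored as integer tenths
values = unscale(stored, 10)
print(values)  # [4.0, 5.0, -1.0]
```

Storing scaled integers keeps files compact, which is why the unscale flag defaults to converting on extraction.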
- get_units(dset)
Get dataset units
- Parameters:
dset (str) – Dataset to get units for
- Returns:
str – Dataset units, None if not defined
- property global_attrs
Global (file) attributes
- Returns:
global_attrs (dict)
- property h5
Open class instance that handles all .h5 files that data is to be extracted from
- Returns:
h5 (MultiTimeH5 | MultiYearH5)
- property lat_lon
Extract (latitude, longitude) pairs
- Returns:
lat_lon (ndarray)
- property meta
Resource meta data DataFrame
- Returns:
meta (pandas.DataFrame)
- property res_dsets
Available resource datasets
- Returns:
list
- property resource_datasets
Available resource datasets
- Returns:
list
- property scale_factors
Dictionary of all dataset scale factors
- Returns:
scale_factors (dict)
- property shape
Resource shape (timesteps, sites), i.e. shape = (len(time_index), len(meta))
- Returns:
shape (tuple)
- property shapes
Dictionary of all dataset shapes
- Returns:
shapes (dict)
- property time_index
Resource DatetimeIndex
- Returns:
time_index (pandas.DatetimeIndex)
- property units
Dictionary of all dataset units
- Returns:
units (dict)