rex.multi_file_resource.MultiFileResource
- class MultiFileResource(h5_source, unscale=True, str_decode=True, check_files=False, use_lapse_rate=True)
Bases:
AbstractInterpolatedResource
Class to handle fine spatial resolution resource data stored in multiple .h5 files
See also
resource.Resource
Parent class
Examples
Due to the size of the 2018 NSRDB and 5-minute WTK, datasets are stored in multiple files. MultiFileResource and its subclasses allow for interaction with all datasets as if they were in a single file. MultiFileResource can take a directory containing all files to source data from, or a filepath with a wildcard (*) indicating the filename format.
>>> file = '$TESTDATADIR/wtk/wtk_2010_*m.h5'
>>> with MultiFileResource(file) as res:
>>>     print(res._h5_files)
['$TESTDATADIR/wtk_2010_200m.h5', '$TESTDATADIR/wtk_2010_100m.h5']

>>> file_100m = '$TESTDATADIR/wtk_2010_100m.h5'
>>> with Resource(file_100m) as res:
>>>     print(res.datasets)
['coordinates', 'meta', 'pressure_100m', 'temperature_100m', 'time_index', 'winddirection_100m', 'windspeed_100m']

>>> file_200m = '$TESTDATADIR/wtk_2010_200m.h5'
>>> with Resource(file_200m) as res:
>>>     print(res.datasets)
['coordinates', 'meta', 'pressure_200m', 'temperature_200m', 'time_index', 'winddirection_200m', 'windspeed_200m']

>>> with MultiFileResource(file) as res:
>>>     print(res.datasets)
['coordinates', 'meta', 'pressure_100m', 'pressure_200m', 'temperature_100m', 'temperature_200m', 'time_index', 'winddirection_100m', 'winddirection_200m', 'windspeed_100m', 'windspeed_200m']

>>> with MultiFileResource(file) as res:
>>>     wspd = res['windspeed_100m']
>>>
>>> wspd
[[15.13 15.17 15.21 ... 15.3  15.32 15.31]
 [15.09 15.13 15.16 ... 15.26 15.29 15.31]
 [15.09 15.12 15.15 ... 15.24 15.23 15.26]
 ...
 [10.29 11.08 11.51 ... 14.43 14.41 14.19]
 [11.   11.19 11.79 ... 13.27 11.93 11.8 ]
 [12.16 12.44 13.09 ... 11.94 10.88 11.12]]
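One way to picture how datasets spread over several files can appear as a single collection is a map from each dataset name to a file that holds it. The sketch below is illustrative only and is not the rex implementation: `build_dataset_map` is a hypothetical helper, and the per-file dataset lists are copied from the examples above.

```python
# Hypothetical sketch, not the rex implementation.
def build_dataset_map(file_datasets):
    """Map each dataset name to the first file (in sorted order) that holds it."""
    dset_map = {}
    for path in sorted(file_datasets):
        for dset in file_datasets[path]:
            # Shared datasets (coordinates, meta, time_index) keep the first file.
            dset_map.setdefault(dset, path)
    return dset_map


# Dataset lists as printed in the examples above.
file_datasets = {
    "wtk_2010_100m.h5": [
        "coordinates", "meta", "pressure_100m", "temperature_100m",
        "time_index", "winddirection_100m", "windspeed_100m",
    ],
    "wtk_2010_200m.h5": [
        "coordinates", "meta", "pressure_200m", "temperature_200m",
        "time_index", "winddirection_200m", "windspeed_200m",
    ],
}

dset_map = build_dataset_map(file_datasets)
combined = sorted(dset_map)  # all dataset names, as if from one file
print(combined)
print(dset_map["windspeed_200m"])
```

A lookup such as `dset_map["windspeed_200m"]` then tells the caller which file to read from, which is why the files need matching coordinates and time index.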
- Parameters:
h5_source (str | list) – Unix shell style pattern path with * wildcards to multi-file resource file sets. Files must have the same time index and coordinates but can have different datasets. Can also be an explicit list of complete filepaths.
unscale (bool) – Boolean flag to automatically unscale variables on extraction
str_decode (bool) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read.
check_files (bool) – Check to ensure files have the same coordinates and time_index
use_lapse_rate (bool) – If a dataset is only available at a single hub-height and this flag value is set to True, pressure / temperature values will be calculated using linear lapse rate adjustment from the available hub height to the requested one. If the flag value is set to False, the value of these variables at the single available hub-height will be returned for all requested heights. This option has no effect if data is available at multiple hub-heights.
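The h5_source and check_files parameters can be pictured with a small standard-library sketch. It is an assumption-laden illustration, not the rex code: `resolve_h5_source` and `files_consistent` are hypothetical helpers, the `/data/...` paths are made up, and the time index and coordinates are stand-in lists rather than values read from .h5 files.

```python
from fnmatch import fnmatchcase


def resolve_h5_source(h5_source, available):
    """Select files by Unix shell style pattern, or accept an explicit list."""
    if isinstance(h5_source, str):
        return sorted(p for p in available if fnmatchcase(p, h5_source))
    return sorted(h5_source)


def files_consistent(per_file):
    """Return True if every file reports the same time index and coordinates."""
    reference = None
    for time_index, coordinates in per_file.values():
        current = (list(time_index), list(coordinates))
        if reference is None:
            reference = current
        elif current != reference:
            return False
    return True


# Made-up paths standing in for a directory listing.
available = [
    "/data/wtk_2010_100m.h5",
    "/data/wtk_2010_200m.h5",
    "/data/nsrdb_2018.h5",
]
files = resolve_h5_source("/data/wtk_2010_*m.h5", available)
print(files)

# Stand-in (time_index, coordinates) pairs for each selected file.
per_file = {
    files[0]: (["2010-01-01 00:00"], [(39.7, -105.2)]),
    files[1]: (["2010-01-01 00:00"], [(39.7, -105.2)]),
}
print(files_consistent(per_file))
```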
Methods
close() – Close h5 instance
df_str_decode(df) – Decode a dataframe with byte string columns into ordinary str cols.
get_SAM_df(site) – Placeholder for get_SAM_df method that is resource specific
get_attrs([dset]) – Get h5 attributes either from file or dataset
get_dset_properties(dset) – Get dataset properties (shape, dtype, chunks)
get_meta_arr(rec_name[, rows]) – Get a meta array by name (faster than DataFrame extraction).
get_scale_factor(dset) – Get dataset scale factor
get_units(dset) – Get dataset units
open_dataset(ds_name) – Open resource dataset
preload_SAM(h5_file, sites, tech[, unscale, ...]) – Pre-load project_points for SAM
Attributes
ADD_ATTR
INTERPOLABLE_DSETS
LAPSE_RATES – Air Temperature and Pressure lapse rate in C/km and Pa/km
SCALE_ATTR
UNIT_ATTR
VARIABLE_NAME
VARIABLE_UNIT
adders – Dictionary of all dataset add offset factors
attrs – Dictionary of all dataset attributes
chunks – Dictionary of all dataset chunk sizes
coordinates – (lat, lon) pairs
data_version – Get the version attribute of the data.
datasets – Datasets available
dsets – Datasets available
dtypes – Dictionary of all dataset dtypes
global_attrs – Global (file) attributes
groups – Groups available
h5 – Open h5py File instance.
lat_lon – Extract (latitude, longitude) pairs
meta – Resource meta data DataFrame
res_dsets – Available resource datasets
resource_datasets – Available resource datasets
scale_factors – Dictionary of all dataset scale factors
shape – Resource shape (timesteps, sites), where shape = (len(time_index), len(meta))
shapes – Dictionary of all dataset shapes
time_index – Resource DatetimeIndex
units – Dictionary of all dataset units
- LAPSE_RATES = {'pressure': 11109, 'temperature': 6.56}
Air Temperature and Pressure lapse rate in C/km and Pa/km
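As a rough illustration of the linear lapse rate adjustment described for use_lapse_rate, the sketch below applies the LAPSE_RATES values to move a value from one hub height to another. It is a sketch under stated assumptions, not the rex implementation: `lapse_adjust` is a hypothetical helper, the sign convention (values decrease as height increases) is assumed, and the input temperature and pressure are made-up numbers.

```python
# Values copied from the LAPSE_RATES class attribute (Pa/km and C/km).
LAPSE_RATES = {"pressure": 11109, "temperature": 6.56}


def lapse_adjust(value, variable, available_height_m, requested_height_m):
    """Linearly adjust a value from the available height to the requested one.

    Assumption: the variable decreases as height increases.
    """
    delta_km = (requested_height_m - available_height_m) / 1000.0
    return value - LAPSE_RATES[variable] * delta_km


# Made-up 100 m values adjusted to 200 m (a 0.1 km increase in height).
temp_200m = lapse_adjust(15.0, "temperature", 100, 200)
pres_200m = lapse_adjust(100000.0, "pressure", 100, 200)
print(round(temp_200m, 3), round(pres_200m, 1))
```

With use_lapse_rate=False, the documented behavior corresponds to skipping this adjustment and returning the value at the single available hub height unchanged.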
- property adders
Dictionary of all dataset add offset factors
- Returns:
adders (dict)
- property attrs
Dictionary of all dataset attributes
- Returns:
attrs (dict)
- property chunks
Dictionary of all dataset chunk sizes
- Returns:
chunks (dict)
- close()
Close h5 instance
- property coordinates
(lat, lon) pairs
- Returns:
lat_lon (ndarray)
- Type:
Coordinates
- property data_version
Get the version attribute of the data. None if not available.
- Returns:
version (str | None)
- property datasets
Datasets available
- Returns:
list
- static df_str_decode(df)
Decode a dataframe with byte string columns into ordinary str cols.
- Parameters:
df (pd.DataFrame) – Dataframe with some columns being byte strings.
- Returns:
df (pd.DataFrame) – DataFrame with str columns instead of byte str columns.
- property dsets
Datasets available
- Returns:
list
- property dtypes
Dictionary of all dataset dtypes
- Returns:
dtypes (dict)
- get_SAM_df(site)
Placeholder for get_SAM_df method that is resource specific
- Parameters:
site (int) – Site to extract SAM DataFrame for
- get_attrs(dset=None)
Get h5 attributes either from file or dataset
- Parameters:
dset (str) – Dataset to get attributes for; if None, get file (global) attributes
- Returns:
attrs (dict) – Dataset or file attributes
- get_dset_properties(dset)
Get dataset properties (shape, dtype, chunks)
- Parameters:
dset (str) – Dataset to get properties for
- Returns:
shape (tuple) – Dataset array shape
dtype (str) – Dataset array dtype
chunks (tuple) – Dataset chunk size
- get_meta_arr(rec_name, rows=slice(None, None, None))
Get a meta array by name (faster than DataFrame extraction).
- Parameters:
rec_name (str) – Named record from the meta data to retrieve.
rows (slice) – Rows of the record to extract.
- Returns:
meta_arr (np.ndarray) – Extracted array from the meta data record name.
- get_scale_factor(dset)
Get dataset scale factor
- Parameters:
dset (str) – Dataset to get scale factor for
- Returns:
float – Dataset scale factor, used to unscale int values to floats
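To make "unscale int values to floats" concrete, here is a minimal sketch of one common convention: stored integers are divided by the scale factor and an add offset is then applied. The convention and the `unscale` helper are assumptions for illustration, not confirmed rex behavior, and the raw integers are made up.

```python
def unscale(raw_values, scale_factor=1, adder=0):
    """Convert stored integers to floats.

    Assumption: physical = raw / scale_factor + adder.
    """
    return [raw / scale_factor + adder for raw in raw_values]


# Made-up stored integers for a wind speed dataset with a scale factor of 100.
raw = [1513, 1517, 1521]
wspd = unscale(raw, scale_factor=100)
print(wspd)
```

Storing scaled integers instead of floats keeps files smaller; unscale=True asks for the conversion to happen automatically on extraction.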
- get_units(dset)
Get dataset units
- Parameters:
dset (str) – Dataset to get units for
- Returns:
str – Dataset units, None if not defined
- property global_attrs
Global (file) attributes
- Returns:
global_attrs (dict)
- property groups
Groups available
- Returns:
groups (list) – List of groups
- property h5
Open h5py File instance. If _group is not None return open Group
- Returns:
h5 (h5py.File | h5py.Group)
- property lat_lon
Extract (latitude, longitude) pairs
- Returns:
lat_lon (ndarray)
- property meta
Resource meta data DataFrame
- Returns:
meta (pandas.DataFrame)
- open_dataset(ds_name)
Open resource dataset
- Parameters:
ds_name (str) – Dataset name to open
- Returns:
ds (ResourceDataset) – Handler for the opened resource dataset
- classmethod preload_SAM(h5_file, sites, tech, unscale=True, str_decode=True, group=None, hsds=False, hsds_kwargs=None, time_index_step=None, means=False)
Pre-load project_points for SAM
- Parameters:
h5_file (str) – h5_file to extract resource from
sites (list) – List of sites to be provided to SAM (sites is synonymous with gids aka spatial indices)
tech (str) – Technology to be run by SAM
unscale (bool) – Boolean flag to automatically unscale variables on extraction
str_decode (bool) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read.
group (str) – Group within .h5 resource file to open
hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False
hsds_kwargs (dict, optional) – Dictionary of optional kwargs for h5pyd, e.g., bucket, username, password, by default None
time_index_step (int, optional) – Step size for time_index, used to reduce temporal resolution, by default None
means (bool, optional) – Boolean flag to compute mean resource when res_array is set, by default False
- Returns:
SAM_res (SAMResource) – Instance of SAMResource pre-loaded with Solar resource for sites in project_points
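The effect of time_index_step can be sketched as a stride over the time index. This assumes the step is applied as a simple slice step, which the description above suggests but does not spell out; the 5-minute time index here is generated for illustration and is not read from a resource file.

```python
from datetime import datetime, timedelta

# One hour of made-up timestamps at 5-minute resolution.
start = datetime(2010, 1, 1)
time_index = [start + timedelta(minutes=5 * i) for i in range(12)]

# Assumption: the step acts like a slice stride, keeping every 6th timestep.
time_index_step = 6
reduced = time_index[::time_index_step]

print(len(time_index), len(reduced))
print(reduced)
```

Here a step of 6 turns 5-minute data into 30-minute data by keeping every sixth timestamp.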
- property res_dsets
Available resource datasets
- Returns:
list
- property resource_datasets
Available resource datasets
- Returns:
list
- property scale_factors
Dictionary of all dataset scale factors
- Returns:
scale_factors (dict)
- property shape
Resource shape (timesteps, sites), where shape = (len(time_index), len(meta))
- Returns:
shape (tuple)
- property shapes
Dictionary of all dataset shapes
- Returns:
shapes (dict)
- property time_index
Resource DatetimeIndex
- Returns:
time_index (pandas.DatetimeIndex)
- property units
Dictionary of all dataset units
- Returns:
units (dict)