rex.resource_extraction.resource_extraction.MultiTimeWaveX

class MultiTimeWaveX(resource_path, tree=None, unscale=True, str_decode=True, res_cls=None, hsds=False, hsds_kwargs=None)

Bases: rex.resource_extraction.resource_extraction.MultiTimeResourceX

Wave resource extraction class for data stored temporally across multiple files

Parameters
  • resource_path (str) – Unix shell style pattern path with * wildcards to multi-file resource file sets. Files must have the same time index and coordinates but can have different datasets.

  • tree (str | cKDTree) – cKDTree or path to .pkl file containing pre-computed tree of lat, lon coordinates

  • unscale (bool) – Boolean flag to automatically unscale variables on extraction

  • str_decode (bool) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read.

  • res_cls (obj) – Resource handler to use to open individual .h5 files

  • hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False

  • hsds_kwargs (dict, optional) – Dictionary of optional kwargs for h5pyd, e.g., bucket, username, password, by default None
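The resource_path pattern selects several files that share coordinates and are stitched together along the time axis. The sketch below only illustrates that idea with the standard library: the file names, the 3-hourly spacing and the helper hypothetical_time_index are assumptions standing in for a real directory listing and the time index read from each .h5 file. It is not how rex itself opens or orders the files.

```python
import fnmatch
from datetime import datetime, timedelta

# Hypothetical directory listing; rex expands the real pattern on the filesystem
listing = ["wave_2010.h5", "wave_2011.h5", "wave_2012.h5", "wave_meta.csv"]
pattern = "wave_*.h5"

# Keep only the files the pattern matches and order them so time runs forward
files = sorted(fnmatch.filter(listing, pattern))


def hypothetical_time_index(fname, steps=4, freq_hours=3):
    """Stand-in for reading one file's time index (assumed 3-hourly)."""
    year = int(fname.split("_")[1].split(".")[0])
    start = datetime(year, 1, 1)
    return [start + timedelta(hours=freq_hours * i) for i in range(steps)]


# Concatenate the per-file indices into a single temporal axis
time_index = []
for fname in files:
    time_index.extend(hypothetical_time_index(fname))

# The combined index must be strictly increasing for the files to stitch cleanly
assert all(a < b for a, b in zip(time_index, time_index[1:]))
```

Because every file is assumed to cover a distinct, later time span, sorting the matched names is enough to order them here; real file sets may need ordering by their actual time index instead.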

Methods

box_gids(lat_lon_1, lat_lon_2)

Get gids within bounding lat_lon coordinates

close()

Close res_cls instance

get_SAM_gid(gid[, out_path, write_time])

Extract time-series of all variables needed to run SAM for nearest site to given resource gid

get_SAM_lat_lon(lat_lon[, check_lat_lon, ...])

Extract time-series of all variables needed to run SAM for nearest site to given lat_lon

get_box_df(ds_name, lat_lon_1, lat_lon_2)

Extract timeseries of all sites in given bounding box and return as a DataFrame

get_box_ts(ds_name, lat_lon_1, lat_lon_2)

Extract timeseries of all sites in given bounding box

get_gid_df(ds_name, gid)

Extract timeseries of site(s) for given resource gid(s) and return as a DataFrame

get_gid_ts(ds_name, gid)

Extract timeseries of site(s) for given resource gid(s)

get_lat_lon_df(ds_name, lat_lon[, check_lat_lon])

Extract timeseries of site(s) nearest to given lat_lon(s) and return as a DataFrame

get_lat_lon_ts(ds_name, lat_lon[, check_lat_lon])

Extract timeseries of site(s) nearest to given lat_lon(s)

get_region_df(ds_name, region[, region_col])

Extract timeseries of all sites in given region and return as a DataFrame

get_region_ts(ds_name, region[, region_col])

Extract timeseries of all sites in given region

get_timestep_map(ds_name, timestep[, ...])

Extract a map of the given dataset at the given timestep for the given region if supplied

lat_lon_gid(lat_lon[, check_lat_lon])

Get nearest gid to given (lat, lon) pair or pairs

make_SAM_files(res_h5, gids, out_path[, ...])

A performant parallel entry point for making many SAM csv files for many gids

region_gids(region[, region_col])

Get the gids for given region

save_region(out_fpath, region[, datasets, ...])

Extract desired datasets from desired region and save to a new out_fpath .h5 file

save_subset(out_fpath, gids[, datasets])

Extract desired datasets for given gids and save to a new out_fpath .h5 file

timestep_idx(timestep)

Get the index of the desired timestep

Attributes

attrs

Global (file) attributes

coordinates

(lat, lon) pairs

counties

Available Counties

countries

Available Countries

data_version

Get the version attribute of the data.

datasets

Datasets available

distance_threshold

Distance threshold, calculated as half of the diagonal between closest resource points, with an extra 5% margin

dsets

Datasets available

global_attrs

Global (file) attributes

groups

Groups available

h5

Open h5py File instance.

lat_lon

Extract (latitude, longitude) pairs

meta

Resource meta data DataFrame

res_dsets

Available resource datasets

resource

Open res_cls instance to access res_h5 data

resource_datasets

Available resource datasets

shape

Resource shape (timesteps, sites), i.e., shape = (len(time_index), len(meta))

states

Available states

time_index

Resource DatetimeIndex

tree

Pre-initialized cKDTree on the resource lat, lon coordinates

DEFAULT_RES_CLS

alias of rex.renewable_resource.WaveResource

property attrs

Global (file) attributes

Returns

attrs (dict)

box_gids(lat_lon_1, lat_lon_2)

Get gids within bounding lat_lon coordinates

Parameters
  • lat_lon_1 (list | tuple) – One corner of the bounding box

  • lat_lon_2 (list | tuple) – The other corner of the bounding box

Returns

gids (ndarray) – Gids in bounding box
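A bounding-box lookup of this kind reduces to comparing every site's (lat, lon) against the minimum and maximum of the two corners, so the corners may be given in either order. The sketch below assumes a hypothetical 3 x 3 grid where a site's gid is its row position in the coordinate array; it is a plain NumPy illustration, not rex's implementation, and it ignores boxes that cross the antimeridian.

```python
import numpy as np


def box_gids(lat_lon, lat_lon_1, lat_lon_2):
    """Return the gids (row positions) of sites inside the box spanned by two corners."""
    corners = np.array([lat_lon_1, lat_lon_2], dtype=float)
    lo = corners.min(axis=0)  # (min lat, min lon)
    hi = corners.max(axis=0)  # (max lat, max lon)
    inside = np.all((lat_lon >= lo) & (lat_lon <= hi), axis=1)
    return np.where(inside)[0]


# Hypothetical 3 x 3 grid at 0.5 degree spacing; gid = row position
lats, lons = np.meshgrid([40.0, 40.5, 41.0], [-70.0, -69.5, -69.0], indexing="ij")
lat_lon = np.column_stack([lats.ravel(), lons.ravel()])

# Corners deliberately given "upper-left" first to show order does not matter
gids = box_gids(lat_lon, (41.0, -69.5), (40.5, -69.0))
```

Here the box covers the two northern rows and the two eastern columns of the grid, so four gids are returned.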

close()

Close res_cls instance

property coordinates

(lat, lon) pairs

Returns

lat_lon (ndarray)


property counties

Available Counties

Returns

counties (ndarray)

property countries

Available Countries

Returns

countries (ndarray)

property data_version

Get the version attribute of the data. None if not available.

Returns

version (str | None)

property datasets

Datasets available

Returns

list

property distance_threshold

Distance threshold, calculated as half of the diagonal between closest resource points, with an extra 5% margin

Returns

float
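One reading of "half of the diagonal between closest resource points, with an extra 5% margin": take the grid spacing d, form the cell diagonal d·√2, halve it and multiply by 1.05. The sketch below assumes d is the largest nearest-neighbour distance among the sites and works in raw degree space with a brute-force distance matrix; both choices are assumptions for illustration, not confirmed details of rex.

```python
import numpy as np


def distance_threshold(lat_lon, margin=1.05):
    """Half the cell diagonal implied by the nearest-neighbour spacing, plus a margin."""
    # Brute-force pairwise distances in (lat, lon) degree space
    diff = lat_lon[:, None, :] - lat_lon[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)    # ignore each site's zero distance to itself
    spacing = dist.min(axis=1).max()  # assumption: coarsest nearest-neighbour spacing
    return spacing * np.sqrt(2) / 2 * margin


# Hypothetical 3 x 3 grid at 0.5 degree spacing
lats, lons = np.meshgrid([40.0, 40.5, 41.0], [-70.0, -69.5, -69.0], indexing="ij")
lat_lon = np.column_stack([lats.ravel(), lons.ravel()])

thresh = distance_threshold(lat_lon)
```

With a 0.5° spacing the threshold is about 0.371°, so a query point sitting exactly at the centre of a grid cell (about 0.354° from its nearest site) still passes, while one noticeably farther out does not.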

property dsets

Datasets available

Returns

list

get_SAM_gid(gid, out_path=None, write_time=True, **kwargs)

Extract time-series of all variables needed to run SAM for nearest site to given resource gid

Parameters
  • gid (int | list) – Resource gid(s) of interest

  • out_path (str, optional) – Path to save SAM data to in SAM .csv format, by default None

  • write_time (bool) – Flag to write the time columns (Year, Month, Day, Hour, Minute), by default True

  • kwargs (dict) – Internal kwargs for get_SAM_df

Returns

SAM_df (pandas.DataFrame | list) – Time-series DataFrame for given site and dataset. If multiple gids are given, a list of DataFrames is returned
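The write_time flag refers to splitting each timestamp into Year, Month, Day, Hour and Minute columns. The pandas sketch below shows only that split, using a hypothetical time index in place of the class's time_index; it does not reproduce the header rows or data columns of the SAM .csv that rex actually writes.

```python
import pandas as pd

# Hypothetical time index standing in for MultiTimeWaveX.time_index
time_index = pd.to_datetime([
    "2010-01-01 00:00",
    "2010-01-01 03:00",
    "2010-01-01 06:00",
    "2010-01-01 09:30",
])

# One column per time component named in the write_time description
time_cols = pd.DataFrame({
    "Year": time_index.year,
    "Month": time_index.month,
    "Day": time_index.day,
    "Hour": time_index.hour,
    "Minute": time_index.minute,
})
```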

get_SAM_lat_lon(lat_lon, check_lat_lon=True, out_path=None, **kwargs)

Extract time-series of all variables needed to run SAM for nearest site to given lat_lon

Parameters
  • lat_lon (tuple) – (lat, lon) coordinate of interest

  • check_lat_lon (bool, optional) – Flag to check that the requested lat, lon coordinates are inside the resource grid. The check compares them against the bounding box of the resource coordinates and ensures that each nearest-neighbor distance is below the distance threshold, by default True

  • out_path (str, optional) – Path to save SAM data to in SAM .csv format, by default None

  • kwargs (dict) – Internal kwargs for get_SAM_df

Returns

SAM_df (pandas.DataFrame | list) – Time-series DataFrame for given site and dataset. If multiple lat, lon pairs are given, a list of DataFrames is returned

get_box_df(ds_name, lat_lon_1, lat_lon_2)

Extract timeseries of all sites in given bounding box and return as a DataFrame

Parameters
  • ds_name (str) – Dataset to extract

  • lat_lon_1 (list | tuple) – One corner of the bounding box

  • lat_lon_2 (list | tuple) – The other corner of the bounding box

Returns

box_df (pandas.DataFrame) – Time-series DataFrame of desired dataset for all sites in desired bounding box

get_box_ts(ds_name, lat_lon_1, lat_lon_2)

Extract timeseries of all sites in given bounding box

Parameters
  • ds_name (str) – Dataset to extract

  • lat_lon_1 (list | tuple) – One corner of the bounding box

  • lat_lon_2 (list | tuple) – The other corner of the bounding box

Returns

box_ts (ndarray) – Time-series array of desired dataset for all sites in desired bounding box

get_gid_df(ds_name, gid)

Extract timeseries of site(s) for given resource gid(s) and return as a DataFrame

Parameters
  • ds_name (str) – Dataset to extract

  • gid (int | list) – Resource gid(s) of interest

Returns

df (pandas.DataFrame) – Time-series DataFrame for given site(s) and dataset
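Since datasets are laid out as (timesteps, sites), matching shape = (len(time_index), len(meta)), extracting gids amounts to selecting columns and labelling them. The sketch below assumes a small in-memory array and a DataFrame indexed by time with one column per gid; the column labels and index name that rex actually uses may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: a (timesteps, sites) dataset and its time index
time_index = pd.to_datetime(["2010-01-01 00:00", "2010-01-01 03:00", "2010-01-01 06:00"])
data = np.arange(3 * 5, dtype=float).reshape(3, 5)  # shape = (len(time_index), n_sites)


def gid_df(data, time_index, gid):
    """Select the site column(s) for gid(s) and label them, indexed by time."""
    gids = np.atleast_1d(gid)
    return pd.DataFrame(data[:, gids], index=time_index, columns=gids)


df = gid_df(data, time_index, [1, 4])
```

get_gid_ts would correspond to returning data[:, gids] directly, without the DataFrame wrapper.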

get_gid_ts(ds_name, gid)

Extract timeseries of site(s) for given resource gid(s)

Parameters
  • ds_name (str) – Dataset to extract

  • gid (int | list) – Resource gid(s) of interest

Returns

ts (ndarray) – Time-series for given site(s) and dataset

get_lat_lon_df(ds_name, lat_lon, check_lat_lon=True)

Extract timeseries of site(s) nearest to given lat_lon(s) and return as a DataFrame

Parameters
  • ds_name (str) – Dataset to extract

  • lat_lon (tuple) – (lat, lon) coordinate of interest

  • check_lat_lon (bool, optional) – Flag to check that the requested lat, lon coordinates are inside the resource grid. The check compares them against the bounding box of the resource coordinates and ensures that each nearest-neighbor distance is below the distance threshold, by default True

Returns

df (pandas.DataFrame) – Time-series DataFrame for given site(s) and dataset

get_lat_lon_ts(ds_name, lat_lon, check_lat_lon=True)

Extract timeseries of site(s) nearest to given lat_lon(s)

Parameters
  • ds_name (str) – Dataset to extract

  • lat_lon (tuple | list) – (lat, lon) coordinate of interest or pairs of coordinates

  • check_lat_lon (bool, optional) – Flag to check that the requested lat, lon coordinates are inside the resource grid. The check compares them against the bounding box of the resource coordinates and ensures that each nearest-neighbor distance is below the distance threshold, by default True

Returns

ts (ndarray) – Time-series for given site(s) and dataset

get_region_df(ds_name, region, region_col='state')

Extract timeseries of all sites in given region and return as a DataFrame

Parameters
  • ds_name (str) – Dataset to extract

  • region (str) – Region to extract all pixels for

  • region_col (str) – Region column to search

Returns

region_df (pandas.DataFrame) – Time-series DataFrame of desired dataset for all sites in desired region

get_region_ts(ds_name, region, region_col='state')

Extract timeseries of all sites in given region

Parameters
  • ds_name (str) – Dataset to extract

  • region (str) – Region to search for

  • region_col (str) – Region column to search

Returns

region_ts (ndarray) – Time-series array of desired dataset for all sites in desired region

get_timestep_map(ds_name, timestep, region=None, region_col='state', box=None)

Extract a map of the given dataset at the given timestep for the given region if supplied

Parameters
  • ds_name (str) – Dataset to extract

  • timestep (str) – Timestep of interest

  • region (str, optional) – Region to extract all pixels for, by default None

  • region_col (str, optional) – Region column to search, by default ‘state’

  • box (tuple, optional) – Bounding corners of box to extract pixels for, by default None

Returns

ts_map (pandas.DataFrame) – DataFrame of map values
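A timestep map is one row of a (timesteps, sites) dataset, optionally restricted to a region and paired with site coordinates. The sketch below uses hypothetical in-memory stand-ins for time_index, meta and one dataset, and a made-up dataset name "significant_wave_height"; the output columns are an assumption about a reasonable map layout, not rex's exact ts_map format. The box option is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for time_index, meta and one (timesteps, sites) dataset
time_index = pd.to_datetime(["2010-01-01 00:00", "2010-01-01 03:00", "2010-01-01 06:00"])
meta = pd.DataFrame({
    "latitude": [40.0, 40.5, 41.0],
    "longitude": [-70.0, -69.5, -69.0],
    "state": ["Massachusetts", "Maine", "Massachusetts"],
})
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0],
                 [7.0, 8.0, 9.0]])


def timestep_map(data, ds_name, timestep, region=None, region_col="state"):
    """Values of one dataset at one timestep, paired with site coordinates."""
    ts_idx = int(np.where(time_index == pd.Timestamp(timestep))[0][0])
    gids = np.arange(len(meta))
    if region is not None:
        gids = np.where((meta[region_col] == region).to_numpy())[0]
    ts_map = meta.iloc[gids][["latitude", "longitude"]].copy()
    ts_map[ds_name] = data[ts_idx, gids]
    return ts_map


ts_map = timestep_map(data, "significant_wave_height", "2010-01-01 03:00",
                      region="Massachusetts")
```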

property global_attrs

Global (file) attributes

Returns

global_attrs (dict)

property groups

Groups available

Returns

groups (list)

property h5

Open h5py File instance. If _group is not None, return the open Group instead

Returns

h5 (h5py.File | h5py.Group)

property lat_lon

Extract (latitude, longitude) pairs

Returns

lat_lon (ndarray)

lat_lon_gid(lat_lon, check_lat_lon=True)

Get nearest gid to given (lat, lon) pair or pairs

Parameters
  • lat_lon (ndarray) – Either a single (lat, lon) pair or series of (lat, lon) pairs

  • check_lat_lon (bool, optional) – Flag to check that the requested lat, lon coordinates are inside the resource grid. The check compares them against the bounding box of the resource coordinates and ensures that each nearest-neighbor distance is below the distance threshold, by default True

Returns

gids (int | ndarray) – Nearest gid(s) to given (lat, lon) pair(s)
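The check_lat_lon behaviour described above combines a nearest-neighbour lookup with two tests: the query must fall inside the bounding box of the resource coordinates, and its nearest site must be closer than the distance threshold. The NumPy sketch below substitutes a brute-force distance search for the cKDTree that the tree property suggests rex uses, works in degree space, and raises ValueError; that exception type is an assumption, not rex's actual error class. The grid and threshold value are hypothetical.

```python
import numpy as np

# Hypothetical 3 x 3 grid at 0.5 degree spacing; gid = row position
lats, lons = np.meshgrid([40.0, 40.5, 41.0], [-70.0, -69.5, -69.0], indexing="ij")
coords = np.column_stack([lats.ravel(), lons.ravel()])
dist_thresh = 0.5 * np.sqrt(2) / 2 * 1.05  # half the cell diagonal plus 5%


def lat_lon_gid(lat_lon, coords, dist_thresh, check_lat_lon=True):
    """Nearest gid(s) to (lat, lon) pair(s), optionally checking grid bounds."""
    single = np.ndim(lat_lon) == 1
    pts = np.atleast_2d(np.asarray(lat_lon, dtype=float))
    # Brute-force distances from every query point to every site
    d = np.sqrt(((pts[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    gids = d.argmin(axis=1)
    if check_lat_lon:
        lo, hi = coords.min(axis=0), coords.max(axis=0)
        outside_box = np.any((pts < lo) | (pts > hi), axis=1)
        too_far = d[np.arange(len(pts)), gids] > dist_thresh
        if np.any(outside_box | too_far):
            # Exception type chosen for illustration only
            raise ValueError("requested lat, lon is outside the resource grid")
    return int(gids[0]) if single else gids
```

A single pair such as (40.6, -69.4) returns one int, while a list of pairs returns an array of gids, mirroring the "int | ndarray" return type above.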

classmethod make_SAM_files(res_h5, gids, out_path, write_time=True, max_workers=1, n_chunks=36, **kwargs)

A performant parallel entry point for making many SAM csv files for many gids

Parameters
  • res_h5 (str) – Filepath to resource h5 file.

  • gids (list | tuple | np.ndarray) – Resource gid(s) of interest

  • out_path (str) – Path to save SAM data to in SAM .csv format. A gid index “*_{gid}.csv” will be appended to the file path

  • write_time (bool) – Flag to write the time columns (Year, Month, Day, Hour, Minute), by default True

  • max_workers (int | None) – Number of parallel workers. None to use all available workers, by default 1

  • n_chunks (int) – Number of chunks to split gids into for parallelization, by default 36

  • kwargs (dict) – Internal kwargs for get_SAM_df
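The parallel pattern described here, splitting gids into n_chunks and handing chunks to max_workers workers that each write one file per gid, can be sketched as follows. Everything in it is hypothetical: the helper names, the rule that inserts "_{gid}" before the .csv extension, and the use of threads (chosen so the sketch runs anywhere; a CPU-bound real workload would more likely use processes). The per-gid extraction and file writing are replaced by building the output file names.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def gid_out_path(out_path, gid):
    """Hypothetical naming rule: insert '_{gid}' before the .csv extension."""
    root, ext = os.path.splitext(out_path)
    return f"{root}_{gid}{ext or '.csv'}"


def process_chunk(chunk, out_path):
    # Placeholder for per-gid extraction and writing; only builds the file names
    return [gid_out_path(out_path, int(gid)) for gid in chunk]


def make_files(gids, out_path, max_workers=1, n_chunks=36):
    chunks = np.array_split(np.asarray(gids), n_chunks)
    chunks = [c for c in chunks if len(c)]  # n_chunks may exceed len(gids)
    if max_workers == 1:
        results = [process_chunk(c, out_path) for c in chunks]
    else:
        with ThreadPoolExecutor(max_workers=max_workers) as exe:
            results = list(exe.map(process_chunk, chunks, [out_path] * len(chunks)))
    # Flatten the per-chunk lists back into one list, preserving gid order
    return [path for chunk_paths in results for path in chunk_paths]


paths = make_files([3, 7, 11, 20], "/tmp/sam/site.csv", max_workers=2, n_chunks=3)
```

Chunking keeps the number of tasks submitted to the pool bounded regardless of how many gids are requested, which is the usual reason to expose an n_chunks parameter alongside max_workers.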

property meta

Resource meta data DataFrame

Returns

meta (pandas.DataFrame)

region_gids(region, region_col='state')

Get the gids for given region

Parameters
  • region (str) – Region to search for

  • region_col (str) – Region column to search

Returns

gids (ndarray) – Vector of gids in given region
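Region lookups filter the meta DataFrame on region_col and return the matching row positions as gids. The pandas sketch below assumes each meta row position is the site's gid and uses a small hypothetical meta table; it illustrates the selection only, and does not reproduce whatever error handling rex applies when a region is missing.

```python
import pandas as pd

# Hypothetical meta table; each row position is assumed to be the site's gid
meta = pd.DataFrame({
    "latitude": [40.0, 40.5, 41.0, 43.5],
    "longitude": [-70.0, -69.5, -69.0, -68.5],
    "state": ["Massachusetts", "Rhode Island", "Massachusetts", "Maine"],
})


def region_gids(meta, region, region_col="state"):
    """Row positions (gids) whose region_col value equals region."""
    mask = (meta[region_col] == region).to_numpy()
    return mask.nonzero()[0]


gids = region_gids(meta, "Massachusetts")
```

The region-based methods above (get_region_ts, get_region_df, save_region) take the same region and region_col arguments, so a gid vector like this one is the natural input for slicing a (timesteps, sites) dataset by region.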

property res_dsets

Available resource datasets

Returns

list

property resource

Open res_cls instance to access res_h5 data

Returns

res_cls (rex.resource.Resource | rex.renewable_resource.*)

property resource_datasets

Available resource datasets

Returns

list

save_region(out_fpath, region, datasets=None, region_col='state')

Extract desired datasets from desired region and save to a new out_fpath .h5 file

Parameters
  • out_fpath (str) – Path to .h5 file to save region datasets to

  • region (str) – Region to extract all pixels for

  • datasets (str | list, optional) – Dataset(s) to extract from given region and save to out_fpath; if None, extract all datasets, by default None

  • region_col (str, optional) – Region column to search, by default ‘state’

save_subset(out_fpath, gids, datasets=None)

Extract desired datasets for given gids and save to a new out_fpath .h5 file

Parameters
  • out_fpath (str) – Path to .h5 file to save subset datasets to

  • gids (list) – List of gids to extract data from and save to .h5

  • datasets (str | list, optional) – Dataset(s) to extract for given gids and save to out_fpath; if None, extract all datasets, by default None

property shape

Resource shape (timesteps, sites), i.e., shape = (len(time_index), len(meta))

Returns

shape (tuple)

property states

Available states

Returns

states (ndarray)

property time_index

Resource DatetimeIndex

Returns

time_index (pandas.DatetimeIndex)

timestep_idx(timestep)

Get the index of the desired timestep

Parameters

timestep (str) – Timestep of interest

Returns

ts_idx (int) – Time index value
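Finding a timestep's index means locating a parsed timestamp in the DatetimeIndex. The sketch below uses a hypothetical time index and raises KeyError when the timestep is absent; that error choice is an assumption, not rex's behaviour. pandas' own DatetimeIndex.get_loc offers the same lookup. If the real time_index is timezone-aware, the timestep string would need a matching timezone for the comparison to succeed.

```python
import numpy as np
import pandas as pd

# Hypothetical time index standing in for MultiTimeWaveX.time_index
time_index = pd.to_datetime(["2010-01-01 00:00", "2010-01-01 03:00", "2010-01-01 06:00"])


def timestep_idx(time_index, timestep):
    """Integer position of a timestep string in a DatetimeIndex."""
    matches = np.where(time_index == pd.Timestamp(timestep))[0]
    if len(matches) == 0:
        raise KeyError(f"{timestep} is not in the time index")
    return int(matches[0])


idx = timestep_idx(time_index, "2010-01-01 06:00")
```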

property tree

Pre-initialized cKDTree on the resource lat, lon coordinates

Returns

tree (cKDTree)
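The tree constructor argument accepts either a cKDTree or a path to a .pkl file holding one. The sketch below builds a SciPy cKDTree on a hypothetical 3 x 3 (lat, lon) grid, round-trips it through pickle in memory (standing in for a .pkl file on disk), and queries the nearest site. Whether rex serializes its tree exactly this way is an assumption; the query itself is standard SciPy.

```python
import pickle

import numpy as np
from scipy.spatial import cKDTree

# Hypothetical 3 x 3 grid at 0.5 degree spacing; gid = row position
lats, lons = np.meshgrid([40.0, 40.5, 41.0], [-70.0, -69.5, -69.0], indexing="ij")
lat_lon = np.column_stack([lats.ravel(), lons.ravel()])

tree = cKDTree(lat_lon)

# Round-trip through pickle, as a pre-computed .pkl tree would be saved and loaded
restored = pickle.loads(pickle.dumps(tree))

# Nearest site to one (lat, lon) pair: returns (distance, index)
dist, gid = restored.query([40.6, -69.4])
```

Pre-computing and pickling the tree avoids rebuilding it each time the class is instantiated over the same coordinates, which is presumably why a .pkl path is accepted.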