rex.resource_extraction.resource_extraction.MultiFileNSRDBX

class MultiFileNSRDBX(resource_path, res_cls=None, tree=None, unscale=True, str_decode=True, check_files=False)

Bases: MultiFileResourceX

Multi-File NSRDB extraction class

Parameters:
  • resource_path (str) – Unix shell style pattern path with * wildcards to multi-file resource file sets. Files must have the same time index and coordinates but can have different datasets.

  • res_cls (obj) – Resource class to use to open and access resource data

  • tree (str | cKDTree) – cKDTree or path to .pkl file containing pre-computed tree of lat, lon coordinates

  • unscale (bool) – Boolean flag to automatically unscale variables on extraction

  • str_decode (bool) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read.

  • check_files (bool) – Check to ensure files have the same coordinates and time_index
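As an illustration of the wildcard pattern expected by resource_path, the following minimal sketch matches a Unix shell style pattern against a few file names using Python's standard fnmatch module. The file names and the pattern are hypothetical, and the sketch only demonstrates the pattern syntax; it is not how rex itself resolves the file set.

```python
from fnmatch import fnmatch

# Hypothetical file names; real NSRDB file sets will be named differently.
files = [
    "nsrdb_irradiance_2018.h5",
    "nsrdb_ancillary_2018.h5",
    "nsrdb_irradiance_2019.h5",
    "wtk_2018.h5",
]

# Hypothetical pattern: "*" stands in for the part of the name that varies
# between the files of one multi-file set.
pattern = "nsrdb_*_2018.h5"

matched = sorted(name for name in files if fnmatch(name, pattern))
print(matched)
```

Only the two 2018 NSRDB files match, which is the kind of file set (same time index and coordinates, different datasets) the class is meant to open together.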

Methods

box_gids(lat_lon_1, lat_lon_2)

Get gids within bounding lat_lon coordinates

close()

Close res_cls instance

get_SAM_gid(gid[, out_path, write_time, ...])

Extract time-series of all variables needed to run SAM for nearest site to given resource gid

get_SAM_lat_lon(lat_lon[, check_lat_lon, ...])

Extract time-series of all variables needed to run SAM for nearest site to given lat_lon

get_box_df(ds_name, lat_lon_1, lat_lon_2)

Extract timeseries of all sites in given bounding box and return as a DataFrame

get_box_ts(ds_name, lat_lon_1, lat_lon_2)

Extract timeseries of all sites in given bounding box

get_gid_df(ds_name, gid)

Extract timeseries of site(s) for the given gid(s) and return as a DataFrame

get_gid_ts(ds_name, gid)

Extract timeseries of site(s) for the given gid(s)

get_grid_vectors(target[, meta])

Get vectors representing pure horizontal/vertical movements in the meta data coordinate system.

get_lat_lon_df(ds_name, lat_lon[, check_lat_lon])

Extract timeseries of site(s) nearest to given lat_lon(s) and return as a DataFrame

get_lat_lon_ts(ds_name, lat_lon[, check_lat_lon])

Extract timeseries of site(s) nearest to given lat_lon(s)

get_raster_index(target, shape[, meta, ...])

Get meta data index values that correspond to a 2D rectangular grid of the requested shape starting with the target coordinate in the bottom left hand corner.

get_region_df(ds_name, region[, region_col])

Extract timeseries of all sites in given region and return as a DataFrame

get_region_ts(ds_name, region[, region_col])

Extract timeseries of all sites in given region

get_timestep_map(ds_name, timestep[, ...])

Extract a map of the given dataset at the given timestep for the given region if supplied

lat_lon_gid(lat_lon[, check_lat_lon])

Get nearest gid to given (lat, lon) pair or pairs

make_SAM_files(res_h5, gids, out_path[, ...])

A performant parallel entry point for making many SAM csv files for many gids

region_gids(region[, region_col])

Get the gids for given region

save_region(out_fpath, region[, datasets, ...])

Extract desired datasets from desired region and save to a new out_fpath .h5 file

save_subset(out_fpath, gids[, datasets])

Extract desired datasets for given gids and save to a new out_fpath .h5 file

timestep_idx(timestep)

Get the index of the desired timestep

Attributes

attrs

Global (file) attributes

coordinates

(lat, lon) pairs

counties

Available Counties

countries

Available Countries

data_version

Get the version attribute of the data.

datasets

Datasets available

distance_threshold

Distance threshold, calculated as half of the diagonal between closest resource points, with an extra 5% margin

dsets

Datasets available

global_attrs

Global (file) attributes

groups

Groups available

h5

Open h5py File instance.

lat_lon

Extract (latitude, longitude) pairs

meta

Resource meta data DataFrame

res_dsets

Available resource datasets

resource

Open res_cls instance to access res_h5 data

resource_datasets

Available resource datasets

shape

Resource shape (timesteps, sites), i.e. shape = (len(time_index), len(meta))

states

Available states

time_index

Resource DatetimeIndex

tree

Pre-initialized cKDTree on the resource lat, lon coordinates

DEFAULT_RES_CLS

alias of MultiFileNSRDB

property attrs

Global (file) attributes

Returns:

attrs (dict)

box_gids(lat_lon_1, lat_lon_2)

Get gids within bounding lat_lon coordinates

Parameters:
  • lat_lon_1 (list | tuple) – One corner of the bounding box

  • lat_lon_2 (list | tuple) – The other corner of the bounding box

Returns:

gids (ndarray) – Gids in bounding box
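A minimal sketch of the bounding-box selection described above, written in plain Python. It assumes that a gid is simply the row position of a site in the coordinate list and that the box bounds are inclusive; neither assumption is confirmed by this page, and box_gids_sketch is a hypothetical helper, not part of rex.

```python
def box_gids_sketch(coords, lat_lon_1, lat_lon_2):
    """Return positions of (lat, lon) pairs inside the box spanned by two corners."""
    # The two corners may be given in any order, so sort each axis.
    lat_min, lat_max = sorted((lat_lon_1[0], lat_lon_2[0]))
    lon_min, lon_max = sorted((lat_lon_1[1], lat_lon_2[1]))
    return [
        gid
        for gid, (lat, lon) in enumerate(coords)
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    ]


# Hypothetical site coordinates as (lat, lon) pairs.
coords = [(39.0, -105.0), (39.5, -104.5), (40.0, -104.0), (41.0, -103.0)]
gids = box_gids_sketch(coords, (40.2, -104.8), (39.2, -103.5))
print(gids)
```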

close()

Close res_cls instance

property coordinates

(lat, lon) pairs

Returns:

lat_lon (ndarray)


property counties

Available Counties

Returns:

counties (ndarray)

property countries

Available Countries

Returns:

countries (ndarray)

property data_version

Get the version attribute of the data. None if not available.

Returns:

version (str | None)

property datasets

Datasets available

Returns:

list

property distance_threshold

Distance threshold, calculated as half of the diagonal between closest resource points, with an extra 5% margin

Returns:

float
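One possible reading of this description, as a hedged sketch: if the closest resource points are separated by a spacing d on a roughly square grid, the cell diagonal is sqrt(2) * d, and the threshold is half of that diagonal increased by 5%. The function name and the square-grid assumption are illustrative only; the actual rex computation may sample points or measure spacing differently.

```python
import math


def distance_threshold_sketch(nearest_spacing, margin=0.05):
    """Half of the cell diagonal for a square grid, plus a relative margin."""
    diagonal = math.sqrt(2) * nearest_spacing
    return (1 + margin) * diagonal / 2


# Hypothetical spacing of 0.04 degrees between neighbouring resource points.
threshold = distance_threshold_sketch(0.04)
print(round(threshold, 4))
```

Under these assumptions, a requested coordinate farther than this distance from its nearest resource point would be treated as outside the grid.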

property dsets

Datasets available

Returns:

list

get_SAM_gid(gid, out_path=None, write_time=True, extra_meta_data=None, **kwargs)

Extract time-series of all variables needed to run SAM for nearest site to given resource gid

Parameters:
  • gid (int | list) – Resource gid(s) of interest

  • out_path (str, optional) – Path to save SAM data to in SAM .csv format, by default None

  • write_time (bool) – Flag to write the time columns (Year, Month, Day, Hour, Minute)

  • extra_meta_data (dict, optional) – Dictionary that maps the names and values of extra meta info. For example, extra_meta_data={'TMY Year': '2020'} will add a column 'TMY Year' to the meta data with a value of '2020'.

  • kwargs (dict) – Internal kwargs for get_SAM_df

Returns:

SAM_df (pandas.DataFrame | list) – Time-series DataFrame for the given site and dataset. If multiple gids are given, a list of DataFrames is returned.

get_SAM_lat_lon(lat_lon, check_lat_lon=True, out_path=None, **kwargs)

Extract time-series of all variables needed to run SAM for nearest site to given lat_lon

Parameters:
  • lat_lon (tuple) – (lat, lon) coordinate of interest

  • check_lat_lon (bool, optional) – Flag to check that the requested lat, lon coordinates are inside the resource grid. The check compares them with the bounding box of the resource coordinates and ensures that the nearest-neighbor distances are below the distance threshold, by default True

  • out_path (str, optional) – Path to save SAM data to in SAM .csv format, by default None

  • kwargs (dict) – Internal kwargs for get_SAM_df

Returns:

SAM_df (pandas.DataFrame | list) – Time-series DataFrame for the given site and dataset. If multiple lat, lon pairs are given, a list of DataFrames is returned.

get_box_df(ds_name, lat_lon_1, lat_lon_2)

Extract timeseries of all sites in given bounding box and return as a DataFrame

Parameters:
  • ds_name (str) – Dataset to extract

  • lat_lon_1 (list | tuple) – One corner of the bounding box

  • lat_lon_2 (list | tuple) – The other corner of the bounding box

Returns:

box_df (pandas.DataFrame) – Time-series array of desired dataset for all sites in desired bounding box

get_box_ts(ds_name, lat_lon_1, lat_lon_2)

Extract timeseries of all sites in given bounding box

Parameters:
  • ds_name (str) – Dataset to extract

  • lat_lon_1 (list | tuple) – One corner of the bounding box

  • lat_lon_2 (list | tuple) – The other corner of the bounding box

Returns:

box_ts (ndarray) – Time-series array of desired dataset for all sites in desired bounding box

get_gid_df(ds_name, gid)

Extract timeseries of site(s) for the given gid(s) and return as a DataFrame

Parameters:
  • ds_name (str) – Dataset to extract

  • gid (int | list) – Resource gid(s) of interest

Returns:

df (pandas.DataFrame) – Time-series DataFrame for given site(s) and dataset

get_gid_ts(ds_name, gid)

Extract timeseries of site(s) for the given gid(s)

Parameters:
  • ds_name (str) – Dataset to extract

  • gid (int | list) – Resource gid(s) of interest

Returns:

ts (ndarray) – Time-series for given site(s) and dataset

get_grid_vectors(target, meta=None)

Get vectors representing pure horizontal/vertical movements in the meta data coordinate system. Note that this can break down if a target is requested outside of the main grid area.

Parameters:
  • target (tuple) – Starting coordinate (latitude, longitude) in decimal degrees for the bottom left hand corner of the raster grid.

  • meta (pd.DataFrame | None) – Optional meta data input with latitude, longitude fields. Default is None which extracts self.meta from the resource data.

Returns:

  • gid_target (np.ndarray) – 1D array of shape (2,) with (latitude, longitude) corresponding to the meta data grid cell closest to the requested target.

  • vector_x (np.ndarray) – 1D array of shape (2,) with (delta_latitude, delta_longitude) corresponding to the vector for pure positive horizontal movement in the meta data

  • vector_y (np.ndarray) – 1D array of shape (2,) with (delta_latitude, delta_longitude) corresponding to the vector for pure positive vertical movement in the meta data

  • close (np.ndarray) – Meta data index values corresponding to the 3x3 box of pixels closest to gid_target.

get_lat_lon_df(ds_name, lat_lon, check_lat_lon=True)

Extract timeseries of site(s) nearest to given lat_lon(s) and return as a DataFrame

Parameters:
  • ds_name (str) – Dataset to extract

  • lat_lon (tuple) – (lat, lon) coordinate of interest

  • check_lat_lon (bool, optional) – Flag to check that the requested lat, lon coordinates are inside the resource grid. The check compares them with the bounding box of the resource coordinates and ensures that the nearest-neighbor distances are below the distance threshold, by default True

Returns:

df (pandas.DataFrame) – Time-series DataFrame for given site(s) and dataset

get_lat_lon_ts(ds_name, lat_lon, check_lat_lon=True)

Extract timeseries of site(s) nearest to given lat_lon(s)

Parameters:
  • ds_name (str) – Dataset to extract

  • lat_lon (tuple | list) – (lat, lon) coordinate of interest or pairs of coordinates

  • check_lat_lon (bool, optional) – Flag to check that the requested lat, lon coordinates are inside the resource grid. The check compares them with the bounding box of the resource coordinates and ensures that the nearest-neighbor distances are below the distance threshold, by default True

Returns:

ts (ndarray) – Time-series for given site(s) and dataset

get_raster_index(target, shape, meta=None, max_delta=50)

Get meta data index values that correspond to a 2D rectangular grid of the requested shape starting with the target coordinate in the bottom left hand corner. Note that this can break down if a target is requested outside of the main grid area.

Parameters:
  • target (tuple) – Starting coordinate (latitude, longitude) in decimal degrees for the bottom left hand corner of the raster grid.

  • shape (tuple) – Desired raster shape in format (number_rows, number_cols)

  • meta (pd.DataFrame | None) – Optional meta data input with latitude, longitude fields. Default is None which extracts self.meta from the resource data.

  • max_delta (int) – Optional maximum limit on the raster shape that is retrieved at once. If shape is (20, 20) and max_delta=10, the full raster will be retrieved in four chunks of (10, 10). This helps adapt to non-regular grids that curve over large distances.

Returns:

raster_index (np.ndarray) – 2D array of meta data index values that form a 2D rectangular grid with latitudes descending from top to bottom and longitudes ascending from left to right.
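The max_delta chunking described above can be sketched as follows. This only illustrates how a requested shape might be divided into blocks no larger than max_delta on each side; raster_chunks_sketch is a hypothetical helper, and the way rex actually walks the grid and stitches the chunks together is not shown here.

```python
def raster_chunks_sketch(shape, max_delta):
    """Split a (rows, cols) raster into row/col ranges of at most max_delta."""
    n_rows, n_cols = shape
    chunks = []
    for row_start in range(0, n_rows, max_delta):
        for col_start in range(0, n_cols, max_delta):
            row_stop = min(row_start + max_delta, n_rows)
            col_stop = min(col_start + max_delta, n_cols)
            chunks.append(((row_start, row_stop), (col_start, col_stop)))
    return chunks


# The example from the max_delta description: shape (20, 20), max_delta=10.
chunks = raster_chunks_sketch((20, 20), 10)
chunk_shapes = [(r1 - r0, c1 - c0) for (r0, r1), (c0, c1) in chunks]
print(len(chunks), chunk_shapes)
```

For shape (20, 20) and max_delta=10 this gives four chunks of (10, 10), matching the parameter description.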

get_region_df(ds_name, region, region_col='state')

Extract timeseries of all sites in given region and return as a DataFrame

Parameters:
  • ds_name (str) – Dataset to extract

  • region (str) – Region to extract all pixels for

  • region_col (str) – Region column to search

Returns:

region_df (pandas.DataFrame) – Time-series array of desired dataset for all sites in desired region

get_region_ts(ds_name, region, region_col='state')

Extract timeseries of all sites in given region

Parameters:
  • ds_name (str) – Dataset to extract

  • region (str) – Region to search for

  • region_col (str) – Region column to search

Returns:

region_ts (ndarray) – Time-series array of desired dataset for all sites in desired region

get_timestep_map(ds_name, timestep, region=None, region_col='state', box=None)

Extract a map of the given dataset at the given timestep for the given region if supplied

Parameters:
  • ds_name (str) – Dataset to extract

  • timestep (str) – Timestep of interest

  • region (str, optional) – Region to extract all pixels for, by default None

  • region_col (str, optional) – Region column to search, by default 'state'

  • box (tuple, optional) – Bounding corners of box to extract pixels for

Returns:

ts_map (pandas.DataFrame) – DataFrame of map values

property global_attrs

Global (file) attributes

Returns:

global_attrs (dict)

property groups

Groups available

Returns:

groups (list)

property h5

Open h5py File instance. If _group is not None return open Group

Returns:

h5 (h5py.File | h5py.Group)

property lat_lon

Extract (latitude, longitude) pairs

Returns:

lat_lon (ndarray)

lat_lon_gid(lat_lon, check_lat_lon=True)

Get nearest gid to given (lat, lon) pair or pairs

Parameters:
  • lat_lon (ndarray) – Either a single (lat, lon) pair or series of (lat, lon) pairs

  • check_lat_lon (bool, optional) – Flag to check that the requested lat, lon coordinates are inside the resource grid. The check compares them with the bounding box of the resource coordinates and ensures that the nearest-neighbor distances are below the distance threshold, by default True

Returns:

gids (int | ndarray) – Nearest gid(s) to given (lat, lon) pair(s)
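A brute-force sketch of the nearest-gid lookup with a distance check, under stated assumptions: distances are plain Euclidean distances in (lat, lon) degree space, a gid is the row position of a site, and a failed check raises ValueError. The class documents a cKDTree for the lookup; the linear scan here is swapped in only to keep the example dependency-free, and the exception type is an assumption.

```python
import math


def nearest_gid_sketch(coords, lat_lon, dist_thresh=None):
    """Return the position of the site closest to lat_lon (linear scan)."""
    dists = [math.hypot(lat - lat_lon[0], lon - lat_lon[1]) for lat, lon in coords]
    gid = min(range(len(coords)), key=dists.__getitem__)
    # Optional check: reject requests that fall too far from any site.
    if dist_thresh is not None and dists[gid] > dist_thresh:
        raise ValueError(f"{lat_lon} is farther than {dist_thresh} from the grid")
    return gid


# Hypothetical site coordinates as (lat, lon) pairs.
coords = [(39.0, -105.0), (39.0, -104.96), (39.04, -105.0)]
gid = nearest_gid_sketch(coords, (39.01, -104.97), dist_thresh=0.03)
print(gid)
```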

classmethod make_SAM_files(res_h5, gids, out_path, write_time=True, extra_meta_data=None, max_workers=1, n_chunks=36, **kwargs)

A performant parallel entry point for making many SAM csv files for many gids

Parameters:
  • res_h5 (str) – Filepath to resource h5 file.

  • gids (list | tuple | np.ndarray) – Resource gid(s) of interest

  • out_path (str, optional) – Path to save SAM data to in SAM .csv format. A gid index “*_{gid}.csv” will be appended to the file path

  • write_time (bool) – Flag to write the time columns (Year, Month, Day, Hour, Minute)

  • extra_meta_data (dict, optional) – Dictionary that maps the names and values of extra meta info. For example, extra_meta_data={'TMY Year': '2020'} will add a column 'TMY Year' to the meta data with a value of '2020'.

  • max_workers (int | None) – Number of parallel workers. None for all workers.

  • n_chunks (int) – Number of chunks to split gids into for parallelization

  • kwargs (dict) – Internal kwargs for get_SAM_df
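To illustrate what n_chunks controls, here is a minimal sketch of splitting a list of gids into nearly equal chunks that could each be handed to a worker. split_gids_sketch is a hypothetical helper; the exact splitting strategy used by rex is not stated on this page.

```python
def split_gids_sketch(gids, n_chunks):
    """Split gids into at most n_chunks contiguous, nearly equal chunks."""
    gids = list(gids)
    # Never create more chunks than there are gids (and always at least one).
    n_chunks = max(1, min(n_chunks, len(gids)))
    size, extra = divmod(len(gids), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional gid each.
        stop = start + size + (1 if i < extra else 0)
        chunks.append(gids[start:stop])
        start = stop
    return chunks


gid_chunks = split_gids_sketch(range(10), 4)
print(gid_chunks)
```

With fewer gids than n_chunks, the sketch falls back to one gid per chunk, so the default n_chunks=36 would not produce empty work items under this scheme.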

property meta

Resource meta data DataFrame

Returns:

meta (pandas.DataFrame)

region_gids(region, region_col='state')

Get the gids for given region

Parameters:
  • region (str) – Region to search for

  • region_col (str) – Region column to search

Returns:

gids (ndarray) – Vector of gids in given region
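The region lookup can be pictured as a filter on one column of the meta data. The sketch below uses a list of dictionaries in place of the meta DataFrame and assumes an exact string match on the region column; the rows, the "gid" key, and region_gids_sketch are hypothetical.

```python
# Hypothetical meta data rows; the real meta is a pandas DataFrame.
meta = [
    {"gid": 0, "state": "Colorado", "county": "Boulder"},
    {"gid": 1, "state": "Colorado", "county": "Jefferson"},
    {"gid": 2, "state": "Wyoming", "county": "Laramie"},
]


def region_gids_sketch(meta, region, region_col="state"):
    """Return the gids of all rows whose region_col equals region."""
    return [row["gid"] for row in meta if row[region_col] == region]


state_gids = region_gids_sketch(meta, "Colorado")
county_gids = region_gids_sketch(meta, "Laramie", region_col="county")
print(state_gids, county_gids)
```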

property res_dsets

Available resource datasets

Returns:

list

property resource

Open res_cls instance to access res_h5 data

Returns:

res_cls (rex.resource.Resource | rex.renewable_resource.*)

property resource_datasets

Available resource datasets

Returns:

list

save_region(out_fpath, region, datasets=None, region_col='state')

Extract desired datasets from desired region and save to a new out_fpath .h5 file

Parameters:
  • out_fpath (str) – Path to .h5 file to save region datasets to

  • region (str, optional) – Region to extract all pixels for, by default None

  • datasets (str | list, optional) – Dataset(s) to extract from given region and save to out_fpath, if None extract all datasets, by default None

  • region_col (str, optional) – Region column to search, by default 'state'

save_subset(out_fpath, gids, datasets=None)

Extract desired datasets for given gids and save to a new out_fpath .h5 file

Parameters:
  • out_fpath (str) – Path to .h5 file to save region datasets to

  • gids (list) – List of gids to extract data from and save to .h5

  • datasets (str | list, optional) – Dataset(s) to extract from given region and save to out_fpath, if None extract all datasets, by default None

property shape

Resource shape (timesteps, sites), i.e. shape = (len(time_index), len(meta))

Returns:

shape (tuple)

property states

Available states

Returns:

states (ndarray)

property time_index

Resource DatetimeIndex

Returns:

time_index (pandas.DatetimeIndex)

timestep_idx(timestep)

Get the index of the desired timestep

Parameters:

timestep (str) – Timestep of interest

Returns:

ts_idx (int) – Time index value
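A minimal sketch of looking up a timestep position, using standard-library datetimes in place of the pandas DatetimeIndex. It assumes a half-hourly index, an ISO-formatted timestep string, and an exact match; the string formats rex accepts and its handling of missing timesteps are not specified here, and timestep_idx_sketch is a hypothetical helper.

```python
from datetime import datetime, timedelta

# Hypothetical half-hourly time index covering one day.
time_index = [datetime(2018, 1, 1) + timedelta(minutes=30 * i) for i in range(48)]


def timestep_idx_sketch(time_index, timestep):
    """Return the position of an exactly matching timestep."""
    target = datetime.fromisoformat(timestep)
    # list.index raises ValueError if the timestep is not in the index.
    return time_index.index(target)


ts_idx = timestep_idx_sketch(time_index, "2018-01-01 12:00:00")
print(ts_idx)
```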

property tree

Pre-initialized cKDTree on the resource lat, lon coordinates

Returns:

tree (cKDTree)