rex.resource_extraction.resource_extraction.NSRDBX
- class NSRDBX(res_h5, res_cls=None, tree=None, unscale=True, str_decode=True, group=None, hsds=False, hsds_kwargs=None, log_vers=True)
Bases:
ResourceX
NSRDB extraction class
- Parameters:
res_h5 (str) – Path to resource .h5 file of interest
res_cls (obj, optional) – Resource class to use to open and access resource data, by default Resource (default changes for subclasses like NSRDBX)
tree (str | cKDTree, optional) – cKDTree or path to .pkl file containing pre-computed tree of lat, lon coordinates, by default None
unscale (bool, optional) – Boolean flag to automatically unscale variables on extraction, by default True
str_decode (bool, optional) – Boolean flag to decode the bytestring meta data into normal strings. Setting this to False will speed up the meta data read, by default True
group (str, optional) – Group within .h5 resource file to open, by default None
hsds (bool, optional) – Boolean flag to use h5pyd to handle .h5 ‘files’ hosted on AWS behind HSDS, by default False
hsds_kwargs (dict, optional) – Dictionary of optional kwargs for h5pyd, e.g., bucket, username, password, by default None
log_vers (bool) – Flag to log rex versions, True by default. Disable this if wrapping in a parallel process (logs get very verbose).
Methods
box_gids
(lat_lon_1, lat_lon_2)Get gids within bounding lat_lon coordinates
close
()Close res_cls instance
get_SAM_gid
(gid[, out_path, write_time, ...])Extract time-series of all variables needed to run SAM for nearest site to given resource gid
get_SAM_lat_lon
(lat_lon[, check_lat_lon, ...])Extract time-series of all variables needed to run SAM for nearest site to given lat_lon
get_box_df
(ds_name, lat_lon_1, lat_lon_2)Extract timeseries of all sites in given bounding box and return as a DataFrame
get_box_ts
(ds_name, lat_lon_1, lat_lon_2)Extract timeseries of all sites in given bounding box
get_gid_df
(ds_name, gid)Extract timeseries of site(s) for given gid(s) and return as a DataFrame
get_gid_ts
(ds_name, gid)Extract timeseries of site(s) for given gid(s)
get_grid_vectors
(target[, meta])Get vectors representing pure horizontal/vertical movements in the meta data coordinate system.
get_lat_lon_df
(ds_name, lat_lon[, check_lat_lon])Extract timeseries of site(s) nearest to given lat_lon(s) and return as a DataFrame
get_lat_lon_ts
(ds_name, lat_lon[, check_lat_lon])Extract timeseries of site(s) nearest to given lat_lon(s)
get_raster_index
(target, shape[, meta, ...])Get meta data index values that correspond to a 2D rectangular grid of the requested shape starting with the target coordinate in the bottom left hand corner.
get_region_df
(ds_name, region[, region_col])Extract timeseries of all sites in given region and return as a DataFrame
get_region_ts
(ds_name, region[, region_col])Extract timeseries of all sites in given region
get_timestep_map
(ds_name, timestep[, ...])Extract a map of the given dataset at the given timestep for the given region if supplied
lat_lon_gid
(lat_lon[, check_lat_lon])Get nearest gid to given (lat, lon) pair or pairs
make_SAM_files
(res_h5, gids, out_path[, ...])A performant parallel entry point for making many SAM csv files for many gids
region_gids
(region[, region_col])Get the gids for given region
save_region
(out_fpath, region[, datasets, ...])Extract desired datasets from desired region and save to a new out_fpath .h5 file
save_subset
(out_fpath, gids[, datasets])Extract desired datasets for given gids and save to a new out_fpath .h5 file
timestep_idx
(timestep)Get the index of the desired timestep
Attributes
attrs
Global (file) attributes
coordinates
(lat, lon) pairs
counties
Available Counties
countries
Available Countries
data_version
Get the version attribute of the data.
datasets
Datasets available
distance_threshold
Distance threshold, calculated as half of the diagonal between closest resource points, with an extra 5% margin
dsets
Datasets available
global_attrs
Global (file) attributes
groups
Groups available
h5
Open h5py File instance.
lat_lon
Extract (latitude, longitude) pairs
meta
Resource meta data DataFrame
res_dsets
Available resource datasets
resource
Open res_cls instance to access res_h5 data
resource_datasets
Available resource datasets
shape
Resource shape (timesteps, sites) shape = (len(time_index), len(meta))
states
Available states
time_index
Resource DatetimeIndex
tree
Pre-initialized cKDTree on the resource lat, lon coordinates
- property attrs
Global (file) attributes
- Returns:
attrs (dict)
- box_gids(lat_lon_1, lat_lon_2)
Get gids within bounding lat_lon coordinates
- Parameters:
lat_lon_1 (list | tuple) – One corner of the bounding box
lat_lon_2 (list | tuple) – The other corner of the bounding box
- Returns:
gids (ndarray) – Gids in bounding box
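The bounding-box selection described above can be illustrated with a minimal pure-Python sketch. This is not the rex implementation: the function name box_gids_sketch is hypothetical, a gid is assumed to be the row index of a coordinate pair, and the box edges are assumed to be inclusive.

```python
def box_gids_sketch(coordinates, lat_lon_1, lat_lon_2):
    """Return the indices of (lat, lon) pairs inside the box spanned by two corners.

    Hypothetical helper for illustration only; the corners may be given in any order.
    """
    lat_min, lat_max = sorted((lat_lon_1[0], lat_lon_2[0]))
    lon_min, lon_max = sorted((lat_lon_1[1], lat_lon_2[1]))
    return [
        gid
        for gid, (lat, lon) in enumerate(coordinates)
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    ]


# Made-up coordinates; the gid is assumed to be the position in this list.
coords = [(39.0, -105.5), (39.5, -105.0), (40.0, -104.5), (41.0, -103.0)]
gids = box_gids_sketch(coords, (40.2, -105.2), (39.2, -104.0))
print(gids)  # [1, 2]
```

Sorting each corner component means the caller does not need to know which corner is the lower left and which is the upper right.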
- close()
Close res_cls instance
- property coordinates
(lat, lon) pairs
- Returns:
lat_lon (ndarray)
- Type:
Coordinates
- property counties
Available Counties
- Returns:
counties (ndarray)
- property countries
Available Countries
- Returns:
countries (ndarray)
- property data_version
Get the version attribute of the data. None if not available.
- Returns:
version (str | None)
- property datasets
Datasets available
- Returns:
list
- property distance_threshold
Distance threshold, calculated as half of the diagonal between closest resource points, with an extra 5% margin
- Returns:
float
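One way to read this definition is sketched below. It assumes a roughly square grid cell whose side equals the spacing between the closest resource points, so the diagonal is that spacing times the square root of two. The function name and the way the spacing is obtained are assumptions, not the library's code.

```python
import math


def distance_threshold_sketch(nearest_spacing, margin=0.05):
    """Half of the cell diagonal, padded by a fractional margin.

    nearest_spacing: distance between the closest resource points
    (assumed here to be the side of a square grid cell).
    """
    diagonal = math.sqrt(2.0) * nearest_spacing
    return (1.0 + margin) * diagonal / 2.0


# Example with a made-up spacing of 0.04 degrees.
threshold = distance_threshold_sketch(0.04)
print(threshold)
```

Under this reading, a requested coordinate farther than the threshold from every resource point lies outside any grid cell.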
- property dsets
Datasets available
- Returns:
list
- get_SAM_gid(gid, out_path=None, write_time=True, extra_meta_data=None, **kwargs)
Extract time-series of all variables needed to run SAM for nearest site to given resource gid
- Parameters:
gid (int | list) – Resource gid(s) of interest
out_path (str, optional) – Path to save SAM data to in SAM .csv format, by default None
write_time (bool) – Flag to write the time columns (Year, Month, Day, Hour, Minute)
extra_meta_data (dict, optional) – Dictionary that maps the names and values of extra meta info. For example, extra_meta_data={‘TMY Year’: ‘2020’} will add a column ‘TMY Year’ to the meta data with a value of ‘2020’.
kwargs (dict) – Internal kwargs for get_SAM_df
- Returns:
SAM_df (pandas.DataFrame | list) – Time-series DataFrame for given site and dataset. If multiple gids are given, a list of DataFrames is returned
- get_SAM_lat_lon(lat_lon, check_lat_lon=True, out_path=None, **kwargs)
Extract time-series of all variables needed to run SAM for nearest site to given lat_lon
- Parameters:
lat_lon (tuple) – (lat, lon) coordinate of interest
check_lat_lon (bool, optional) – Flag to check that the requested lat, lon coordinates are inside the resource grid. This is done by comparing them with the bounding box of the resource coordinates and by ensuring that the nearest neighbor distances are below the distance threshold, by default True
out_path (str, optional) – Path to save SAM data to in SAM .csv format, by default None
kwargs (dict) – Internal kwargs for get_SAM_df
- Returns:
SAM_df (pandas.DataFrame | list) – Time-series DataFrame for given site and dataset. If multiple lat, lon pairs are given, a list of DataFrames is returned
- get_box_df(ds_name, lat_lon_1, lat_lon_2)
Extract timeseries of all sites in given bounding box and return as a DataFrame
- Parameters:
ds_name (str) – Dataset to extract
lat_lon_1 (list | tuple) – One corner of the bounding box
lat_lon_2 (list | tuple) – The other corner of the bounding box
- Returns:
box_df (pandas.DataFrame) – Time-series array of desired dataset for all sites in desired bounding box
- get_box_ts(ds_name, lat_lon_1, lat_lon_2)
Extract timeseries of all sites in given bounding box
- Parameters:
ds_name (str) – Dataset to extract
lat_lon_1 (list | tuple) – One corner of the bounding box
lat_lon_2 (list | tuple) – The other corner of the bounding box
- Returns:
box_ts (ndarray) – Time-series array of desired dataset for all sites in desired bounding box
- get_gid_df(ds_name, gid)
Extract timeseries of site(s) for given gid(s) and return as a DataFrame
- Parameters:
ds_name (str) – Dataset to extract
gid (int | list) – Resource gid(s) of interest
- Returns:
df (pandas.DataFrame) – Time-series DataFrame for given site(s) and dataset
- get_gid_ts(ds_name, gid)
Extract timeseries of site(s) for given gid(s)
- Parameters:
ds_name (str) – Dataset to extract
gid (int | list) – Resource gid(s) of interest
- Returns:
ts (ndarray) – Time-series for given site(s) and dataset
- get_grid_vectors(target, meta=None)
Get vectors representing pure horizontal/vertical movements in the meta data coordinate system. Note that this can break down if a target is requested outside of the main grid area.
- Parameters:
target (tuple) – Starting coordinate (latitude, longitude) in decimal degrees for the bottom left hand corner of the raster grid.
meta (pd.DataFrame | None) – Optional meta data input with latitude, longitude fields. Default is None which extracts self.meta from the resource data.
- Returns:
gid_target (np.ndarray) – 1D array of shape (2,) with (latitude, longitude) corresponding to the meta data grid cell closest to the requested target.
vector_x (np.ndarray) – 1D array of shape (2,) with (delta_latitude, delta_longitude) corresponding to the vector for pure positive horizontal movement in the meta data
vector_y (np.ndarray) – 1D array of shape (2,) with (delta_latitude, delta_longitude) corresponding to the vector for pure positive vertical movement in the meta data
close (np.ndarray) – Meta data index values corresponding to the 3x3 box of pixels closest to gid_target.
- get_lat_lon_df(ds_name, lat_lon, check_lat_lon=True)
Extract timeseries of site(s) nearest to given lat_lon(s) and return as a DataFrame
- Parameters:
ds_name (str) – Dataset to extract
lat_lon (tuple) – (lat, lon) coordinate of interest
check_lat_lon (bool, optional) – Flag to check that the requested lat, lon coordinates are inside the resource grid. This is done by comparing them with the bounding box of the resource coordinates and by ensuring that the nearest neighbor distances are below the distance threshold, by default True
- Returns:
df (pandas.DataFrame) – Time-series DataFrame for given site(s) and dataset
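The lookup by coordinate can also be written as two explicit steps using methods documented on this page: find the nearest gid with lat_lon_gid, then extract by gid with get_gid_df. The sketch below assumes the two-step route yields the same site as get_lat_lon_df, which this page does not state. The helper name site_dataframe, the file path, and the dataset name are placeholders.

```python
def site_dataframe(extractor, ds_name, lat_lon):
    """Look up the nearest gid for lat_lon, then extract that site's time-series.

    extractor is expected to expose lat_lon_gid and get_gid_df, as an open
    NSRDBX instance does according to the method list on this page.
    """
    gid = extractor.lat_lon_gid(lat_lon, check_lat_lon=True)
    return extractor.get_gid_df(ds_name, gid)


# Hypothetical usage (requires rex and an NSRDB .h5 file; path and dataset
# name are placeholders, not values taken from this page):
#
# from rex.resource_extraction.resource_extraction import NSRDBX
# extractor = NSRDBX("/path/to/nsrdb_file.h5")
# try:
#     df = site_dataframe(extractor, "ghi", (39.74, -105.17))
# finally:
#     extractor.close()
```

Keeping the gid as an intermediate value is convenient when the same site is queried for several datasets, since the nearest-neighbor lookup then runs only once.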
- get_lat_lon_ts(ds_name, lat_lon, check_lat_lon=True)
Extract timeseries of site(s) nearest to given lat_lon(s)
- Parameters:
ds_name (str) – Dataset to extract
lat_lon (tuple | list) – (lat, lon) coordinate of interest or pairs of coordinates
check_lat_lon (bool, optional) – Flag to check that the requested lat, lon coordinates are inside the resource grid. This is done by comparing them with the bounding box of the resource coordinates and by ensuring that the nearest neighbor distances are below the distance threshold, by default True
- Returns:
ts (ndarray) – Time-series for given site(s) and dataset
- get_raster_index(target, shape, meta=None, max_delta=50)
Get meta data index values that correspond to a 2D rectangular grid of the requested shape starting with the target coordinate in the bottom left hand corner. Note that this can break down if a target is requested outside of the main grid area.
- Parameters:
target (tuple) – Starting coordinate (latitude, longitude) in decimal degrees for the bottom left hand corner of the raster grid.
shape (tuple) – Desired raster shape in format (number_rows, number_cols)
meta (pd.DataFrame | None) – Optional meta data input with latitude, longitude fields. Default is None which extracts self.meta from the resource data.
max_delta (int) – Optional maximum limit on the raster shape that is retrieved at once. If shape is (20, 20) and max_delta=10, the full raster will be retrieved in four chunks of (10, 10). This helps adapt to non-regular grids that curve over large distances.
- Returns:
raster_index (np.ndarray) – 2D array of meta data index values that form a 2D rectangular grid with latitudes descending from top to bottom and longitudes ascending from left to right.
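The effect of max_delta on how a raster is tiled can be sketched as follows. This only illustrates the chunk arithmetic from the parameter description; the function name and the (row_start, col_start, n_rows, n_cols) tuple layout are assumptions, and the actual retrieval logic in rex may differ.

```python
def raster_chunks_sketch(shape, max_delta):
    """Split a (number_rows, number_cols) raster into tiles no larger than max_delta per side.

    Returns a list of (row_start, col_start, n_rows, n_cols) tuples.
    """
    n_rows, n_cols = shape
    chunks = []
    for row_start in range(0, n_rows, max_delta):
        for col_start in range(0, n_cols, max_delta):
            chunks.append((
                row_start,
                col_start,
                min(max_delta, n_rows - row_start),
                min(max_delta, n_cols - col_start),
            ))
    return chunks


chunks = raster_chunks_sketch((20, 20), 10)
print(len(chunks))  # 4
```

With shape (20, 20) and max_delta=10 this yields four (10, 10) tiles, matching the example in the parameter description; edge tiles are smaller when the shape is not a multiple of max_delta.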
- get_region_df(ds_name, region, region_col='state')
Extract timeseries of all sites in given region and return as a DataFrame
- Parameters:
ds_name (str) – Dataset to extract
region (str) – Region to extract all pixels for
region_col (str) – Region column to search
- Returns:
region_df (pandas.DataFrame) – Time-series array of desired dataset for all sites in desired region
- get_region_ts(ds_name, region, region_col='state')
Extract timeseries of all sites in given region
- Parameters:
ds_name (str) – Dataset to extract
region (str) – Region to search for
region_col (str) – Region column to search
- Returns:
region_ts (ndarray) – Time-series array of desired dataset for all sites in desired region
- get_timestep_map(ds_name, timestep, region=None, region_col='state', box=None)
Extract a map of the given dataset at the given timestep for the given region if supplied
- Parameters:
ds_name (str) – Dataset to extract
timestep (str) – Timestep of interest
region (str, optional) – Region to extract all pixels for, by default None
region_col (str, optional) – Region column to search, by default ‘state’
box (tuple, optional) – Bounding corners of box to extract pixels for
- Returns:
ts_map (pandas.DataFrame) – DataFrame of map values
- property global_attrs
Global (file) attributes
- Returns:
global_attrs (dict)
- property groups
Groups available
- Returns:
groups (list)
- property h5
Open h5py File instance. If _group is not None return open Group
- Returns:
h5 (h5py.File | h5py.Group)
- property lat_lon
Extract (latitude, longitude) pairs
- Returns:
lat_lon (ndarray)
- lat_lon_gid(lat_lon, check_lat_lon=True)
Get nearest gid to given (lat, lon) pair or pairs
- Parameters:
lat_lon (ndarray) – Either a single (lat, lon) pair or series of (lat, lon) pairs
check_lat_lon (bool, optional) – Flag to check that the requested lat, lon coordinates are inside the resource grid. This is done by comparing them with the bounding box of the resource coordinates and by ensuring that the nearest neighbor distances are below the distance threshold, by default True
- Returns:
gids (int | ndarray) – Nearest gid(s) to given (lat, lon) pair(s)
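The nearest-gid lookup and the distance check can be illustrated with a brute-force sketch. The class documents a cKDTree for this purpose; the loop below is a simpler stand-in that gives the same nearest point on small inputs. The function name, the use of plain Euclidean distance in degrees, and raising ValueError on a failed check are all assumptions made for illustration.

```python
import math


def nearest_gid_sketch(coordinates, lat_lon, threshold=None):
    """Return the index of the coordinate pair closest to lat_lon.

    If threshold is given and the nearest point is farther away than that,
    raise ValueError (an assumed failure mode, not taken from this page).
    """
    dists = [math.hypot(lat - lat_lon[0], lon - lat_lon[1]) for lat, lon in coordinates]
    gid = min(range(len(dists)), key=dists.__getitem__)
    if threshold is not None and dists[gid] > threshold:
        raise ValueError(f"{lat_lon} is farther than {threshold} from the resource grid")
    return gid


# Made-up grid points; the gid is assumed to be the position in this list.
coords = [(39.0, -105.0), (39.0, -104.96), (39.04, -105.0)]
gid = nearest_gid_sketch(coords, (39.01, -104.97), threshold=0.03)
print(gid)  # 1
```

A tree-based query scales far better than this linear scan when the resource file holds many sites, which is presumably why the class keeps a pre-initialized tree.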
- classmethod make_SAM_files(res_h5, gids, out_path, write_time=True, extra_meta_data=None, max_workers=1, n_chunks=36, **kwargs)
A performant parallel entry point for making many SAM csv files for many gids
- Parameters:
res_h5 (str) – Filepath to resource h5 file.
gids (list | tuple | np.ndarray) – Resource gid(s) of interest
out_path (str, optional) – Path to save SAM data to in SAM .csv format. A gid index “*_{gid}.csv” will be appended to the file path
write_time (bool) – Flag to write the time columns (Year, Month, Day, Hour, Minute)
extra_meta_data (dict, optional) – Dictionary that maps the names and values of extra meta info. For example, extra_meta_data={‘TMY Year’: ‘2020’} will add a column ‘TMY Year’ to the meta data with a value of ‘2020’.
max_workers (int | None) – Number of parallel workers. None for all workers.
n_chunks (int) – Number of chunks to split gids into for parallelization
kwargs (dict) – Internal kwargs for get_SAM_df
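How n_chunks might partition the gids before they are handed to parallel workers can be sketched as below. The even, contiguous split is an assumption; this page does not say how rex divides the work, and the function name is hypothetical.

```python
def split_gids_sketch(gids, n_chunks):
    """Split gids into at most n_chunks contiguous, nearly equal-sized lists."""
    gids = list(gids)
    # Never create more chunks than there are gids, and always at least one.
    n_chunks = max(1, min(n_chunks, len(gids)))
    base, extra = divmod(len(gids), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional gid each.
        stop = start + base + (1 if i < extra else 0)
        chunks.append(gids[start:stop])
        start = stop
    return chunks


chunks = split_gids_sketch(range(10), 4)
print([len(c) for c in chunks])  # [3, 3, 2, 2]
```

Using more chunks than workers, as the defaults n_chunks=36 and max_workers=1 allow, keeps each unit of work small so that a slow chunk does not hold up the others for long.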
- property meta
Resource meta data DataFrame
- Returns:
meta (pandas.DataFrame)
- region_gids(region, region_col='state')
Get the gids for given region
- Parameters:
region (str) – Region to search for
region_col (str) – Region column to search
- Returns:
gids (ndarray) – Vector of gids in given region
- property res_dsets
Available resource datasets
- Returns:
list
- property resource
Open res_cls instance to access res_h5 data
- Returns:
res_cls (rex.resource.Resource | rex.renewable_resource.*)
- property resource_datasets
Available resource datasets
- Returns:
list
- save_region(out_fpath, region, datasets=None, region_col='state')
Extract desired datasets from desired region and save to a new out_fpath .h5 file
- Parameters:
out_fpath (str) – Path to .h5 file to save region datasets to
region (str) – Region to extract all pixels for
datasets (str | list, optional) – Dataset(s) to extract from given region and save to out_fpath, if None extract all datasets, by default None
region_col (str, optional) – Region column to search, by default ‘state’
- save_subset(out_fpath, gids, datasets=None)
Extract desired datasets for given gids and save to a new out_fpath .h5 file
- Parameters:
out_fpath (str) – Path to .h5 file to save the extracted datasets to
gids (list) – List of gids to extract data from and save to .h5
datasets (str | list, optional) – Dataset(s) to extract for the given gids and save to out_fpath, if None extract all datasets, by default None
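Region-based and gid-based saving can be chained through methods documented on this page: region_gids returns the gids for a region, and save_subset writes the data for a list of gids. The sketch below assumes this two-step route is an acceptable alternative to save_region when the gids need filtering first. The helper name, the keep predicate, and the values in the usage comment are hypothetical.

```python
def save_filtered_region(extractor, out_fpath, region, keep, datasets=None, region_col="state"):
    """Save datasets for the gids in a region that pass a user-supplied filter.

    extractor is expected to expose region_gids and save_subset, as an open
    NSRDBX instance does according to the method list on this page.
    keep is a hypothetical predicate taking a gid and returning True or False.
    """
    gids = [int(gid) for gid in extractor.region_gids(region, region_col=region_col) if keep(gid)]
    extractor.save_subset(out_fpath, gids, datasets=datasets)
    return gids


# Hypothetical usage (requires rex and an NSRDB .h5 file; all values are placeholders):
#
# from rex.resource_extraction.resource_extraction import NSRDBX
# extractor = NSRDBX("/path/to/nsrdb_file.h5")
# try:
#     save_filtered_region(extractor, "/path/to/subset.h5", "Colorado",
#                          keep=lambda gid: gid % 2 == 0, datasets=["ghi"])
# finally:
#     extractor.close()
```

Converting each gid with int() produces a plain list, which matches the documented type of the gids parameter of save_subset.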
- property shape
Resource shape (timesteps, sites) shape = (len(time_index), len(meta))
- Returns:
shape (tuple)
- property states
Available states
- Returns:
states (ndarray)
- property time_index
Resource DatetimeIndex
- Returns:
time_index (pandas.DatetimeIndex)
- timestep_idx(timestep)
Get the index of the desired timestep
- Parameters:
timestep (str) – Timestep of interest
- Returns:
ts_idx (int) – Time index value
- property tree
Pre-initialized cKDTree on the resource lat, lon coordinates
- Returns:
tree (cKDTree)