sup3r.bias.base.DataRetrievalBase

class DataRetrievalBase(base_fps, bias_fps, base_dset, bias_feature, distance_upper_bound=None, target=None, shape=None, base_handler='Resource', bias_handler='DataHandlerNCforCC', base_handler_kwargs=None, bias_handler_kwargs=None, decimals=None, match_zero_rate=False, pre_load=True)

Bases: object

Base class that handles data retrieval for both the biased data and the baseline data.

Parameters:
  • base_fps (list | str) – One or more baseline .h5 filepaths representing non-biased data to use to correct the biased dataset. This is typically several years of WTK or NSRDB files.

  • bias_fps (list | str) – One or more biased .nc or .h5 filepaths representing the biased data to be corrected based on the baseline data. This is typically several years of GCM .nc files.

  • base_dset (str) – A single dataset from the base_fps to retrieve. In the case of wind components, this can be u_100m or v_100m, which will retrieve windspeed and winddirection and derive the U or V component.

  • bias_feature (str) – The biased feature to retrieve from bias_fps. This should be a single feature name corresponding to base_dset.

  • distance_upper_bound (float) – Upper bound on the nearest neighbor distance in decimal degrees. This should be the approximate resolution of the low-resolution bias data. None (default) will calculate this based on the median distance between points in bias_fps.

  • target (tuple) – (lat, lon) lower left corner of raster to retrieve from bias_fps. If None then the lower left corner of the full domain will be used.

  • shape (tuple) – (rows, cols) grid size to retrieve from bias_fps. If None then the full domain shape will be used.

  • base_handler (str) – Name of the rex resource handler or sup3r.preprocessing class to retrieve from the rex/sup3r library. If a sup3r.preprocessing class is used, all data will be loaded during that class's initialization and the subsequent bias calculation will be done in serial.

  • bias_handler (str) – Name of the bias data handler class to be retrieved from the sup3r.preprocessing library.

  • base_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the base_handler class.

  • bias_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the bias_handler class.

  • decimals (int | None) – Option to round bias and base data to this number of decimals; this is passed to np.around(). If decimals is negative, it specifies the number of positions to the left of the decimal point.

  • match_zero_rate (bool) – Option to fix the frequency of zero values in the biased data. The lowest percentile of values in the biased data will be set to zero to match the percentile of zeros in the base data. If SkillAssessment is being run and this is True, the distributions will not be mean-centered. This helps resolve the issue where global climate models produce too many days with small precipitation totals, i.e., the “drizzle problem” [Polade2014].

  • pre_load (bool) – Flag to preload all data needed for bias correction. This is currently recommended to improve performance with the new sup3r data handler access patterns.
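The match_zero_rate behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the sup3r implementation: the helper name match_zero_rate_sketch is hypothetical, both inputs are assumed to be 1D arrays, and exact zeros in the base data are assumed to define the target zero rate. The smallest biased values are zeroed until the fraction of zeros matches the base data.

```python
import numpy as np


def match_zero_rate_sketch(bias_data, base_data):
    """Zero the smallest biased values so the biased zero rate matches the base.

    Hypothetical helper for illustration only (not the sup3r implementation).
    """
    bias_data = np.asarray(bias_data, dtype=float).copy()
    # Fraction of exact zeros in the (unbiased) baseline data
    zero_rate = np.mean(np.asarray(base_data) == 0)
    # Number of biased values that must be zero to match that fraction
    n_zero = int(round(zero_rate * bias_data.size))
    if n_zero > 0:
        # Indices of the n_zero smallest biased values
        idx = np.argsort(bias_data)[:n_zero]
        bias_data[idx] = 0.0
    return bias_data


# Example: the baseline is dry half the time, the biased data never is
precip_base = np.array([0.0, 0.0, 1.2, 3.4])
precip_bias = np.array([0.1, 0.3, 2.0, 0.2])
print(match_zero_rate_sketch(precip_bias, precip_base))
```

This sketch can only add zeros, never remove them, which mirrors the "drizzle problem" use case where the biased data has too few dry days.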

References

[Polade2014]

Polade, S. D., Pierce, D. W., Cayan, D. R., Gershunov, A., & Dettinger, M. D. (2014). The key role of dry days in changing regional climate and precipitation regimes. Scientific Reports, 4(1), 4364. https://doi.org/10.1038/srep04364

Methods

compare_dists(base_data, bias_data[, adder, ...])

Compare two distributions using the two-sample Kolmogorov-Smirnov (KS) test.

get_base_data(base_fps, base_dset, base_gid, ...)

Get data from the baseline data source, possibly for many high-res base gids corresponding to a single coarse low-res bias gid.

get_base_gid(bias_gid)

Get one or more base gid(s) corresponding to a bias gid.

get_bias_data(bias_gid[, bias_dh])

Get data from the biased data source for a single gid.

get_bias_gid(coord)

Get the bias gid from a coordinate.

get_data_pair(coord[, daily_reduction])

Get base and bias data observations based on a single bias gid.

get_node_cmd(config)

Get a CLI command that calls cls.run() on a single node based on an input config.

pre_load()

Preload all data needed for bias correction.

Attributes

distance_upper_bound

Maximum distance (float) to map high-resolution data from base_fps to the low-resolution bias_fps input.

meta

Get a metadata dictionary describing how these bias factors were calculated.

pre_load()

Preload all data needed for bias correction. This is currently recommended to improve performance with the new sup3r data handler access patterns.

property meta

Get a metadata dictionary describing how these bias factors were calculated.

property distance_upper_bound

Maximum distance (float) to map high-resolution data from base_fps to the low-resolution bias_fps input.
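As a rough sketch of how a default distance_upper_bound could be derived from the median spacing of the bias grid (the constructor notes that None falls back to the median distance between points in bias_fps), the hypothetical helper below assumes the bias coordinates form a (rows, cols, 2) lat/lon array with at least two rows and two columns, and returns the larger of the median row and column spacings in decimal degrees. The exact rule sup3r applies may differ.

```python
import numpy as np


def estimate_distance_upper_bound(lat_lon):
    """Estimate the spacing (decimal degrees) of a (rows, cols, 2) lat/lon grid.

    Hypothetical helper: takes the larger of the median spacings between
    adjacent points along the row and column axes.
    """
    lat_lon = np.asarray(lat_lon, dtype=float)
    # Euclidean step between neighboring grid points along each axis
    row_steps = np.linalg.norm(np.diff(lat_lon, axis=0), axis=-1)
    col_steps = np.linalg.norm(np.diff(lat_lon, axis=1), axis=-1)
    return float(max(np.median(row_steps), np.median(col_steps)))


# Example: a grid with 1.0-degree latitude and 0.5-degree longitude spacing
lats, lons = np.meshgrid([40.0, 41.0, 42.0], [-105.0, -104.5, -104.0, -103.5],
                         indexing='ij')
grid = np.stack([lats, lons], axis=-1)
print(estimate_distance_upper_bound(grid))
```

Taking the larger axis spacing keeps the bound permissive enough that every high-resolution base site inside a coarse bias pixel can still be matched.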

static compare_dists(base_data, bias_data, adder=0, scalar=1)

Compare two distributions using the two-sample Kolmogorov-Smirnov (KS) test. The smaller the output, the more similar the two distributions are.

Parameters:
  • base_data (np.ndarray) – 1D array of base data observations.

  • bias_data (np.ndarray) – 1D array of biased data observations.

  • adder (float) – Additive offset applied to the biased data before comparing distributions, as bias_data * scalar + adder.

  • scalar (float) – Multiplicative factor applied to the biased data before comparing distributions, as bias_data * scalar + adder.

Returns:

out (float) – KS test statistic
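A minimal NumPy sketch of the statistic compare_dists returns, assuming it is the standard two-sample KS statistic (the largest absolute gap between the two empirical CDFs) computed after adjusting the biased data as bias_data * scalar + adder. The name compare_dists_sketch is hypothetical; scipy.stats.ks_2samp(...).statistic computes the same quantity.

```python
import numpy as np


def compare_dists_sketch(base_data, bias_data, adder=0, scalar=1):
    """Two-sample KS statistic between base_data and bias_data * scalar + adder."""
    base = np.sort(np.asarray(base_data, dtype=float))
    bias = np.sort(np.asarray(bias_data, dtype=float) * scalar + adder)
    # Evaluate both empirical CDFs at every observed value
    grid = np.concatenate([base, bias])
    cdf_base = np.searchsorted(base, grid, side='right') / base.size
    cdf_bias = np.searchsorted(bias, grid, side='right') / bias.size
    # KS statistic: largest vertical gap between the two CDFs
    return float(np.max(np.abs(cdf_base - cdf_bias)))


base = np.array([1.0, 2.0, 3.0, 4.0])
bias = base * 2.0 - 1.0
print(compare_dists_sketch(base, bias))                         # unadjusted
print(compare_dists_sketch(base, bias, adder=0.5, scalar=0.5))  # adjusted back onto base
```

Because the statistic shrinks as the adjusted biased distribution approaches the base distribution, it can serve as an objective when searching for adder and scalar.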

classmethod get_node_cmd(config)

Get a CLI command that calls cls.run() on a single node based on an input config.

Parameters:

config (dict) – sup3r bias calc config with all necessary args and kwargs to initialize the class and call run() on a single node.

get_bias_gid(coord)

Get the bias gid from a coordinate.

Parameters:

coord (tuple) – (lat, lon) to get data for.

Returns:

  • bias_gid (int) – gid of the data to retrieve in the bias data source raster data. The gids for this data source are the enumerated indices of the flattened coordinate array.

  • d (float) – Distance in decimal degrees from coord to the bias gid.
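The gid convention above (enumerated indices of the flattened coordinate array) can be illustrated with a brute-force nearest-neighbor search. This sketch assumes the bias coordinates are a (rows, cols, 2) lat/lon array flattened in row-major (C) order and measures plain Euclidean distance in decimal degrees; the function name get_bias_gid_sketch is hypothetical, and a KD-tree (e.g., scipy.spatial.KDTree) would be the usual choice for large grids.

```python
import numpy as np


def get_bias_gid_sketch(lat_lon, coord):
    """Return (bias_gid, d) for the raster cell nearest to coord.

    lat_lon: (rows, cols, 2) array of (lat, lon); gids index the flattened
    (rows * cols, 2) array. d is a Euclidean distance in decimal degrees.
    """
    flat = np.asarray(lat_lon, dtype=float).reshape(-1, 2)
    # Distance from coord to every grid point
    dists = np.linalg.norm(flat - np.asarray(coord, dtype=float), axis=1)
    bias_gid = int(np.argmin(dists))
    return bias_gid, float(dists[bias_gid])


lats, lons = np.meshgrid([40.0, 41.0], [-105.0, -104.0, -103.0], indexing='ij')
lat_lon = np.stack([lats, lons], axis=-1)  # shape (2, 3, 2)
gid, d = get_bias_gid_sketch(lat_lon, (41.1, -104.2))
# np.unravel_index recovers the (row, col) raster position of the gid
print(gid, d, np.unravel_index(gid, lat_lon.shape[:2]))
```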

get_base_gid(bias_gid)

Get one or more base gid(s) corresponding to a bias gid.

Parameters:

bias_gid (int) – gid of the data to retrieve in the bias data source raster data. The gids for this data source are the enumerated indices of the flattened coordinate array.

Returns:

  • dist (np.ndarray) – Array of nearest neighbor distances with length equal to the number of high-resolution baseline gids that map to the low resolution bias gid pixel.

  • base_gid (np.ndarray) – Array of base gids that are the nearest neighbors of bias_gid with length equal to the number of high-resolution baseline gids that map to the low resolution bias gid pixel.
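One way to realize this mapping is to assign every high-resolution base site to its nearest bias gid and then select the sites assigned to the requested bias_gid. The sketch below assumes that approach, uses brute-force pairwise Euclidean distances in decimal degrees, and drops base sites farther than distance_upper_bound from every bias gid; get_base_gid_sketch is a hypothetical name, not the sup3r method. The pairwise matrix needs n_base * n_bias memory, so it only suits small examples.

```python
import numpy as np


def get_base_gid_sketch(base_lat_lon, bias_lat_lon, bias_gid, distance_upper_bound):
    """Return (dist, base_gid) for base sites whose nearest bias gid is bias_gid."""
    base = np.asarray(base_lat_lon, dtype=float).reshape(-1, 2)
    bias = np.asarray(bias_lat_lon, dtype=float).reshape(-1, 2)
    # Pairwise distances with shape (n_base, n_bias)
    pair = np.linalg.norm(base[:, None, :] - bias[None, :, :], axis=-1)
    # Nearest bias gid (and its distance) for every base site
    nn_ind = np.argmin(pair, axis=1)
    nn_dist = pair[np.arange(base.shape[0]), nn_ind]
    # Keep base sites mapped to bias_gid that are within the distance bound
    mask = (nn_ind == bias_gid) & (nn_dist <= distance_upper_bound)
    base_gid = np.where(mask)[0]
    return nn_dist[base_gid], base_gid


bias_pts = [[40.0, -105.0], [40.0, -104.0]]
base_pts = [[40.1, -105.1], [39.9, -104.9], [40.0, -104.2],
            [40.1, -103.9], [45.0, -105.0]]  # the last site is out of range
print(get_base_gid_sketch(base_pts, bias_pts, 0, 0.5))
```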

get_data_pair(coord, daily_reduction='avg')

Get base and bias data observations based on a single bias gid.

Parameters:
  • coord (tuple) – (lat, lon) to get data for.

  • daily_reduction (None | str) – Option to reduce the sub-daily (e.g., hourly) source base data to daily data. Can be None (no reduction, keep the source time frequency), “avg” (daily average), “max” (daily max), “min” (daily min), or “sum” (daily sum/total).

Returns:

  • base_data (np.ndarray) – 1D array of base data spatially averaged across the base_gid input and possibly reduced to daily values (average, max, min, or sum) as well.

  • bias_data (np.ndarray) – 1D array of temporal data at the requested gid.

  • base_dist (np.ndarray) – Array of nearest neighbor distances from coord to the base data sites with length equal to the number of high-resolution baseline gids that map to the low resolution bias gid pixel.

  • bias_dist (float) – Nearest neighbor distance from coord to the bias data site.
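The daily_reduction options can be sketched with NumPy by truncating each timestamp to its calendar day and reducing each group. The helper name reduce_to_daily and the datetime64 inputs are assumptions for illustration; this is not the sup3r code path.

```python
import numpy as np


def reduce_to_daily(times, values, daily_reduction='avg'):
    """Reduce a sub-daily series to one value per calendar day.

    times: timestamps convertible to datetime64; values: 1D array of equal length.
    daily_reduction: None, 'avg', 'max', 'min', or 'sum'.
    """
    times = np.asarray(times, dtype='datetime64[s]')
    values = np.asarray(values, dtype=float)
    if daily_reduction is None:
        # No reduction: keep the source time frequency
        return times, values
    funcs = {'avg': np.mean, 'max': np.max, 'min': np.min, 'sum': np.sum}
    if daily_reduction not in funcs:
        raise ValueError(f'Unknown daily_reduction: {daily_reduction!r}')
    # Truncate each timestamp to its day and find the group of every sample
    days, inverse = np.unique(times.astype('datetime64[D]'), return_inverse=True)
    reduce = funcs[daily_reduction]
    out = np.array([reduce(values[inverse == i]) for i in range(days.size)])
    return days, out


t = ['2020-01-01T00:00', '2020-01-01T12:00', '2020-01-02T00:00', '2020-01-02T12:00']
print(reduce_to_daily(t, [1, 3, 2, 6], 'avg'))
```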

get_bias_data(bias_gid, bias_dh=None)

Get data from the biased data source for a single gid.

Parameters:
  • bias_gid (int) – gid of the data to retrieve in the bias data source raster data. The gids for this data source are the enumerated indices of the flattened coordinate array.

  • bias_dh (DataHandler, default=self.bias_dh) – Any DataHandler from sup3r.preprocessing. This optional argument allows an alternative handler other than the usual bias_dh. For instance, the derived QuantileDeltaMappingCorrection uses it to access the reference biased dataset as well as the target biased dataset.

Returns:

bias_data (np.ndarray) – 1D array of temporal data at the requested gid.
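Because bias gids enumerate the flattened coordinate array, a gid maps back to a (row, col) raster position with np.unravel_index. The sketch below assumes the biased data is held as a (rows, cols, time) array in row-major order; get_bias_series is a hypothetical helper, not part of sup3r.

```python
import numpy as np


def get_bias_series(bias_data, bias_gid):
    """Return the 1D time series at bias_gid from a (rows, cols, time) array."""
    bias_data = np.asarray(bias_data)
    rows, cols, _ = bias_data.shape
    # Convert the flat gid to its (row, col) raster position
    row, col = np.unravel_index(bias_gid, (rows, cols))
    return bias_data[row, col, :]


data = np.arange(2 * 3 * 4).reshape(2, 3, 4)  # 2 x 3 grid, 4 time steps
print(get_bias_series(data, 4))
```

Indexing by (row, col) is equivalent to reshaping the array to (rows * cols, time) and taking row bias_gid.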

classmethod get_base_data(base_fps, base_dset, base_gid, base_handler, base_handler_kwargs=None, daily_reduction='avg', decimals=None, base_dh_inst=None)

Get data from the baseline data source, possibly for many high-res base gids corresponding to a single coarse low-res bias gid.

Parameters:
  • base_fps (list | str) – One or more baseline .h5 filepaths representing non-biased data to use to correct the biased dataset. This is typically several years of WTK or NSRDB files.

  • base_dset (str) – A single dataset from the base_fps to retrieve.

  • base_gid (int | np.ndarray) – One or more spatial gids to retrieve from base_fps. The data will be spatially averaged across all of these sites.

  • base_handler (rex.Resource) – A rex data handler similar to rex.Resource or a sup3r.DataHandler class (if using the latter, base_dh_inst must also be provided).

  • base_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the base_handler class.

  • daily_reduction (None | str) – Option to reduce the sub-daily (e.g., hourly) source base data to daily data. Can be None (no reduction, keep the source time frequency), “avg” (daily average), “max” (daily max), “min” (daily min), or “sum” (daily sum/total).

  • decimals (int | None) – Option to round bias and base data to this number of decimals; this is passed to np.around(). If decimals is negative, it specifies the number of positions to the left of the decimal point.

  • base_dh_inst (sup3r.DataHandler) – Instantiated DataHandler class that has already loaded the base data (required if base files are .nc and are not being opened by a rex Resource handler).

Returns:

  • out_data (np.ndarray) – 1D array of base data spatially averaged across the base_gid input and possibly reduced to daily values (average, max, min, or sum) as well.

  • out_ti (pd.DatetimeIndex) – DatetimeIndex object of datetimes corresponding to the output data.
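A minimal sketch of the spatial averaging and rounding steps described for get_base_data, assuming the base data has already been read into a (time, n_sites) array. The helper name average_base_sites and that array layout are assumptions; file reading, the time index, and daily reduction are omitted.

```python
import numpy as np


def average_base_sites(base_data, base_gid, decimals=None):
    """Average (time, n_sites) base data across base_gid, then optionally round.

    Hypothetical helper: the (time, n_sites) layout is an assumption.
    """
    base_data = np.asarray(base_data, dtype=float)
    # Accept a single gid or an array of gids
    base_gid = np.atleast_1d(base_gid)
    # Mean across the selected high-resolution sites at each time step
    out = base_data[:, base_gid].mean(axis=1)
    if decimals is not None:
        # np.around accepts negative decimals (round to tens, hundreds, ...)
        out = np.around(out, decimals=decimals)
    return out


b = [[14.0, 18.0], [121.0, 125.0]]  # 2 time steps, 2 sites
print(average_base_sites(b, [0, 1]))
print(average_base_sites(b, [0, 1], decimals=-1))
```

With decimals=-1 the averaged values are rounded to the nearest ten, consistent with the np.around() behavior described for the decimals parameter.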