sup3r.bias.qdm.QuantileDeltaMappingCorrection#

class QuantileDeltaMappingCorrection(base_fps, bias_fps, bias_fut_fps, base_dset, bias_feature, distance_upper_bound=None, target=None, shape=None, base_handler='Resource', bias_handler='DataHandlerNCforCC', base_handler_kwargs=None, bias_handler_kwargs=None, bias_fut_handler_kwargs=None, decimals=None, match_zero_rate=False, n_quantiles=101, dist='empirical', relative=True, sampling='linear', log_base=10, n_time_steps=24, window_size=120, pre_load=True)[source]#

Bases: FillAndSmoothMixin, DataRetrievalBase

Estimate probability distributions required by Quantile Delta Mapping

The main purpose of this class is to estimate the probability distributions required by Quantile Delta Mapping (QDM) ([Cannon2015]) technique. Therefore, the name ‘Correction’ can be misleading since it is not the correction per se, but that was used to keep consistency within this module.

The QDM technique corrects bias and trend by comparing the data distributions of three datasets: a historical reference, a biased reference, and a biased target to correct (in Cannon et. al. (2015) called: historical observed, historical modeled, and future modeled respectively). Those three probability distributions provided here can be, for instance, used by local_qdm_bc() to actually correct a dataset.

Parameters:
  • base_fps (list | str) – One or more baseline .h5 filepaths representing non-biased data to use to correct the biased dataset (observed historical in Cannon et. al. (2015)). This is typically several years of WTK or NSRDB files.

  • bias_fps (list | str) – One or more biased .nc or .h5 filepaths representing the biased data to be compared with the baseline data (modeled historical in Cannon et. al. (2015)). This is typically several years of GCM .nc files.

  • bias_fut_fps (list | str) – Consistent data to bias_fps but for a different time period (modeled future in Cannon et. al. (2015)). This is the dataset that would be corrected, while bias_fsp is used to provide a transformation map with the baseline data.

  • base_dset (str) – A single dataset from the base_fps to retrieve. In the case of wind components, this can be U_100m or V_100m which will retrieve windspeed and winddirection and derive the U/V component.

  • bias_feature (str) – This is the biased feature from bias_fps to retrieve. This should be a single feature name corresponding to base_dset

  • distance_upper_bound (float) – Upper bound on the nearest neighbor distance in decimal degrees. This should be the approximate resolution of the low-resolution bias data. None (default) will calculate this based on the median distance between points in bias_fps

  • target (tuple) – (lat, lon) lower left corner of raster to retrieve from bias_fps. If None then the lower left corner of the full domain will be used.

  • shape (tuple) – (rows, cols) grid size to retrieve from bias_fps. If None then the full domain shape will be used.

  • base_handler (str) – Name of rex resource handler or sup3r.preprocessing.data_handlers class to be retrieved from the rex/sup3r library. If a sup3r.preprocessing.data_handlers class is used, all data will be loaded in this class’ initialization and the subsequent bias calculation will be done in serial

  • bias_handler (str) – Name of the bias data handler class to be retrieved from the sup3r.preprocessing.data_handlers library.

  • base_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the base_handler class

  • bias_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the bias_handler class with the bias_fps

  • bias_fut_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the bias_handler class with the bias_fut_fps

  • decimals (int | None) – Option to round bias and base data to this number of decimals, this gets passed to np.around(). If decimals is negative, it specifies the number of positions to the left of the decimal point.

  • match_zero_rate (bool) – Option to fix the frequency of zero values in the biased data. The lowest percentile of values in the biased data will be set to zero to match the percentile of zeros in the base data. If SkillAssessment is being run and this is True, the distributions will not be mean-centered. This helps resolve the issue where global climate models produce too many days with small precipitation totals e.g., the “drizzle problem” [Polade2014].

  • dist (str, default=”empirical”,) – Define the type of distribution, which can be “empirical” or any parametric distribution defined in “scipy”.

  • n_quantiles (int, default=101) – Defines the number of quantiles (between 0 and 1) for an empirical distribution.

  • sampling (str, default=”linear”,) – Defines how the quantiles are sampled. For instance, ‘linear’ will result in a linearly spaced quantiles. Other options are: ‘log’ and ‘invlog’.

  • log_base (int or float, default=10) – Log base value if sampling is “log” or “invlog”.

  • n_time_steps (int) – Number of times to calculate QDM parameters equally distributed along a year. For instance, n_time_steps=1 results in a single set of parameters while n_time_steps=12 is approximately every month.

  • window_size (int) – Total time window period in days to be considered for each time QDM is calculated. For instance, window_size=30 with n_time_steps=12 would result in approximately monthly estimates.

  • pre_load (bool) – Flag to preload all data needed for bias correction. This is currently recommended to improve performance with the new sup3r data handler access patterns

See also

sup3r.bias.bias_transforms.local_qdm_bc

Bias correction using QDM.

sup3r.preprocessing.data_handlers.DataHandler

Bias correction using QDM directly from a derived handler.

rex.utilities.bc_utils.QuantileDeltaMapping

Quantile Delta Mapping method and support functions. Since rex.utilities.bc_utils is used here, the arguments dist, n_quantiles, sampling, and log_base must be consitent with that package/module.

Notes

One way of using this class is by saving the distributions definitions obtained here with the method write_outputs() and then use that file with local_qdm_bc() or through a derived DataHandler. ATTENTION, be careful handling that file of parameters. There is no checking process and one could missuse the correction estimated for the wrong dataset.

References

[Cannon2015]

Cannon, A. J., Sobie, S. R., & Murdock, T. Q. (2015). Bias correction of GCM precipitation by quantile mapping: how well do methods preserve changes in quantiles and extremes?. Journal of Climate, 28(17), 6938-6959.

Methods

compare_dists(base_data, bias_data[, adder, ...])

Compare two distributions using the two-sample Kolmogorov-Smirnov.

fill_and_smooth(out[, fill_extend, ...])

For a given set of parameters, fill and extend missing positions

get_base_data(base_fps, base_dset, base_gid, ...)

Get data from the baseline data source, possibly for many high-res base gids corresponding to a single coarse low-res bias gid.

get_base_gid(bias_gid)

Get one or more base gid(s) corresponding to a bias gid.

get_bias_data(bias_gid[, bias_dh])

Get data from the biased data source for a single gid

get_bias_gid(coord)

Get the bias gid from a coordinate.

get_data_pair(coord[, daily_reduction])

Get base and bias data observations based on a single bias gid.

get_node_cmd(config)

Get a CLI call to call cls.run() on a single node based on an input config.

get_qdm_params(bias_data, bias_fut_data, ...)

Get quantiles' cut point for given datasets

pre_load()

Preload all data needed for bias correction.

run([fp_out, max_workers, daily_reduction, ...])

Estimate the statistical distributions for each location

window_mask(doy, d0, window_size)

An index of elements within a given time window

write_outputs(fp_out[, out])

Write outputs to an .h5 file.

Attributes

distance_upper_bound

Maximum distance (float) to map high-resolution data from exo_source to the low-resolution file_paths input.

meta

Get a meta data dictionary on how these bias factors were calculated

pre_load()[source]#

Preload all data needed for bias correction. This is currently recommended to improve performance with the new sup3r data handler access patterns

static get_qdm_params(bias_data, bias_fut_data, base_data, bias_feature, base_dset, sampling, n_samples, log_base)[source]#

Get quantiles’ cut point for given datasets

Estimate the quantiles’ cut points for each of the three given datasets. Lacking a good analytical approximation, such as one of the parametric distributions, those quantiles can be used to approximate the statistical distribution of those datasets.

Parameters:
  • bias_data (np.ndarray) – 1D array of biased data observations.

  • bias_fut_data (np.ndarray) – 1D array of biased data observations.

  • base_data (np.ndarray) – 1D array of base data observations.

  • bias_feature (str) – This is the biased feature from bias_fps to retrieve. This should be a single feature name corresponding to base_dset.

  • base_dset (str) – A single dataset from the base_fps to retrieve. In the case of wind components, this can be U_100m or V_100m which will retrieve windspeed and winddirection and derive the U/V component.

  • sampling (str) – Defines how the quantiles are sampled. For instance, ‘linear’ will result in a linearly spaced quantiles. Other options are: ‘log’ and ‘invlog’.

  • n_samples (int) – Number of points to sample between 0 and 1, i.e. number of quantiles.

  • log_base (int | float) – Log base value.

Returns:

out (dict) – Dictionary of the quantiles’ cut points. Note that to make sense of those cut point values, one need to know the given arguments such as log_base. For instance, the sequence [-1, 0, 2] are, if sampling was linear, the minimum, median, and maximum values respectively. The expected keys are “bias_{bias_feature}_params”, “bias_fut_{bias_feature}_params”, and “base_{base_dset}_params”.

See also

rex.utilities.bc_utils

Sampling scales, such as sample_q_linear()

write_outputs(fp_out, out=None)[source]#

Write outputs to an .h5 file.

Parameters:
  • fp_out (str | None) – An HDF5 filename to write the estimated statistical distributions.

  • out (dict, optional) – A dictionary with the three statistical distribution parameters. If not given, it uses out.

run(fp_out=None, max_workers=None, daily_reduction='avg', fill_extend=True, smooth_extend=0, smooth_interior=0)[source]#

Estimate the statistical distributions for each location

Parameters:
  • fp_out (str | None) – Optional .h5 output file to write scalar and adder arrays.

  • max_workers (int, optional) – Number of workers to run in parallel. 1 is serial and None is all available.

  • daily_reduction (None | str) – Option to do a reduction of the hourly+ source base data to daily data. Can be None (no reduction, keep source time frequency), “avg” (daily average), “max” (daily max), “min” (daily min), “sum” (daily sum/total)

Returns:

out (dict) – Dictionary with parameters defining the statistical distributions for each of the three given datasets. Each value has dimensions (lat, lon, n-parameters).

static compare_dists(base_data, bias_data, adder=0, scalar=1)#

Compare two distributions using the two-sample Kolmogorov-Smirnov. When the output is minimized, the two distributions are similar.

Parameters:
  • base_data (np.ndarray) – 1D array of base data observations.

  • bias_data (np.ndarray) – 1D array of biased data observations.

  • adder (float) – Factor to adjust the biased data before comparing distributions: bias_data * scalar + adder

  • scalar (float) – Factor to adjust the biased data before comparing distributions: bias_data * scalar + adder

Returns:

out (float) – KS test statistic

property distance_upper_bound#

Maximum distance (float) to map high-resolution data from exo_source to the low-resolution file_paths input.

fill_and_smooth(out, fill_extend=True, smooth_extend=0, smooth_interior=0)#

For a given set of parameters, fill and extend missing positions

Fill data extending beyond the base meta data extent by doing a nearest neighbor gap fill. Smooth interior and extended region with given smoothing values. Interior smoothing can reduce the affect of extreme values within aggregations over large number of pixels. The interior is assumed to be defined by the region without nan values. The extended region is assumed to be the region with nan values.

Parameters:
  • out (dict) – Dictionary of values defining the mean/std of the bias + base data and the scalar + adder factors to correct the biased data like: bias_data * scalar + adder. Each value is of shape (lat, lon, time).

  • fill_extend (bool) – Whether to fill data extending beyond the base meta data with nearest neighbor values.

  • smooth_extend (float) – Option to smooth the scalar/adder data outside of the spatial domain set by the threshold input. This alleviates the weird seams far from the domain of interest. This value is the standard deviation for the gaussian_filter kernel

  • smooth_interior (float) – Value to use to smooth the scalar/adder data inside of the spatial domain set by the threshold input. This can reduce the effect of extreme values within aggregations over large number of pixels. This value is the standard deviation for the gaussian_filter kernel.

Returns:

out (dict) – Dictionary of values defining the mean/std of the bias + base data and the scalar + adder factors to correct the biased data like: bias_data * scalar + adder. Each value is of shape (lat, lon, time).

classmethod get_base_data(base_fps, base_dset, base_gid, base_handler, base_handler_kwargs=None, daily_reduction='avg', decimals=None, base_dh_inst=None)#

Get data from the baseline data source, possibly for many high-res base gids corresponding to a single coarse low-res bias gid.

Parameters:
  • base_fps (list | str) – One or more baseline .h5 filepaths representing non-biased data to use to correct the biased dataset. This is typically several years of WTK or NSRDB files.

  • base_dset (str) – A single dataset from the base_fps to retrieve.

  • base_gid (int | np.ndarray) – One or more spatial gids to retrieve from base_fps. The data will be spatially averaged across all of these sites.

  • base_handler (rex.Resource) – A rex data handler similar to rex.Resource or sup3r.DataHandler classes (if using the latter, must also input base_dh_inst)

  • base_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the base_handler class

  • daily_reduction (None | str) – Option to do a reduction of the hourly+ source base data to daily data. Can be None (no reduction, keep source time frequency), “avg” (daily average), “max” (daily max), “min” (daily min), “sum” (daily sum/total)

  • decimals (int | None) – Option to round bias and base data to this number of decimals, this gets passed to np.around(). If decimals is negative, it specifies the number of positions to the left of the decimal point.

  • base_dh_inst (sup3r.DataHandler) – Instantiated DataHandler class that has already loaded the base data (required if base files are .nc and are not being opened by a rex Resource handler).

Returns:

  • out_data (np.ndarray) – 1D array of base data spatially averaged across the base_gid input and possibly daily-averaged or min/max’d as well.

  • out_ti (pd.DatetimeIndex) – DatetimeIndex object of datetimes corresponding to the output data.

get_base_gid(bias_gid)#

Get one or more base gid(s) corresponding to a bias gid.

Parameters:

bias_gid (int) – gid of the data to retrieve in the bias data source raster data. The gids for this data source are the enumerated indices of the flattened coordinate array.

Returns:

  • dist (np.ndarray) – Array of nearest neighbor distances with length equal to the number of high-resolution baseline gids that map to the low resolution bias gid pixel.

  • base_gid (np.ndarray) – Array of base gids that are the nearest neighbors of bias_gid with length equal to the number of high-resolution baseline gids that map to the low resolution bias gid pixel.

get_bias_data(bias_gid, bias_dh=None)#

Get data from the biased data source for a single gid

Parameters:
  • bias_gid (int) – gid of the data to retrieve in the bias data source raster data. The gids for this data source are the enumerated indices of the flattened coordinate array.

  • bias_dh (DataHandler, default=self.bias_dh) – Any DataHandler from sup3r.preprocessing. This optional argument allows an alternative handler other than the usual bias_dh. For instance, the derived QuantileDeltaMappingCorrection uses it to access the reference biased dataset as well as the target biased dataset.

Returns:

bias_data (np.ndarray) – 1D array of temporal data at the requested gid.

get_bias_gid(coord)#

Get the bias gid from a coordinate.

Parameters:

coord (tuple) – (lat, lon) to get data for.

Returns:

  • bias_gid (int) – gid of the data to retrieve in the bias data source raster data. The gids for this data source are the enumerated indices of the flattened coordinate array.

  • d (float) – Distance in decimal degrees from coord to bias gid

get_data_pair(coord, daily_reduction='avg')#

Get base and bias data observations based on a single bias gid.

Parameters:
  • coord (tuple) – (lat, lon) to get data for.

  • daily_reduction (None | str) – Option to do a reduction of the hourly+ source base data to daily data. Can be None (no reduction, keep source time frequency), “avg” (daily average), “max” (daily max), “min” (daily min), “sum” (daily sum/total)

Returns:

  • base_data (np.ndarray) – 1D array of base data spatially averaged across the base_gid input and possibly daily-averaged or min/max’d as well.

  • bias_data (np.ndarray) – 1D array of temporal data at the requested gid.

  • base_dist (np.ndarray) – Array of nearest neighbor distances from coord to the base data sites with length equal to the number of high-resolution baseline gids that map to the low resolution bias gid pixel.

  • bias_dist (Float) – Nearest neighbor distance from coord to the bias data site

classmethod get_node_cmd(config)#

Get a CLI call to call cls.run() on a single node based on an input config.

Parameters:

config (dict) – sup3r bias calc config with all necessary args and kwargs to initialize the class and call run() on a single node.

property meta#

Get a meta data dictionary on how these bias factors were calculated

static window_mask(doy, d0, window_size)[source]#

An index of elements within a given time window

Create an index of days of the year within the target time window. It only considers the day of the year (doy), hence, it is limited to time scales smaller than annual.

Parameters:
  • doy (np.ndarray) – An unordered array of days of year, i.e. January 1st is 1.

  • d0 (int) – Center point of the target window [day of year].

  • window_size (float) – Total size of the target window, i.e. the window covers half this value on each side of d0. Note that it has the same units of doy, thus it is equal to the number of points only if doy is daily.

Returns:

np.array – An boolean array with the same shape of the given doy, where True means that position is within the target window.

Notes

Leap years have the day 366 included in the output index, but a precise shift is not calculated, resulting in a negligible error in large datasets.