sup3r.bias.qdm.PresRat

class PresRat(base_fps, bias_fps, bias_fut_fps, base_dset, bias_feature, distance_upper_bound=None, target=None, shape=None, base_handler='Resource', bias_handler='DataHandlerNCforCC', base_handler_kwargs=None, bias_handler_kwargs=None, decimals=None, match_zero_rate=False, n_quantiles=101, dist='empirical', relative=None, sampling='linear', log_base=10)[source]

Bases: ZeroRateMixin, QuantileDeltaMappingCorrection

PresRat bias correction method (precipitation)

The PresRat correction [Pierce2015] is defined as the combination of three steps:

  • Use the model-predicted change ratio (with the CDFs);

  • Treat zero-precipitation days (with the fraction of dry days);

  • Apply the final correction factor (K) to preserve the mean (the ratio between both estimated means).

To keep consistency with the full sup3r pipeline, PresRat was implemented as follows:

  1. Define zero rate from observations (oh)

Using the historical observations, estimate the zero-precipitation rate for each gridpoint. A long time series is expected here, such as decadal or longer. A threshold larger than zero can be used to define what counts as zero.

The result is a 2D (space) zero_rate (non-dimensional).

  2. Find the threshold for each gridpoint (mh)

Using the zero rate from the previous step, identify the magnitude threshold for each gridpoint that satisfies that dry-day rate.

Note that Pierce et al. (2015) impose tau >= 0.01 mm/day for precipitation.

The result is a 2D (space) threshold tau with the same units as the data being corrected. For instance, it could be mm/day for precipitation.

  3. Define Z_fg using tau (mf)

The tau defined from the modeled historical data is now used as a threshold on the modeled future data, before any correction, to define the equivalent zero rate in the future.

The result is a 2D (space) rate (non-dimensional).

  4. Estimate tau_fut using Z_fg

Since sup3r processes data in smaller chunks, the rate Z_fg cannot be applied directly. To address this, all modeled future data is corrected with QDM, and Z_fg is then applied to the corrected data to define tau_fut (see the sketch below).
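A conceptual numpy sketch of the four steps for a single gridpoint follows. All names are illustrative, not the sup3r API: qdm_correct stands in for the QDM transform that sup3r builds from the estimated CDFs.

import numpy as np

def presrat_thresholds(obs_hist, mod_hist, mod_fut, qdm_correct,
                       min_tau=0.01):
    """Walk through the four PresRat zero-precipitation steps (sketch)."""
    # 1. Zero rate from observations (oh), using a small threshold
    zero_rate = float(np.mean(obs_hist < min_tau))

    # 2. Threshold tau on the modeled historical (mh) matching that rate,
    #    with the tau >= 0.01 mm/day floor from Pierce et al. (2015)
    tau = max(np.quantile(mod_hist, zero_rate), min_tau)

    # 3. Z_fg: zero rate of the *uncorrected* modeled future (mf) under tau
    z_fg = float(np.mean(mod_fut < tau))

    # 4. tau_fut: threshold on the QDM-corrected future reproducing Z_fg
    tau_fut = np.quantile(qdm_correct(mod_fut), z_fg)
    return zero_rate, tau, z_fg, tau_fut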

References

[Pierce2015]

Pierce, D. W., Cayan, D. R., Maurer, E. P., Abatzoglou, J. T., & Hegewisch, K. C. (2015). Improved bias correction techniques for hydrological simulations of climate change. Journal of Hydrometeorology, 16(6), 2421-2442.

Parameters:
  • base_fps (list | str) – One or more baseline .h5 filepaths representing non-biased data to use to correct the biased dataset (observed historical in Cannon et al. (2015)). This is typically several years of WTK or NSRDB files.

  • bias_fps (list | str) – One or more biased .nc or .h5 filepaths representing the biased data to be compared with the baseline data (modeled historical in Cannon et al. (2015)). This is typically several years of GCM .nc files.

  • bias_fut_fps (list | str) – Data consistent with bias_fps but for a different time period (modeled future in Cannon et al. (2015)). This is the dataset that will be corrected, while bias_fps is used, together with the baseline data, to provide the transformation map.

  • base_dset (str) – A single dataset from the base_fps to retrieve. In the case of wind components, this can be U_100m or V_100m which will retrieve windspeed and winddirection and derive the U/V component.

  • bias_feature (str) – This is the biased feature from bias_fps to retrieve. This should be a single feature name corresponding to base_dset

  • distance_upper_bound (float) – Upper bound on the nearest neighbor distance in decimal degrees. This should be the approximate resolution of the low-resolution bias data. None (default) will calculate this based on the median distance between points in bias_fps

  • target (tuple) – (lat, lon) lower left corner of raster to retrieve from bias_fps. If None then the lower left corner of the full domain will be used.

  • shape (tuple) – (rows, cols) grid size to retrieve from bias_fps. If None then the full domain shape will be used.

  • base_handler (str) – Name of rex resource handler or sup3r.preprocessing.data_handling class to be retrieved from the rex/sup3r library. If a sup3r.preprocessing.data_handling class is used, all data will be loaded in this class’ initialization and the subsequent bias calculation will be done in serial

  • bias_handler (str) – Name of the bias data handler class to be retrieved from the sup3r.preprocessing.data_handling library.

  • base_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the base_handler class

  • bias_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the bias_handler class

  • decimals (int | None) – Option to round bias and base data to this number of decimals, this gets passed to np.around(). If decimals is negative, it specifies the number of positions to the left of the decimal point.

  • match_zero_rate (bool) – Option to fix the frequency of zero values in the biased data. The lowest percentile of values in the biased data will be set to zero to match the percentile of zeros in the base data. If SkillAssessment is being run and this is True, the distributions will not be mean-centered. This helps resolve the issue where global climate models produce too many days with small precipitation totals e.g., the “drizzle problem” [Polade2014].

  • dist (str, default=”empirical”) – Define the type of distribution, which can be “empirical” or any parametric distribution defined in scipy.

  • n_quantiles (int, default=101) – Defines the number of quantiles (between 0 and 1) for an empirical distribution.

  • sampling (str, default=”linear”) – Defines how the quantiles are sampled. For instance, ‘linear’ results in linearly spaced quantiles. Other options are ‘log’ and ‘invlog’.

  • log_base (int or float, default=10) – Log base value if sampling is “log” or “invlog”.

NT

Number of times to calculate QDM parameters, equally distributed over a year. For instance, NT=1 results in a single set of parameters, while NT=12 is approximately one set per month.

Type:

int

WINDOW_SIZE

Total time window (in days) considered each time QDM is calculated. For instance, WINDOW_SIZE=30 with NT=12 would result in approximately monthly estimates.

Type:

int
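For intuition, the NT window centers can be pictured as evenly spaced days of year, each fit using samples within WINDOW_SIZE/2 days of its center. A sketch of one plausible spacing, not necessarily sup3r's exact choice:

import numpy as np

NT, WINDOW_SIZE = 24, 30
# Evenly spaced window centers across the year (day of year)
centers = np.linspace(1, 366, NT, endpoint=False)
# Each of the NT fits uses all samples within WINDOW_SIZE / 2 days of
# its center, so consecutive windows overlap and vary smoothly in time.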

See also

sup3r.bias.bias_transforms.local_qdm_bc

Bias correction using QDM.

sup3r.preprocessing.data_handling.DataHandler

Bias correction using QDM directly from a derived handler.

rex.utilities.bc_utils.QuantileDeltaMapping

Quantile Delta Mapping method and support functions. Since rex.utilities.bc_utils is used here, the arguments dist, n_quantiles, sampling, and log_base must be consistent with that package/module.

Notes

One way of using this class is to save the distribution definitions obtained here with the method write_outputs() and then use that file with local_qdm_bc() or through a derived DataHandler. ATTENTION: be careful handling that file of parameters. There is no checking process, and one could misuse a correction estimated for a different dataset.
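Because there is no built-in provenance check, one lightweight guard is to record identifying attributes with extra_attrs when writing and verify them before reuse. A minimal sketch, where the file name and attribute names are placeholders for whatever you wrote via extra_attrs:

import h5py

# Verify a saved parameter file was built for the dataset at hand
# before applying the correction
with h5py.File("presrat_params.h5", "r") as f:
    print(list(f))        # e.g. the "*_params" datasets
    print(dict(f.attrs))  # metadata on how the factors were calculated
    assert f.attrs.get("bias_feature") == "pr", (
        "these parameters were estimated for a different dataset")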

References

[Cannon2015]

Cannon, A. J., Sobie, S. R., & Murdock, T. Q. (2015). Bias correction of GCM precipitation by quantile mapping: how well do methods preserve changes in quantiles and extremes?. Journal of Climate, 28(17), 6938-6959.

Methods

compare_dists(base_data, bias_data[, adder, ...])

Compare two distributions using the two-sample Kolmogorov-Smirnov.

fill_and_smooth(out[, fill_extend, ...])

For a given set of parameters, fill and extend missing positions

get_base_data(base_fps, base_dset, base_gid, ...)

Get data from the baseline data source, possibly for many high-res base gids corresponding to a single coarse low-res bias gid.

get_base_gid(bias_gid)

Get one or more base gid(s) corresponding to a bias gid.

get_bias_data(bias_gid[, bias_dh])

Get data from the biased data source for a single gid

get_bias_gid(coord)

Get the bias gid from a coordinate.

get_data_pair(coord[, daily_reduction])

Get base and bias data observations based on a single bias gid.

get_node_cmd(config)

Get a CLI call to call cls.run() on a single node based on an input config.

get_qdm_params(bias_data, bias_fut_data, ...)

Get quantiles' cut points for given datasets

run([fp_out, max_workers, daily_reduction, ...])

Estimate the required information for PresRat correction

window_mask(doy, d0, window_size)

An index of elements within a given time window

write_outputs(fp_out[, out, extra_attrs])

Write outputs to an .h5 file.

zero_precipitation_rate(arr[, threshold])

Rate of (nearly) zero precipitation days

Attributes

NT

Number of times to calculate QDM parameters in a year

WINDOW_SIZE

Window width in days

distance_upper_bound

Maximum distance (float) to map high-resolution data from exo_source to the low-resolution file_paths input.

meta

Get a meta data dictionary on how these bias factors were calculated

NT = 24

Number of times to calculate QDM parameters in a year

WINDOW_SIZE = 30

Window width in days

static compare_dists(base_data, bias_data, adder=0, scalar=1)

Compare two distributions using the two-sample Kolmogorov-Smirnov test. The smaller the output, the more similar the two distributions are.

Parameters:
  • base_data (np.ndarray) – 1D array of base data observations.

  • bias_data (np.ndarray) – 1D array of biased data observations.

  • adder (float) – Factor to adjust the biased data before comparing distributions: bias_data * scalar + adder

  • scalar (float) – Factor to adjust the biased data before comparing distributions: bias_data * scalar + adder

Returns:

out (float) – KS test statistic
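The idea can be reproduced with scipy's two-sample KS test: adjust the biased data as bias_data * scalar + adder and minimize the statistic. A self-contained sketch:

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
base = rng.gamma(2.0, 1.0, size=1000)
bias = 1.5 * rng.gamma(2.0, 1.0, size=1000)  # same shape, biased scale

# A smaller KS statistic means the distributions are closer
raw = ks_2samp(base, bias).statistic
adjusted = ks_2samp(base, bias * (1 / 1.5) + 0.0).statistic
assert adjusted < raw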

property distance_upper_bound

Maximum distance (float) to map high-resolution data from exo_source to the low-resolution file_paths input.

fill_and_smooth(out, fill_extend=True, smooth_extend=0, smooth_interior=0)

For a given set of parameters, fill and extend missing positions

Fill data extending beyond the base meta data extent by doing a nearest neighbor gap fill. Smooth the interior and extended region with the given smoothing values. Interior smoothing can reduce the effect of extreme values within aggregations over large numbers of pixels. The interior is assumed to be the region without NaN values; the extended region is the region with NaN values.

Parameters:
  • out (dict) – Dictionary of values defining the mean/std of the bias + base data and the scalar + adder factors to correct the biased data like: bias_data * scalar + adder. Each value is of shape (lat, lon, time).

  • fill_extend (bool) – Whether to fill data extending beyond the base meta data with nearest neighbor values.

  • smooth_extend (float) – Option to smooth the scalar/adder data outside of the spatial domain set by the threshold input. This alleviates the weird seams far from the domain of interest. This value is the standard deviation for the gaussian_filter kernel

  • smooth_interior (float) – Value to use to smooth the scalar/adder data inside of the spatial domain set by the threshold input. This can reduce the effect of extreme values within aggregations over large number of pixels. This value is the standard deviation for the gaussian_filter kernel.

Returns:

out (dict) – Dictionary of values defining the mean/std of the bias + base data and the scalar + adder factors to correct the biased data like: bias_data * scalar + adder. Each value is of shape (lat, lon, time).
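The fill-and-smooth concept can be sketched with scipy: nearest-neighbor filling of the NaN (extended) region, then Gaussian smoothing applied there. This is a conceptual sketch, not the sup3r implementation:

import numpy as np
from scipy.interpolate import NearestNDInterpolator
from scipy.ndimage import gaussian_filter

arr = np.random.rand(10, 10)   # e.g. a (lat, lon) slice of a scalar factor
arr[:2, :] = np.nan            # extended region beyond the base extent

interior = np.isfinite(arr)
# fill_extend: nearest-neighbor gap fill of the NaN region
interp = NearestNDInterpolator(np.argwhere(interior), arr[interior])
filled = arr.copy()
filled[~interior] = interp(np.argwhere(~interior))
# smooth_extend: Gaussian smoothing, applied only to the extended region
smoothed = gaussian_filter(filled, sigma=1.0)
filled[~interior] = smoothed[~interior]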

classmethod get_base_data(base_fps, base_dset, base_gid, base_handler, base_handler_kwargs=None, daily_reduction='avg', decimals=None, base_dh_inst=None)

Get data from the baseline data source, possibly for many high-res base gids corresponding to a single coarse low-res bias gid.

Parameters:
  • base_fps (list | str) – One or more baseline .h5 filepaths representing non-biased data to use to correct the biased dataset. This is typically several years of WTK or NSRDB files.

  • base_dset (str) – A single dataset from the base_fps to retrieve.

  • base_gid (int | np.ndarray) – One or more spatial gids to retrieve from base_fps. The data will be spatially averaged across all of these sites.

  • base_handler (rex.Resource) – A rex data handler similar to rex.Resource or sup3r.DataHandler classes (if using the latter, must also input base_dh_inst)

  • base_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the base_handler class

  • daily_reduction (None | str) – Option to do a reduction of the hourly+ source base data to daily data. Can be None (no reduction, keep source time frequency), “avg” (daily average), “max” (daily max), “min” (daily min), “sum” (daily sum/total)

  • decimals (int | None) – Option to round bias and base data to this number of decimals, this gets passed to np.around(). If decimals is negative, it specifies the number of positions to the left of the decimal point.

  • base_dh_inst (sup3r.DataHandler) – Instantiated DataHandler class that has already loaded the base data (required if base files are .nc and are not being opened by a rex Resource handler).

Returns:

  • out_data (np.ndarray) – 1D array of base data spatially averaged across the base_gid input and possibly daily-averaged or min/max’d as well.

  • out_ti (pd.DatetimeIndex) – DatetimeIndex object of datetimes corresponding to the output data.
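The daily_reduction options behave like a daily resample; for example, “avg” corresponds to a daily mean. An illustrative pandas sketch, not the internal implementation:

import numpy as np
import pandas as pd

ti = pd.date_range("2015-01-01", periods=48, freq="h")
hourly = pd.Series(np.random.rand(48), index=ti)
daily_avg = hourly.resample("1D").mean()  # daily_reduction="avg"
daily_max = hourly.resample("1D").max()   # daily_reduction="max"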

get_base_gid(bias_gid)

Get one or more base gid(s) corresponding to a bias gid.

Parameters:

bias_gid (int) – gid of the data to retrieve in the bias data source raster data. The gids for this data source are the enumerated indices of the flattened coordinate array.

Returns:

  • dist (np.ndarray) – Array of nearest neighbor distances with length equal to the number of high-resolution baseline gids that map to the low resolution bias gid pixel.

  • base_gid (np.ndarray) – Array of base gids that are the nearest neighbors of bias_gid with length equal to the number of high-resolution baseline gids that map to the low resolution bias gid pixel.

get_bias_data(bias_gid, bias_dh=None)

Get data from the biased data source for a single gid

Parameters:
  • bias_gid (int) – gid of the data to retrieve in the bias data source raster data. The gids for this data source are the enumerated indices of the flattened coordinate array.

  • bias_dh (DataHandler, default=self.bias_dh) – Any DataHandler from sup3r.preprocessing.data_handling. This optional argument allows an alternative handler other than the usual bias_dh. For instance, the derived QuantileDeltaMappingCorrection uses it to access the reference biased dataset as well as the target biased dataset.

Returns:

bias_data (np.ndarray) – 1D array of temporal data at the requested gid.

get_bias_gid(coord)

Get the bias gid from a coordinate.

Parameters:

coord (tuple) – (lat, lon) to get data for.

Returns:

  • bias_gid (int) – gid of the data to retrieve in the bias data source raster data. The gids for this data source are the enumerated indices of the flattened coordinate array.

  • d (float) – Distance in decimal degrees from coord to bias gid

get_data_pair(coord, daily_reduction='avg')

Get base and bias data observations based on a single bias gid.

Parameters:
  • coord (tuple) – (lat, lon) to get data for.

  • daily_reduction (None | str) – Option to do a reduction of the hourly+ source base data to daily data. Can be None (no reduction, keep source time frequency), “avg” (daily average), “max” (daily max), “min” (daily min), “sum” (daily sum/total)

Returns:

  • base_data (np.ndarray) – 1D array of base data spatially averaged across the base_gid input and possibly daily-averaged or min/max’d as well.

  • bias_data (np.ndarray) – 1D array of temporal data at the requested gid.

  • base_dist (np.ndarray) – Array of nearest neighbor distances from coord to the base data sites with length equal to the number of high-resolution baseline gids that map to the low resolution bias gid pixel.

  • bias_dist (Float) – Nearest neighbor distance from coord to the bias data site

classmethod get_node_cmd(config)

Get a CLI call to call cls.run() on a single node based on an input config.

Parameters:

config (dict) – sup3r bias calc config with all necessary args and kwargs to initialize the class and call run() on a single node.

static get_qdm_params(bias_data, bias_fut_data, base_data, bias_feature, base_dset, sampling, n_samples, log_base)

Get quantiles’ cut points for given datasets

Estimate the quantiles’ cut points for each of the three given datasets. When a good analytical approximation, such as a parametric distribution, is lacking, these quantiles can be used to approximate the statistical distribution of the datasets.

Parameters:
  • bias_data (np.ndarray) – 1D array of biased data observations.

  • bias_fut_data (np.ndarray) – 1D array of biased future data observations.

  • base_data (np.ndarray) – 1D array of base data observations.

  • bias_feature (str) – This is the biased feature from bias_fps to retrieve. This should be a single feature name corresponding to base_dset.

  • base_dset (str) – A single dataset from the base_fps to retrieve. In the case of wind components, this can be U_100m or V_100m which will retrieve windspeed and winddirection and derive the U/V component.

  • sampling (str) – Defines how the quantiles are sampled. For instance, ‘linear’ results in linearly spaced quantiles. Other options are ‘log’ and ‘invlog’.

  • n_samples (int) – Number of points to sample between 0 and 1, i.e. number of quantiles.

  • log_base (int | float) – Log base value.

Returns:

out (dict) – Dictionary of the quantiles’ cut points. Note that to make sense of those cut point values, one needs to know the given arguments, such as log_base. For instance, if sampling was linear, the sequence [-1, 0, 2] would be the minimum, median, and maximum values, respectively. The expected keys are “bias_{bias_feature}_params”, “bias_fut_{bias_feature}_params”, and “base_{base_dset}_params”.

See also

rex.utilities.bc_utils

Sampling scales, such as sample_q_linear()
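A sketch of what the empirical cut points are: quantiles of each dataset evaluated on a sampled grid of levels (here linear; the actual grids come from the rex helpers such as sample_q_linear()):

import numpy as np

rng = np.random.default_rng(0)
bias_data = rng.gamma(2.0, 1.0, size=10_000)

# n_samples linearly spaced quantile levels in [0, 1]; 'log'/'invlog'
# sampling would instead concentrate levels near one tail to better
# resolve extremes.
q = np.linspace(0, 1, 101)
cut_points = np.quantile(bias_data, q)  # -> "bias_{bias_feature}_params"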

property meta

Get a meta data dictionary on how these bias factors were calculated

static window_mask(doy, d0, window_size)

An index of elements within a given time window

Create an index of days of the year within the target time window. It only considers the day of the year (doy); hence, it is limited to time scales smaller than annual.

Parameters:
  • doy (np.ndarray) – An unordered array of days of year, i.e. January 1st is 1.

  • d0 (int) – Center point of the target window [day of year].

  • window_size (float) – Total size of the target window, i.e. the window covers half this value on each side of d0. Note that it has the same units as doy; thus, it equals the number of points only if doy is daily.

Returns:

np.array – A boolean array with the same shape as the given doy, where True means that position is within the target window.

Notes

Leap years have day 366 included in the output index, but a precise shift is not calculated, resulting in a negligible error for large datasets.
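A minimal sketch of such a mask, wrapping around the calendar year when the window crosses January 1st (consistent with the documented behavior; sup3r's implementation may differ in details):

import numpy as np

def window_mask(doy, d0, window_size):
    """True where doy falls within window_size/2 days of d0 (sketch)."""
    d_start, d_end = d0 - window_size / 2, d0 + window_size / 2
    if d_start < 1:           # window wraps past January 1st
        return (doy > d_start % 365) | (doy < d_end)
    if d_end > 365:           # window wraps past December 31st
        return (doy > d_start) | (doy < d_end % 365)
    return (doy > d_start) & (doy < d_end)

doy = np.arange(1, 367)
assert window_mask(doy, d0=1, window_size=30).sum() == 30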

static zero_precipitation_rate(arr: ndarray, threshold: float = 0.0)

Rate of (nearly) zero precipitation days

Estimate the rate of values less than a given threshold. In concept the threshold would be zero (thus the name zero precipitation rate), but a small threshold is often used to truncate negligible values. For instance, [Pierce2015] uses 0.01 mm/day for the PresRat correction.

Parameters:
  • arr (np.ndarray) – An array of values to be analyzed. Usually precipitation but it could be applied to other quantities.

  • threshold (float) – Minimum value accepted; anything below it is treated as zero.

Returns:

rate (float) – Rate of days with negligible precipitation (see Z_gf in [Pierce2015]).

Notes

NaNs are ignored for the rate estimate. Therefore, a large number of NaNs might compromise the confidence of the estimate.

If the input values are all non-finite, it returns NaN.
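A minimal numpy sketch consistent with this behavior, including the NaN handling (the actual sup3r implementation may differ):

import numpy as np

def zero_rate(arr, threshold=0.0):
    """Fraction of finite values below threshold; NaNs are ignored."""
    arr = np.asarray(arr, dtype=float)
    finite = np.isfinite(arr)
    if not finite.any():
        return np.nan  # all non-finite: rate is undefined
    return float(np.mean(arr[finite] < threshold))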

Examples

>>> ZeroRateMixin().zero_precipitation_rate([2, 3, 4], 1)
0.0
>>> ZeroRateMixin().zero_precipitation_rate([0, 1, 2, 3], 1)
0.25
>>> ZeroRateMixin().zero_precipitation_rate([0, 1, 2, 3, np.nan], 1)
0.25
run(fp_out=None, max_workers=None, daily_reduction='avg', fill_extend=True, smooth_extend=0, smooth_interior=0, zero_rate_threshold=0.0)[source]

Estimate the required information for PresRat correction

Parameters:
  • fp_out (str | None) – Optional .h5 output file to write scalar and adder arrays.

  • max_workers (int, optional) – Number of workers to run in parallel. 1 is serial and None is all available.

  • daily_reduction (None | str) – Option to do a reduction of the hourly+ source base data to daily data. Can be None (no reduction, keep source time frequency), “avg” (daily average), “max” (daily max), “min” (daily min), “sum” (daily sum/total)

  • fill_extend (bool) – Whether to fill data extending beyond the base meta data with nearest neighbor values.

  • smooth_extend (float) – Option to smooth the scalar/adder data outside of the spatial domain set by the threshold input. This alleviates the weird seams far from the domain of interest. This value is the standard deviation for the gaussian_filter kernel

  • smooth_interior (float) – Value to use to smooth the scalar/adder data inside of the spatial domain set by the threshold input. This can reduce the effect of extreme values within aggregations over large number of pixels. This value is the standard deviation for the gaussian_filter kernel.

  • zero_rate_threshold (float, default=0.0) – Threshold value used to determine the zero rate in the observed historical dataset. For instance, 0.01 means that anything less than that will be considered negligible, hence equal to zero.

Returns:

out (dict) – Dictionary with parameters defining the statistical distributions for each of the three given datasets. Each value has dimensions (lat, lon, n-parameters).
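Hypothetical end-to-end usage (all file, dataset, and feature names below are placeholders):

from sup3r.bias.qdm import PresRat

calc = PresRat(
    base_fps=["obs_2000.h5", "obs_2001.h5"],  # non-biased baseline (.h5)
    bias_fps=["gcm_hist.nc"],                 # modeled historical
    bias_fut_fps=["gcm_future.nc"],           # modeled future (to correct)
    base_dset="precipitation",
    bias_feature="pr",
)
out = calc.run(fp_out="presrat_params.h5", zero_rate_threshold=0.01)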

write_outputs(fp_out: str, out: dict | None = None, extra_attrs: dict | None = None)[source]

Write outputs to an .h5 file.

Parameters:
  • fp_out (str | None) – An HDF5 filename to write the estimated statistical distributions.

  • out (dict, optional) – A dictionary with the three statistical distribution parameters. If not given, the class attribute out is used.

  • extra_attrs (dict, optional) – Extra attributes to be exported together with the dataset.

Examples

>>> mycalc = PresRat(...)
>>> mycalc.write_outputs(fp_out="myfile.h5", out=mydictdataset,
...   extra_attrs={'zero_rate_threshold': 0.01})