sup3r.bias.presrat.PresRat#
- class PresRat(base_fps, bias_fps, bias_fut_fps, base_dset, bias_feature, distance_upper_bound=None, target=None, shape=None, base_handler='Resource', bias_handler='DataHandlerNCforCC', base_handler_kwargs=None, bias_handler_kwargs=None, bias_fut_handler_kwargs=None, decimals=None, match_zero_rate=False, n_quantiles=101, dist='empirical', relative=True, sampling='linear', log_base=10, n_time_steps=24, window_size=120, pre_load=True)[source]#
Bases: ZeroRateMixin, QuantileDeltaMappingCorrection
PresRat bias correction method (precipitation)
The PresRat correction [Pierce2015] is defined as the combination of three steps:
- Use the model-predicted change ratio (with the CDFs);
- The treatment of zero-precipitation days (with the fraction of dry days);
- The final correction factor (K) to preserve the mean (ratio between both estimated means).
To keep consistency with the full sup3r pipeline, PresRat was implemented as follows:
Define zero rate from observations (oh)
Using the historical observations, estimate the zero-precipitation rate for each gridpoint. A long time series is expected here, such as decadal or longer. A threshold larger than zero is an option here.
The result is a 2D (space) zero_rate (non-dimensional).
Find the threshold for each gridpoint (mh)
Using the zero rate from the previous step, identify the magnitude threshold for each gridpoint that satisfies that dry-day rate.
Note that Pierce (2015) imposes tau >= 0.01 mm/day for precipitation.
The result is a 2D (space) threshold tau with the same units as the data being corrected. For instance, it could be mm/day for precipitation.
Define Z_fg using tau (mf)
The tau that was defined with the modeled historical data is now used as a threshold on the modeled future data, before any correction, to define the equivalent zero rate in the future.
The result is a 2D (space) rate (non-dimensional).
Estimate tau_fut using Z_fg
Since sup3r processes data in smaller chunks, it would not be possible to apply the rate Z_fg directly. To address that, all modeled future data is corrected with QDM, and Z_fg is then applied to that corrected data to define tau_fut.
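A minimal instantiation sketch follows; the file paths, dataset names, and grid parameters are hypothetical placeholders, and only the keyword names come from the signature above:

from sup3r.bias.presrat import PresRat

# Hypothetical inputs for illustration only
calc = PresRat(
    base_fps=['./obs_2010.h5', './obs_2011.h5'],  # baseline observations (placeholder paths)
    bias_fps=['./gcm_historical.nc'],             # modeled historical (placeholder path)
    bias_fut_fps=['./gcm_ssp245.nc'],             # modeled future (placeholder path)
    base_dset='rainfall',                         # placeholder baseline dataset name
    bias_feature='pr',                            # placeholder GCM feature name
    target=(39.0, -105.0),                        # (lat, lon) lower left corner (placeholder)
    shape=(10, 10),                               # (rows, cols) raster size (placeholder)
    n_time_steps=24,
    window_size=120,
)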
References
[Pierce2015] Pierce, D. W., Cayan, D. R., Maurer, E. P., Abatzoglou, J. T., & Hegewisch, K. C. (2015). Improved bias correction techniques for hydrological simulations of climate change. Journal of Hydrometeorology, 16(6), 2421-2442.
[Polade2014] Polade, S. D., Pierce, D. W., Cayan, D. R., Gershunov, A., & Dettinger, M. D. (2014). The key role of dry days in changing regional climate and precipitation regimes. Scientific Reports, 4, 4364.
- Parameters:
base_fps (list | str) – One or more baseline .h5 filepaths representing non-biased data to use to correct the biased dataset (observed historical in Cannon et al. (2015)). This is typically several years of WTK or NSRDB files.
bias_fps (list | str) – One or more biased .nc or .h5 filepaths representing the biased data to be compared with the baseline data (modeled historical in Cannon et al. (2015)). This is typically several years of GCM .nc files.
bias_fut_fps (list | str) – Data consistent with bias_fps but for a different time period (modeled future in Cannon et al. (2015)). This is the dataset that will be corrected, while bias_fps is used to provide a transformation map with the baseline data.
base_dset (str) – A single dataset from the base_fps to retrieve. In the case of wind components, this can be U_100m or V_100m which will retrieve windspeed and winddirection and derive the U/V component.
bias_feature (str) – This is the biased feature from bias_fps to retrieve. This should be a single feature name corresponding to base_dset
distance_upper_bound (float) – Upper bound on the nearest neighbor distance in decimal degrees. This should be the approximate resolution of the low-resolution bias data. None (default) will calculate this based on the median distance between points in bias_fps
target (tuple) – (lat, lon) lower left corner of raster to retrieve from bias_fps. If None then the lower left corner of the full domain will be used.
shape (tuple) – (rows, cols) grid size to retrieve from bias_fps. If None then the full domain shape will be used.
base_handler (str) – Name of rex resource handler or sup3r.preprocessing.data_handlers class to be retrieved from the rex/sup3r library. If a sup3r.preprocessing.data_handlers class is used, all data will be loaded in this class’ initialization and the subsequent bias calculation will be done in serial
bias_handler (str) – Name of the bias data handler class to be retrieved from the sup3r.preprocessing.data_handlers library.
base_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the base_handler class
bias_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the bias_handler class with the bias_fps
bias_fut_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the bias_handler class with the bias_fut_fps
decimals (int | None) – Option to round bias and base data to this number of decimals, this gets passed to np.around(). If decimals is negative, it specifies the number of positions to the left of the decimal point.
match_zero_rate (bool) – Option to fix the frequency of zero values in the biased data. The lowest percentile of values in the biased data will be set to zero to match the percentile of zeros in the base data. If SkillAssessment is being run and this is True, the distributions will not be mean-centered. This helps resolve the issue where global climate models produce too many days with small precipitation totals e.g., the “drizzle problem” [Polade2014].
dist (str, default="empirical") – Define the type of distribution, which can be "empirical" or any parametric distribution defined in "scipy".
n_quantiles (int, default=101) – Defines the number of quantiles (between 0 and 1) for an empirical distribution.
sampling (str, default="linear") – Defines how the quantiles are sampled. For instance, 'linear' will result in linearly spaced quantiles. Other options are: 'log' and 'invlog'.
log_base (int or float, default=10) – Log base value if sampling is “log” or “invlog”.
n_time_steps (int) – Number of times to calculate QDM parameters equally distributed along a year. For instance, n_time_steps=1 results in a single set of parameters while n_time_steps=12 is approximately every month.
window_size (int) – Total time window period in days to be considered for each time QDM is calculated. For instance, window_size=30 with n_time_steps=12 would result in approximately monthly estimates.
pre_load (bool) – Flag to preload all data needed for bias correction. This is currently recommended to improve performance with the new sup3r data handler access patterns
See also
sup3r.bias.bias_transforms.local_qdm_bc
Bias correction using QDM.
sup3r.preprocessing.data_handlers.DataHandler
Bias correction using QDM directly from a derived handler.
rex.utilities.bc_utils.QuantileDeltaMapping
Quantile Delta Mapping method and support functions. Since rex.utilities.bc_utils is used here, the arguments dist, n_quantiles, sampling, and log_base must be consistent with that package/module.
Notes
One way of using this class is by saving the distribution definitions obtained here with the method write_outputs() and then using that file with local_qdm_bc() or through a derived DataHandler. ATTENTION: be careful handling that file of parameters. There is no checking process and one could misuse a correction estimated for the wrong dataset.
References
[Cannon2015] Cannon, A. J., Sobie, S. R., & Murdock, T. Q. (2015). Bias correction of GCM precipitation by quantile mapping: How well do methods preserve changes in quantiles and extremes? Journal of Climate, 28(17), 6938-6959.
Methods
calc_k_factor(base_data, bias_data, ...) – Calculate the K factor at a single spatial location that will preserve the original model-predicted mean change in precipitation
calc_tau_fut(base_data, bias_data, ...[, ...]) – Calculate a precipitation threshold (tau) that preserves the model-predicted changes in fraction of dry days at a single spatial location.
compare_dists(base_data, bias_data[, adder, ...]) – Compare two distributions using the two-sample Kolmogorov-Smirnov.
fill_and_smooth(out[, fill_extend, ...]) – For a given set of parameters, fill and extend missing positions
get_base_data(base_fps, base_dset, base_gid, ...) – Get data from the baseline data source, possibly for many high-res base gids corresponding to a single coarse low-res bias gid.
get_base_gid(bias_gid) – Get one or more base gid(s) corresponding to a bias gid.
get_bias_data(bias_gid[, bias_dh]) – Get data from the biased data source for a single gid
get_bias_gid(coord) – Get the bias gid from a coordinate.
get_data_pair(coord[, daily_reduction]) – Get base and bias data observations based on a single bias gid.
get_node_cmd(config) – Get a CLI call to call cls.run() on a single node based on an input config.
get_qdm_params(bias_data, bias_fut_data, ...) – Get quantiles' cut points for the given datasets
pre_load() – Preload all data needed for bias correction.
run([fp_out, max_workers, daily_reduction, ...]) – Estimate the required information for PresRat correction
window_mask(doy, d0, window_size) – An index of elements within a given time window
write_outputs(fp_out[, out, extra_attrs]) – Write outputs to an .h5 file.
zero_precipitation_rate(arr[, threshold]) – Rate of (nearly) zero precipitation days
Attributes
distance_upper_bound – Maximum distance (float) to map high-resolution data from exo_source to the low-resolution file_paths input.
meta – Get a meta data dictionary on how these bias factors were calculated
- classmethod calc_tau_fut(base_data, bias_data, bias_fut_data, corrected_fut_data, zero_rate_threshold=1.157e-07)[source]#
Calculate a precipitation threshold (tau) that preserves the model-predicted changes in fraction of dry days at a single spatial location.
- Parameters:
base_data (np.ndarray) – A 1D array of (usually) daily precipitation observations from a historical dataset like Daymet
bias_data (np.ndarray) – A 1D array of (usually) daily precipitation historical climate simulation outputs from a GCM dataset from CMIP6
bias_fut_data (np.ndarray) – A 1D array of (usually) daily precipitation future (e.g., ssp245) climate simulation outputs from a GCM dataset from CMIP6
corrected_fut_data (np.ndarray) – Bias corrected bias_future_data usually with relative QDM
zero_rate_threshold (float, default=1.157e-7) – Threshold value used to determine the zero rate in the observed historical dataset. For instance, 0.01 means that anything less than that will be considered negligible, hence equal to zero. Dai 2006 defined this as 1mm/day. Pierce 2015 used 0.01mm/day. We recommend 0.01mm/day (1.157e-7 kg/m2/s).
- Returns:
tau_fut (float) – Precipitation threshold that will preserve the model predicted changes in fraction of dry days. Precipitation less than this value in the modeled future data can be set to zero.
obs_zero_rate (float) – Rate of dry days in the observed historical data.
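A hedged sketch of calling this classmethod directly on synthetic 1D arrays; the random data below are placeholders standing in for daily precipitation series:

import numpy as np
from sup3r.bias.presrat import PresRat

rng = np.random.default_rng(0)
# Synthetic daily precipitation in kg/m2/s (placeholders, not real data)
base_data = rng.gamma(0.5, 2e-5, 3650)       # "observed" historical
bias_data = rng.gamma(0.5, 3e-5, 3650)       # modeled historical
bias_fut_data = rng.gamma(0.5, 4e-5, 3650)   # modeled future
corrected_fut_data = bias_fut_data * 0.8     # stand-in for a QDM-corrected series

# 0.01 mm/day expressed in kg/m2/s: 0.01 / 86400 ~= 1.157e-7
tau_fut, obs_zero_rate = PresRat.calc_tau_fut(
    base_data, bias_data, bias_fut_data, corrected_fut_data,
    zero_rate_threshold=1.157e-7,
)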
- classmethod calc_k_factor(base_data, bias_data, bias_fut_data, corrected_fut_data, base_ti, bias_ti, bias_fut_ti, window_center, window_size, n_time_steps, zero_rate_threshold)[source]#
Calculate the K factor at a single spatial location that will preserve the original model-predicted mean change in precipitation
- Parameters:
base_data (np.ndarray) – A 1D array of (usually) daily precipitation observations from a historical dataset like Daymet
bias_data (np.ndarray) – A 1D array of (usually) daily precipitation historical climate simulation outputs from a GCM dataset from CMIP6
bias_fut_data (np.ndarray) – A 1D array of (usually) daily precipitation future (e.g., ssp245) climate simulation outputs from a GCM dataset from CMIP6
corrected_fut_data (np.ndarray) – Bias corrected bias_future_data usually with relative QDM
base_ti (pd.DatetimeIndex) – Datetime index associated with base_data and of the same length
bias_ti (pd.DatetimeIndex) – Datetime index associated with bias_data and of the same length
bias_fut_ti (pd.DatetimeIndex) – Datetime index associated with bias_fut_data and of the same length
window_center (np.ndarray) – Sequence of days of the year equally spaced and shifted by half window size, thus `ntimes`=12 results in approximately [15, 45, …]. It includes the fraction of a day, thus 15.5 is equivalent to January 15th, 12:00h. Shape is (N,)
window_size (int) – Total time window period in days to be considered for each time QDM is calculated. For instance, window_size=30 with n_time_steps=12 would result in approximately monthly estimates.
n_time_steps (int) – Number of times to calculate QDM parameters equally distributed along a year. For instance, n_time_steps=1 results in a single set of parameters while n_time_steps=12 is approximately every month.
zero_rate_threshold (float, default=1.157e-7) – Threshold value used to determine the zero rate in the observed historical dataset. For instance, 0.01 means that anything less than that will be considered negligible, hence equal to zero. Dai 2006 defined this as 1mm/day. Pierce 2015 used 0.01mm/day. We recommend 0.01mm/day (1.157e-7 kg/m2/s).
- Returns:
k (np.ndarray) – K factor from the Pierce 2015 paper with shape (n_time_steps,) for a single spatial location.
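A hedged sketch with synthetic daily series and matching datetime indices; the window_center construction below is one plausible interpretation of the description above, not taken from the library:

import numpy as np
import pandas as pd
from sup3r.bias.presrat import PresRat

n_time_steps, window_size = 12, 30
# Equally spaced days of year shifted by half a step, ~[15.2, 45.6, ...] (assumption)
window_center = 365 / n_time_steps * (np.arange(n_time_steps) + 0.5)

# Synthetic daily series and datetime indices (placeholders)
base_ti = pd.date_range('1990-01-01', '1999-12-31', freq='D')
bias_ti = pd.date_range('1990-01-01', '1999-12-31', freq='D')
bias_fut_ti = pd.date_range('2050-01-01', '2059-12-31', freq='D')
rng = np.random.default_rng(0)
base_data = rng.gamma(0.5, 2e-5, len(base_ti))
bias_data = rng.gamma(0.5, 3e-5, len(bias_ti))
bias_fut_data = rng.gamma(0.5, 4e-5, len(bias_fut_ti))
corrected_fut_data = bias_fut_data * 0.8  # stand-in for QDM-corrected data

k = PresRat.calc_k_factor(
    base_data, bias_data, bias_fut_data, corrected_fut_data,
    base_ti, bias_ti, bias_fut_ti, window_center, window_size,
    n_time_steps, zero_rate_threshold=1.157e-7,
)
# k has shape (n_time_steps,) for this single spatial location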
- run(fp_out=None, max_workers=None, daily_reduction='avg', fill_extend=True, smooth_extend=0, smooth_interior=0, zero_rate_threshold=1.157e-07)[source]#
Estimate the required information for PresRat correction
- Parameters:
fp_out (str | None) – Optional .h5 output file to write scalar and adder arrays.
max_workers (int, optional) – Number of workers to run in parallel. 1 is serial and None is all available.
daily_reduction (None | str) – Option to do a reduction of the hourly+ source base data to daily data. Can be None (no reduction, keep source time frequency), “avg” (daily average), “max” (daily max), “min” (daily min), “sum” (daily sum/total)
fill_extend (bool) – Whether to fill data extending beyond the base meta data with nearest neighbor values.
smooth_extend (float) – Option to smooth the scalar/adder data outside of the spatial domain set by the threshold input. This alleviates the weird seams far from the domain of interest. This value is the standard deviation for the gaussian_filter kernel
smooth_interior (float) – Value to use to smooth the scalar/adder data inside of the spatial domain set by the threshold input. This can reduce the effect of extreme values within aggregations over a large number of pixels. This value is the standard deviation for the gaussian_filter kernel.
zero_rate_threshold (float, default=1.157e-7) – Threshold value used to determine the zero rate in the observed historical dataset. For instance, 0.01 means that anything less than that will be considered negligible, hence equal to zero. Dai 2006 defined this as 1mm/day. Pierce 2015 used 0.01mm/day. We recommend 0.01mm/day (1.157e-7 kg/m2/s).
- Returns:
out (dict) – Dictionary with parameters defining the statistical distributions for each of the three given datasets. Each value has dimensions (lat, lon, n-parameters).
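A hedged sketch, continuing the hypothetical calc = PresRat(...) instance from the class-level example above; the output path, worker count, and daily reduction are assumptions:

# Daily totals are a typical reduction for precipitation (an assumption; the default is 'avg')
out = calc.run(
    fp_out='presrat_params.h5',    # placeholder output path
    max_workers=4,                 # None would use all available workers
    daily_reduction='sum',
    zero_rate_threshold=1.157e-7,  # ~0.01 mm/day in kg/m2/s
)
# `out` maps parameter names to arrays of shape (lat, lon, n-parameters)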
- write_outputs(fp_out: str, out: dict | None = None, extra_attrs: dict | None = None)[source]#
Write outputs to an .h5 file.
- Parameters:
fp_out (str | None) – An HDF5 filename to write the estimated statistical distributions.
out (dict, optional) – A dictionary with the three statistical distribution parameters. If not given, it uses the class attribute out.
extra_attrs (dict, optional) – Extra attributes to be exported together with the dataset.
Examples
>>> mycalc = PresRat(...)
>>> mycalc.write_outputs(fp_out="myfile.h5", out=mydictdataset,
...                      extra_attrs={'zero_rate_threshold': 0.01})
- static compare_dists(base_data, bias_data, adder=0, scalar=1)#
Compare two distributions using the two-sample Kolmogorov-Smirnov test. When the output is minimized, the two distributions are similar.
- Parameters:
base_data (np.ndarray) – 1D array of base data observations.
bias_data (np.ndarray) – 1D array of biased data observations.
adder (float) – Factor to adjust the biased data before comparing distributions: bias_data * scalar + adder
scalar (float) – Factor to adjust the biased data before comparing distributions: bias_data * scalar + adder
- Returns:
out (float) – KS test statistic
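A hedged sketch with synthetic samples; a smaller statistic after rescaling indicates closer agreement with the base data:

import numpy as np
from sup3r.bias.presrat import PresRat

rng = np.random.default_rng(0)
base = rng.gamma(2.0, 1.0, 5000)   # synthetic base sample
bias = rng.gamma(2.0, 1.3, 5000)   # synthetic biased sample (inflated scale)

raw_ks = PresRat.compare_dists(base, bias)
adj_ks = PresRat.compare_dists(base, bias, scalar=1 / 1.3)  # expected to be smaller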
- property distance_upper_bound#
Maximum distance (float) to map high-resolution data from exo_source to the low-resolution file_paths input.
- fill_and_smooth(out, fill_extend=True, smooth_extend=0, smooth_interior=0)#
For a given set of parameters, fill and extend missing positions
Fill data extending beyond the base meta data extent by doing a nearest neighbor gap fill. Smooth interior and extended region with given smoothing values. Interior smoothing can reduce the effect of extreme values within aggregations over a large number of pixels. The interior is assumed to be defined by the region without nan values. The extended region is assumed to be the region with nan values.
- Parameters:
out (dict) – Dictionary of values defining the mean/std of the bias + base data and the scalar + adder factors to correct the biased data like: bias_data * scalar + adder. Each value is of shape (lat, lon, time).
fill_extend (bool) – Whether to fill data extending beyond the base meta data with nearest neighbor values.
smooth_extend (float) – Option to smooth the scalar/adder data outside of the spatial domain set by the threshold input. This alleviates the weird seams far from the domain of interest. This value is the standard deviation for the gaussian_filter kernel
smooth_interior (float) – Value to use to smooth the scalar/adder data inside of the spatial domain set by the threshold input. This can reduce the effect of extreme values within aggregations over a large number of pixels. This value is the standard deviation for the gaussian_filter kernel.
- Returns:
out (dict) – Dictionary of values defining the mean/std of the bias + base data and the scalar + adder factors to correct the biased data like: bias_data * scalar + adder. Each value is of shape (lat, lon, time).
- classmethod get_base_data(base_fps, base_dset, base_gid, base_handler, base_handler_kwargs=None, daily_reduction='avg', decimals=None, base_dh_inst=None)#
Get data from the baseline data source, possibly for many high-res base gids corresponding to a single coarse low-res bias gid.
- Parameters:
base_fps (list | str) – One or more baseline .h5 filepaths representing non-biased data to use to correct the biased dataset. This is typically several years of WTK or NSRDB files.
base_dset (str) – A single dataset from the base_fps to retrieve.
base_gid (int | np.ndarray) – One or more spatial gids to retrieve from base_fps. The data will be spatially averaged across all of these sites.
base_handler (rex.Resource) – A rex data handler similar to rex.Resource or sup3r.DataHandler classes (if using the latter, must also input base_dh_inst)
base_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the base_handler class
daily_reduction (None | str) – Option to do a reduction of the hourly+ source base data to daily data. Can be None (no reduction, keep source time frequency), “avg” (daily average), “max” (daily max), “min” (daily min), “sum” (daily sum/total)
decimals (int | None) – Option to round bias and base data to this number of decimals, this gets passed to np.around(). If decimals is negative, it specifies the number of positions to the left of the decimal point.
base_dh_inst (sup3r.DataHandler) – Instantiated DataHandler class that has already loaded the base data (required if base files are .nc and are not being opened by a rex Resource handler).
- Returns:
out_data (np.ndarray) – 1D array of base data spatially averaged across the base_gid input and possibly daily-averaged or min/max’d as well.
out_ti (pd.DatetimeIndex) – DatetimeIndex object of datetimes corresponding to the output data.
- get_base_gid(bias_gid)#
Get one or more base gid(s) corresponding to a bias gid.
- Parameters:
bias_gid (int) – gid of the data to retrieve in the bias data source raster data. The gids for this data source are the enumerated indices of the flattened coordinate array.
- Returns:
dist (np.ndarray) – Array of nearest neighbor distances with length equal to the number of high-resolution baseline gids that map to the low resolution bias gid pixel.
base_gid (np.ndarray) – Array of base gids that are the nearest neighbors of bias_gid with length equal to the number of high-resolution baseline gids that map to the low resolution bias gid pixel.
- get_bias_data(bias_gid, bias_dh=None)#
Get data from the biased data source for a single gid
- Parameters:
bias_gid (int) – gid of the data to retrieve in the bias data source raster data. The gids for this data source are the enumerated indices of the flattened coordinate array.
bias_dh (DataHandler, default=self.bias_dh) – Any DataHandler from sup3r.preprocessing. This optional argument allows an alternative handler other than the usual bias_dh. For instance, the derived QuantileDeltaMappingCorrection uses it to access the reference biased dataset as well as the target biased dataset.
- Returns:
bias_data (np.ndarray) – 1D array of temporal data at the requested gid.
- get_bias_gid(coord)#
Get the bias gid from a coordinate.
- Parameters:
coord (tuple) – (lat, lon) to get data for.
- Returns:
bias_gid (int) – gid of the data to retrieve in the bias data source raster data. The gids for this data source are the enumerated indices of the flattened coordinate array.
d (float) – Distance in decimal degrees from coord to bias gid
- get_data_pair(coord, daily_reduction='avg')#
Get base and bias data observations based on a single bias gid.
- Parameters:
coord (tuple) – (lat, lon) to get data for.
daily_reduction (None | str) – Option to do a reduction of the hourly+ source base data to daily data. Can be None (no reduction, keep source time frequency), “avg” (daily average), “max” (daily max), “min” (daily min), “sum” (daily sum/total)
- Returns:
base_data (np.ndarray) – 1D array of base data spatially averaged across the base_gid input and possibly daily-averaged or min/max’d as well.
bias_data (np.ndarray) – 1D array of temporal data at the requested gid.
base_dist (np.ndarray) – Array of nearest neighbor distances from coord to the base data sites with length equal to the number of high-resolution baseline gids that map to the low resolution bias gid pixel.
bias_dist (Float) – Nearest neighbor distance from coord to the bias data site
- classmethod get_node_cmd(config)#
Get a CLI call to call cls.run() on a single node based on an input config.
- Parameters:
config (dict) – sup3r bias calc config with all necessary args and kwargs to initialize the class and call run() on a single node.
- static get_qdm_params(bias_data, bias_fut_data, base_data, bias_feature, base_dset, sampling, n_samples, log_base)#
Get quantiles' cut points for the given datasets
Estimate the quantiles’ cut points for each of the three given datasets. Lacking a good analytical approximation, such as one of the parametric distributions, those quantiles can be used to approximate the statistical distribution of those datasets.
- Parameters:
bias_data (np.ndarray) – 1D array of biased data observations.
bias_fut_data (np.ndarray) – 1D array of biased data observations.
base_data (np.ndarray) – 1D array of base data observations.
bias_feature (str) – This is the biased feature from bias_fps to retrieve. This should be a single feature name corresponding to base_dset.
base_dset (str) – A single dataset from the base_fps to retrieve. In the case of wind components, this can be U_100m or V_100m which will retrieve windspeed and winddirection and derive the U/V component.
sampling (str) – Defines how the quantiles are sampled. For instance, 'linear' will result in linearly spaced quantiles. Other options are: 'log' and 'invlog'.
n_samples (int) – Number of points to sample between 0 and 1, i.e. number of quantiles.
log_base (int | float) – Log base value.
- Returns:
out (dict) – Dictionary of the quantiles' cut points. Note that to make sense of those cut point values, one needs to know the given arguments such as log_base. For instance, the sequence [-1, 0, 2] represents, if sampling was linear, the minimum, median, and maximum values respectively. The expected keys are "bias_{bias_feature}_params", "bias_fut_{bias_feature}_params", and "base_{base_dset}_params".
See also
rex.utilities.bc_utils
Sampling scales, such as sample_q_linear()
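A hedged sketch with synthetic inputs; the feature and dataset names are hypothetical and only shape the output dictionary keys:

import numpy as np
from sup3r.bias.presrat import PresRat

rng = np.random.default_rng(0)
params = PresRat.get_qdm_params(
    bias_data=rng.gamma(0.5, 3e-5, 3650),
    bias_fut_data=rng.gamma(0.5, 4e-5, 3650),
    base_data=rng.gamma(0.5, 2e-5, 3650),
    bias_feature='pr',       # hypothetical feature name
    base_dset='rainfall',    # hypothetical dataset name
    sampling='linear',
    n_samples=101,
    log_base=10,
)
# Expected keys per the Returns description above:
# 'bias_pr_params', 'bias_fut_pr_params', 'base_rainfall_params'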
- property meta#
Get a meta data dictionary on how these bias factors were calculated
- pre_load()#
Preload all data needed for bias correction. This is currently recommended to improve performance with the new sup3r data handler access patterns
- static window_mask(doy, d0, window_size)#
An index of elements within a given time window
Create an index of days of the year within the target time window. It only considers the day of the year (doy), hence, it is limited to time scales smaller than annual.
- Parameters:
doy (np.ndarray) – An unordered array of days of year, i.e. January 1st is 1.
d0 (int) – Center point of the target window [day of year].
window_size (float) – Total size of the target window, i.e. the window covers half this value on each side of d0. Note that it has the same units as doy, thus it is equal to the number of points only if doy is daily.
- Returns:
np.ndarray – A boolean array with the same shape as the given doy, where True means that position is within the target window.
Notes
Leap years have the day 366 included in the output index, but a precise shift is not calculated, resulting in a negligible error in large datasets.
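A hedged usage sketch; the wrap-around behavior near the year boundary is an assumption based on the day-of-year framing above:

import numpy as np
from sup3r.bias.presrat import PresRat

doy = np.arange(1, 366)  # days of year for one non-leap year
mask = PresRat.window_mask(doy, d0=15, window_size=30)
# Expected: True for roughly the 30 days centered on January 15th; windows
# centered near the start or end of the year presumably wrap around (assumption)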
- static zero_precipitation_rate(arr: ndarray, threshold: float = 0.0)#
Rate of (nearly) zero precipitation days
Estimate the rate of values less than a given threshold. In concept the threshold would be zero (thus the name zero precipitation rate), but a small threshold is often used to truncate negligible values. For instance, [Pierce2015] uses 0.01 mm/day for the PresRat correction.
- Parameters:
arr (np.ndarray) – An array of values to be analyzed. Usually precipitation but it could be applied to other quantities.
threshold (float) – Minimum value accepted. Less than that is assumed to be zero.
- Returns:
rate (float) – Rate of days with negligible precipitation (see Z_gf in [Pierce2015]).
Notes
NaN values are ignored for the rate estimate. Therefore, a large number of NaN values might compromise the confidence of the estimator. If the input values are all non-finite, it returns NaN.
Examples
>>> ZeroRateMixin().zero_precipitation_rate([2, 3, 4], 1)
0.0
>>> ZeroRateMixin().zero_precipitation_rate([0, 1, 2, 3], 1)
0.25
>>> ZeroRateMixin().zero_precipitation_rate([0, 1, 2, 3, np.nan], 1)
0.25
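For intuition only, a minimal re-implementation sketch consistent with the behavior documented above (not the library's actual code):

import numpy as np

def zero_precipitation_rate_sketch(arr, threshold=0.0):
    """Illustrative rate of values strictly below ``threshold``, ignoring NaN."""
    arr = np.asarray(arr, dtype=float)
    finite = np.isfinite(arr)
    if not finite.any():
        return np.nan  # all non-finite inputs -> NaN, per the Notes above
    return float((arr[finite] < threshold).mean())

assert zero_precipitation_rate_sketch([0, 1, 2, 3], 1) == 0.25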