sup3r.bias.bias_calc.LinearCorrection

class LinearCorrection(base_fps, bias_fps, base_dset, bias_feature, distance_upper_bound=None, target=None, shape=None, base_handler='Resource', bias_handler='DataHandlerNCforCC', base_handler_kwargs=None, bias_handler_kwargs=None, decimals=None, match_zero_rate=False)[source]

Bases: FillAndSmoothMixin, DataRetrievalBase

Calculate linear correction factors (*scalar + adder) to bias correct data

This calculation operates on single bias sites for the full time series of available data (no seasonal bias correction)

Parameters:
  • base_fps (list | str) – One or more baseline .h5 filepaths representing non-biased data to use to correct the biased dataset. This is typically several years of WTK or NSRDB files.

  • bias_fps (list | str) – One or more biased .nc or .h5 filepaths representing the biased data to be corrected based on the baseline data. This is typically several years of GCM .nc files.

  • base_dset (str) – A single dataset from the base_fps to retrieve. In the case of wind components, this can be U_100m or V_100m which will retrieve windspeed and winddirection and derive the U/V component.

  • bias_feature (str) – This is the biased feature from bias_fps to retrieve. This should be a single feature name corresponding to base_dset

  • distance_upper_bound (float) – Upper bound on the nearest neighbor distance in decimal degrees. This should be the approximate resolution of the low-resolution bias data. None (default) will calculate this based on the median distance between points in bias_fps

  • target (tuple) – (lat, lon) lower left corner of raster to retrieve from bias_fps. If None then the lower left corner of the full domain will be used.

  • shape (tuple) – (rows, cols) grid size to retrieve from bias_fps. If None then the full domain shape will be used.

  • base_handler (str) – Name of rex resource handler or sup3r.preprocessing.data_handling class to be retrieved from the rex/sup3r library. If a sup3r.preprocessing.data_handling class is used, all data will be loaded in this class’ initialization and the subsequent bias calculation will be done in serial

  • bias_handler (str) – Name of the bias data handler class to be retrieved from the sup3r.preprocessing.data_handling library.

  • base_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the base_handler class

  • bias_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the bias_handler class

  • decimals (int | None) – Option to round bias and base data to this number of decimals. This gets passed to np.around(). If decimals is negative, it specifies the number of positions to the left of the decimal point.

  • match_zero_rate (bool) – Option to fix the frequency of zero values in the biased data. The lowest percentile of values in the biased data will be set to zero to match the percentile of zeros in the base data. If SkillAssessment is being run and this is True, the distributions will not be mean-centered. This helps resolve the issue where global climate models produce too many days with small precipitation totals, known as the “drizzle problem” [Polade2014].

References

[Polade2014]

Polade, S. D., Pierce, D. W., Cayan, D. R., Gershunov, A., & Dettinger, M. D. (2014). The key role of dry days in changing regional climate and precipitation regimes. Scientific Reports, 4(1), 4364. https://doi.org/10.1038/srep04364
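The match_zero_rate behavior described above can be illustrated with a small NumPy sketch. This is not the sup3r implementation: the helper name match_zero_rate_sketch is hypothetical, and the sketch assumes non-negative data such as daily precipitation totals.

```python
import numpy as np


def match_zero_rate_sketch(bias_data, base_data):
    """Hypothetical helper (not part of sup3r): set the lowest values of
    bias_data to zero so that its share of zeros matches base_data."""
    bias_data = np.asarray(bias_data, dtype=float)
    base_data = np.asarray(base_data, dtype=float)
    # Fraction of exact zeros in the baseline record
    zero_rate = np.mean(base_data == 0)
    # Value of the biased record at that same percentile
    threshold = np.percentile(bias_data, 100 * zero_rate)
    out = bias_data.copy()
    # Everything below the threshold is treated as "drizzle" and zeroed
    out[out < threshold] = 0.0
    return out


# Baseline: 6 of 10 days are dry. Biased model: light rain every day.
base = np.array([0, 0, 0, 0, 0, 0, 1.0, 2.0, 3.0, 4.0])
bias = np.arange(1, 11) / 10
fixed = match_zero_rate_sketch(bias, base)
```

In this toy case the six smallest biased values are set to zero and the four largest are left unchanged.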

Methods

compare_dists(base_data, bias_data[, adder, ...])

Compare two distributions using the two-sample Kolmogorov-Smirnov test.

fill_and_smooth(out[, fill_extend, ...])

For a given set of parameters, fill and extend missing positions

get_base_data(base_fps, base_dset, base_gid, ...)

Get data from the baseline data source, possibly for many high-res base gids corresponding to a single coarse low-res bias gid.

get_base_gid(bias_gid)

Get one or more base gid(s) corresponding to a bias gid.

get_bias_data(bias_gid[, bias_dh])

Get data from the biased data source for a single gid

get_bias_gid(coord)

Get the bias gid from a coordinate.

get_data_pair(coord[, daily_reduction])

Get base and bias data observations based on a single bias gid.

get_linear_correction(bias_data, base_data, ...)

Get the linear correction factors based on 1D bias and base datasets

get_node_cmd(config)

Get a CLI call to call cls.run() on a single node based on an input config.

run([fp_out, max_workers, daily_reduction, ...])

Run linear correction factor calculations for every site in the bias dataset

write_outputs(fp_out, out)

Write outputs to an .h5 file.

Attributes

NT

size of the time dimension, 1 is no time-based bias correction

distance_upper_bound

Maximum distance (float) to map high-resolution data from exo_source to the low-resolution file_paths input.

meta

Get a meta data dictionary on how these bias factors were calculated

NT = 1

size of the time dimension, 1 is no time-based bias correction

static get_linear_correction(bias_data, base_data, bias_feature, base_dset)[source]

Get the linear correction factors based on 1D bias and base datasets

Parameters:
  • bias_data (np.ndarray) – 1D array of biased data observations.

  • base_data (np.ndarray) – 1D array of base data observations.

  • bias_feature (str) – This is the biased feature from bias_fps to retrieve. This should be a single feature name corresponding to base_dset

  • base_dset (str) – A single dataset from the base_fps to retrieve. In the case of wind components, this can be U_100m or V_100m which will retrieve windspeed and winddirection and derive the U/V component.

Returns:

out (dict) – Dictionary of values defining the mean/std of the bias + base data and the scalar + adder factors to correct the biased data like: bias_data * scalar + adder
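One common way to obtain scalar and adder factors from the mean/std values mentioned above is to match the first two moments of the biased data to the base data. The sketch below assumes that approach; whether get_linear_correction uses exactly this formula is an assumption, and the helper name and dictionary keys are hypothetical.

```python
import numpy as np


def linear_factors_sketch(bias_data, base_data):
    """Hypothetical helper (not part of sup3r): scalar/adder factors that
    make bias_data * scalar + adder share the mean and std of base_data."""
    bias_data = np.asarray(bias_data, dtype=float)
    base_data = np.asarray(base_data, dtype=float)
    bias_mean, bias_std = np.nanmean(bias_data), np.nanstd(bias_data)
    base_mean, base_std = np.nanmean(base_data), np.nanstd(base_data)
    # Scale the spread first, then shift the scaled mean onto the base mean
    scalar = base_std / bias_std
    adder = base_mean - bias_mean * scalar
    # Key names below are illustrative assumptions
    return {"bias_mean": bias_mean, "bias_std": bias_std,
            "base_mean": base_mean, "base_std": base_std,
            "scalar": scalar, "adder": adder}


bias = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
base = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
factors = linear_factors_sketch(bias, base)
corrected = bias * factors["scalar"] + factors["adder"]
```

With these inputs the corrected series has the same mean and standard deviation as the base series.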

write_outputs(fp_out, out)[source]

Write outputs to an .h5 file.

Parameters:
  • fp_out (str | None) – Optional .h5 output file to write scalar and adder arrays.

  • out (dict) – Dictionary of values defining the mean/std of the bias + base data and the scalar + adder factors to correct the biased data like: bias_data * scalar + adder. Each value is of shape (lat, lon, time).

run(fp_out=None, max_workers=None, daily_reduction='avg', fill_extend=True, smooth_extend=0, smooth_interior=0)[source]

Run linear correction factor calculations for every site in the bias dataset

Parameters:
  • fp_out (str | None) – Optional .h5 output file to write scalar and adder arrays.

  • max_workers (int) – Number of workers to run in parallel. 1 is serial and None is all available.

  • daily_reduction (None | str) – Option to do a reduction of the hourly+ source base data to daily data. Can be None (no reduction, keep source time frequency), “avg” (daily average), “max” (daily max), “min” (daily min), “sum” (daily sum/total)

  • fill_extend (bool) – Flag to fill data past distance_upper_bound using spatial nearest neighbor. If False, the extended domain will be left as NaN.

  • smooth_extend (float) – Option to smooth the scalar/adder data outside of the spatial domain set by the distance_upper_bound input. This alleviates the weird seams far from the domain of interest. This value is the standard deviation for the gaussian_filter kernel

  • smooth_interior (float) – Option to smooth the scalar/adder data within the valid spatial domain. This can reduce the effect of extreme values within aggregations over a large number of pixels.

Returns:

out (dict) – Dictionary of values defining the mean/std of the bias + base data and the scalar + adder factors to correct the biased data like: bias_data * scalar + adder. Each value is of shape (lat, lon, time).
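Because each output value has shape (lat, lon, time) and NT is 1 for this class, the factors can be applied to a gridded time series through NumPy broadcasting. The arrays below are made-up stand-ins for illustration, not actual output of run().

```python
import numpy as np

# Hypothetical factor arrays shaped (lat, lon, time) with a time size of 1
scalar = np.full((2, 3, 1), 1.5)
adder = np.full((2, 3, 1), -0.5)

# Hypothetical biased data on the same 2 x 3 grid with 4 time steps
bias_data = np.ones((2, 3, 4))

# Broadcasting applies each site's single factor pair to its full time series
corrected = bias_data * scalar + adder
```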

static compare_dists(base_data, bias_data, adder=0, scalar=1)

Compare two distributions using the two-sample Kolmogorov-Smirnov test. When the output is minimized, the two distributions are similar.

Parameters:
  • base_data (np.ndarray) – 1D array of base data observations.

  • bias_data (np.ndarray) – 1D array of biased data observations.

  • adder (float) – Factor to adjust the biased data before comparing distributions: bias_data * scalar + adder

  • scalar (float) – Factor to adjust the biased data before comparing distributions: bias_data * scalar + adder

Returns:

out (float) – KS test statistic
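The two-sample KS statistic is the largest absolute difference between the two empirical cumulative distribution functions. The NumPy-only sketch below computes that quantity after adjusting the biased data; it is a stand-in for illustration, and the helper name is hypothetical rather than the method's actual implementation.

```python
import numpy as np


def ks_statistic_sketch(base_data, bias_data, adder=0, scalar=1):
    """Hypothetical helper (not part of sup3r): two-sample KS statistic
    between base_data and bias_data * scalar + adder."""
    base = np.sort(np.asarray(base_data, dtype=float))
    bias = np.sort(np.asarray(bias_data, dtype=float) * scalar + adder)
    # Evaluate both empirical CDFs at every observed value
    grid = np.concatenate([base, bias])
    cdf_base = np.searchsorted(base, grid, side="right") / base.size
    cdf_bias = np.searchsorted(bias, grid, side="right") / bias.size
    return float(np.max(np.abs(cdf_base - cdf_bias)))


base = np.array([1.0, 2.0, 3.0, 4.0])
bias = np.array([0.0, 1.0, 2.0, 3.0])
raw = ks_statistic_sketch(base, bias)
shifted = ks_statistic_sketch(base, bias, adder=1)
```

Here the statistic drops to zero once the adder removes the offset between the two samples.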

property distance_upper_bound

Maximum distance (float) to map high-resolution data from exo_source to the low-resolution file_paths input.

fill_and_smooth(out, fill_extend=True, smooth_extend=0, smooth_interior=0)

For a given set of parameters, fill and extend missing positions

Fill data extending beyond the base meta data extent by doing a nearest neighbor gap fill. Smooth interior and extended region with given smoothing values. Interior smoothing can reduce the effect of extreme values within aggregations over a large number of pixels. The interior is assumed to be defined by the region without nan values. The extended region is assumed to be the region with nan values.

Parameters:
  • out (dict) – Dictionary of values defining the mean/std of the bias + base data and the scalar + adder factors to correct the biased data like: bias_data * scalar + adder. Each value is of shape (lat, lon, time).

  • fill_extend (bool) – Whether to fill data extending beyond the base meta data with nearest neighbor values.

  • smooth_extend (float) – Option to smooth the scalar/adder data outside of the spatial domain set by the threshold input. This alleviates the weird seams far from the domain of interest. This value is the standard deviation for the gaussian_filter kernel

  • smooth_interior (float) – Value to use to smooth the scalar/adder data inside of the spatial domain set by the threshold input. This can reduce the effect of extreme values within aggregations over a large number of pixels. This value is the standard deviation for the gaussian_filter kernel.

Returns:

out (dict) – Dictionary of values defining the mean/std of the bias + base data and the scalar + adder factors to correct the biased data like: bias_data * scalar + adder. Each value is of shape (lat, lon, time).
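The nearest neighbor gap fill can be sketched with a brute-force NumPy search over a single 2D slice. This sketch measures distance in grid-index space, which is an assumption, and it leaves out the Gaussian smoothing step; the helper name is hypothetical and this is not the mixin's implementation.

```python
import numpy as np


def nn_fill_sketch(arr):
    """Hypothetical helper (not part of sup3r): fill NaNs in a 2D array
    with the value of the nearest non-NaN cell (index-space distance)."""
    arr = np.asarray(arr, dtype=float)
    valid = ~np.isnan(arr)
    valid_idx = np.argwhere(valid)   # (n_valid, 2) row/col positions
    nan_idx = np.argwhere(~valid)    # (n_nan, 2) row/col positions
    out = arr.copy()
    if valid_idx.size == 0 or nan_idx.size == 0:
        return out
    # Squared distance from every NaN cell to every valid cell
    diff = nan_idx[:, None, :] - valid_idx[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    nearest = valid_idx[np.argmin(d2, axis=1)]
    out[nan_idx[:, 0], nan_idx[:, 1]] = arr[nearest[:, 0], nearest[:, 1]]
    return out


grid = np.array([[1.0, np.nan, np.nan, 4.0]])
filled = nn_fill_sketch(grid)
```

The pairwise distance matrix makes this suitable only for small grids; a spatial index would be the usual choice for larger domains.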

classmethod get_base_data(base_fps, base_dset, base_gid, base_handler, base_handler_kwargs=None, daily_reduction='avg', decimals=None, base_dh_inst=None)

Get data from the baseline data source, possibly for many high-res base gids corresponding to a single coarse low-res bias gid.

Parameters:
  • base_fps (list | str) – One or more baseline .h5 filepaths representing non-biased data to use to correct the biased dataset. This is typically several years of WTK or NSRDB files.

  • base_dset (str) – A single dataset from the base_fps to retrieve.

  • base_gid (int | np.ndarray) – One or more spatial gids to retrieve from base_fps. The data will be spatially averaged across all of these sites.

  • base_handler (rex.Resource) – A rex data handler similar to rex.Resource or sup3r.DataHandler classes (if using the latter, must also input base_dh_inst)

  • base_handler_kwargs (dict | None) – Optional kwargs to send to the initialization of the base_handler class

  • daily_reduction (None | str) – Option to do a reduction of the hourly+ source base data to daily data. Can be None (no reduction, keep source time frequency), “avg” (daily average), “max” (daily max), “min” (daily min), “sum” (daily sum/total)

  • decimals (int | None) – Option to round bias and base data to this number of decimals. This gets passed to np.around(). If decimals is negative, it specifies the number of positions to the left of the decimal point.

  • base_dh_inst (sup3r.DataHandler) – Instantiated DataHandler class that has already loaded the base data (required if base files are .nc and are not being opened by a rex Resource handler).

Returns:

  • out_data (np.ndarray) – 1D array of base data spatially averaged across the base_gid input and possibly daily-averaged or min/max’d as well.

  • out_ti (pd.DatetimeIndex) – DatetimeIndex object of datetimes corresponding to the output data.
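The spatial averaging and daily reduction described for get_base_data can be sketched in plain NumPy. The sketch assumes hourly input covering whole days that start at midnight, and it uses a hypothetical helper name; it does not read files or build a DatetimeIndex as the real method does.

```python
import numpy as np


def daily_reduce_sketch(hourly, daily_reduction="avg"):
    """Hypothetical helper (not part of sup3r).

    hourly: array of shape (n_time, n_sites), n_time a multiple of 24.
    Returns a 1D array averaged over sites and optionally reduced to daily.
    """
    hourly = np.asarray(hourly, dtype=float)
    # Average across all base sites mapped to one coarse bias pixel
    site_mean = hourly.mean(axis=1)
    if daily_reduction is None:
        return site_mean
    # One row per day, 24 hourly values per row
    days = site_mean.reshape(-1, 24)
    reducers = {"avg": np.mean, "max": np.max, "min": np.min, "sum": np.sum}
    return reducers[daily_reduction](days, axis=1)


# Two days of hourly data at two identical sites: values 0..47
hourly = np.repeat(np.arange(48.0)[:, None], 2, axis=1)
daily_avg = daily_reduce_sketch(hourly, "avg")
daily_sum = daily_reduce_sketch(hourly, "sum")
```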

get_base_gid(bias_gid)

Get one or more base gid(s) corresponding to a bias gid.

Parameters:

bias_gid (int) – gid of the data to retrieve in the bias data source raster data. The gids for this data source are the enumerated indices of the flattened coordinate array.

Returns:

  • dist (np.ndarray) – Array of nearest neighbor distances with length equal to the number of high-resolution baseline gids that map to the low resolution bias gid pixel.

  • base_gid (np.ndarray) – Array of base gids that are the nearest neighbors of bias_gid with length equal to the number of high-resolution baseline gids that map to the low resolution bias gid pixel.
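The mapping from one coarse bias pixel to many high-resolution base gids can be illustrated with a brute-force distance check in decimal degrees. Unlike get_base_gid, this hypothetical helper takes the bias coordinate directly rather than a bias gid, and it is not the lookup the class itself performs.

```python
import numpy as np


def base_gids_near_sketch(bias_coord, base_coords, distance_upper_bound):
    """Hypothetical helper (not part of sup3r): base gids whose (lat, lon)
    lie within distance_upper_bound decimal degrees of bias_coord."""
    base_coords = np.asarray(base_coords, dtype=float)
    dist = np.hypot(base_coords[:, 0] - bias_coord[0],
                    base_coords[:, 1] - bias_coord[1])
    # Row index in base_coords plays the role of the base gid here
    base_gid = np.flatnonzero(dist <= distance_upper_bound)
    return dist[base_gid], base_gid


base_coords = [(40.0, -105.0), (40.1, -105.0), (41.0, -105.0)]
dist, base_gid = base_gids_near_sketch((40.0, -105.0), base_coords, 0.25)
```

In this toy case the first two base sites fall within the bound and the third, a full degree away, is excluded.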

get_bias_data(bias_gid, bias_dh=None)

Get data from the biased data source for a single gid

Parameters:
  • bias_gid (int) – gid of the data to retrieve in the bias data source raster data. The gids for this data source are the enumerated indices of the flattened coordinate array.

  • bias_dh (DataHandler, default=self.bias_dh) – Any DataHandler from sup3r.preprocessing.data_handling. This optional argument allows an alternative handler other than the usual bias_dh. For instance, the derived QuantileDeltaMappingCorrection uses it to access the reference biased dataset as well as the target biased dataset.

Returns:

bias_data (np.ndarray) – 1D array of temporal data at the requested gid.

get_bias_gid(coord)

Get the bias gid from a coordinate.

Parameters:

coord (tuple) – (lat, lon) to get data for.

Returns:

  • bias_gid (int) – gid of the data to retrieve in the bias data source raster data. The gids for this data source are the enumerated indices of the flattened coordinate array.

  • d (float) – Distance in decimal degrees from coord to bias gid
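Since the bias gids are the enumerated indices of the flattened coordinate array, a coordinate lookup amounts to finding the closest entry in the raveled lat/lon grids. The sketch below uses a made-up 2 x 2 grid and a hypothetical helper name, with a brute-force search standing in for whatever lookup the class performs.

```python
import numpy as np

# Hypothetical 2 x 2 bias raster
lat = np.array([[41.0, 41.0], [40.0, 40.0]])
lon = np.array([[-106.0, -105.0], [-106.0, -105.0]])


def bias_gid_sketch(coord, lat, lon):
    """Hypothetical helper (not part of sup3r): index into the flattened
    grid of the pixel nearest to coord, plus the distance in degrees."""
    d = np.hypot(lat.ravel() - coord[0], lon.ravel() - coord[1])
    gid = int(np.argmin(d))
    return gid, float(d[gid])


bias_gid, d = bias_gid_sketch((40.1, -105.2), lat, lon)
# Recover the raster (row, col) position from the flattened index
row, col = np.unravel_index(bias_gid, lat.shape)
```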

get_data_pair(coord, daily_reduction='avg')

Get base and bias data observations based on a single bias gid.

Parameters:
  • coord (tuple) – (lat, lon) to get data for.

  • daily_reduction (None | str) – Option to do a reduction of the hourly+ source base data to daily data. Can be None (no reduction, keep source time frequency), “avg” (daily average), “max” (daily max), “min” (daily min), “sum” (daily sum/total)

Returns:

  • base_data (np.ndarray) – 1D array of base data spatially averaged across the base_gid input and possibly daily-averaged or min/max’d as well.

  • bias_data (np.ndarray) – 1D array of temporal data at the requested gid.

  • base_dist (np.ndarray) – Array of nearest neighbor distances from coord to the base data sites with length equal to the number of high-resolution baseline gids that map to the low resolution bias gid pixel.

  • bias_dist (float) – Nearest neighbor distance from coord to the bias data site

classmethod get_node_cmd(config)

Get a CLI call to call cls.run() on a single node based on an input config.

Parameters:

config (dict) – sup3r bias calc config with all necessary args and kwargs to initialize the class and call run() on a single node.

property meta

Get a meta data dictionary on how these bias factors were calculated