sup3r.qa.qa.Sup3rQa

class Sup3rQa(source_file_paths, out_file_path, s_enhance, t_enhance, temporal_coarsening_method, features=None, source_features=None, output_names=None, temporal_slice=slice(None, None, None), target=None, shape=None, raster_file=None, qa_fp=None, bias_correct_method=None, bias_correct_kwargs=None, save_sources=True, time_chunk_size=None, cache_pattern=None, overwrite_cache=False, input_handler=None, worker_kwargs=None)[source]

Bases: object

Class for doing QA on sup3r forward pass outputs.

Note that this only works if the sup3r forward pass output can be reshaped into a 2D raster dataset (e.g. no sparsifying of the meta data).

Parameters:
  • source_file_paths (list | str) – A list of low-resolution source files to extract raster data from. Each file must have the same number of timesteps. Can also pass a string with a unix-style file path which will be passed through glob.glob

  • out_file_path (str) – A single sup3r-resolved output file (either .nc or .h5) with high-resolution data corresponding to the source_file_paths * s_enhance * t_enhance

  • s_enhance (int) – Factor by which the Sup3rGan model will enhance the spatial dimensions of low resolution data

  • t_enhance (int) – Factor by which the Sup3rGan model will enhance temporal dimension of low resolution data

  • temporal_coarsening_method (str | list) – [subsample, average, total, min, max] subsample will take every t_enhance-th time step, average will average over t_enhance time steps, total will sum over t_enhance time steps, and min/max will take the minimum/maximum over t_enhance time steps. This can also be a list of method names corresponding to the list of features.

  • features (str | list | None) – Explicit list of features to validate. Can be a single feature str, list of string feature names, or None for all features found in the out_file_path.

  • source_features (str | list | None) – Optional feature names to retrieve from the source dataset if the source feature names are not the same as the sup3r output feature names. This must be of the same type / length as the features input. For example: (features="ghi", source_features="rsds") or (features=["windspeed_100m", "windspeed_200m"], source_features=[["U_100m", "V_100m"], ["U_200m", "V_200m"]])

  • output_names (str | list) – Optional output file dataset names corresponding to the features list input

  • temporal_slice (slice | tuple | list) – Slice defining size of full temporal domain. e.g. If we have 5 files each with 5 time steps then temporal_slice = slice(None) will select all 25 time steps. This can also be a tuple / list with length 3 that will be interpreted as slice(*temporal_slice)

  • target (tuple) – (lat, lon) lower left corner of raster. You should provide target+shape or raster_file, or if all three are None the full source domain will be used.

  • shape (tuple) – (rows, cols) grid size. You should provide target+shape or raster_file, or if all three are None the full source domain will be used.

  • raster_file (str | None) – File for raster_index array for the corresponding target and shape. If specified the raster_index will be loaded from the file if it exists or written to the file if it does not yet exist. If None raster_index will be calculated directly. You should provide target+shape or raster_file, or if all three are None the full source domain will be used.

  • qa_fp (str | None) – Optional filepath to output QA file when you call Sup3rQa.run() (only .h5 is supported)

  • bias_correct_method (str | None) – Optional bias correction function name that can be imported from the sup3r.bias.bias_transforms module. This will transform the source data according to some predefined bias correction transformation along with the bias_correct_kwargs. As the first argument, this method must receive a generic numpy array of data to be bias corrected

  • bias_correct_kwargs (dict | None) – Optional namespace of kwargs to provide to bias_correct_method. If this is provided, it must be a dictionary where each key is a feature name and each value is a dictionary of kwargs to correct that feature. You can bias correct only certain input features by only including those feature names in this dict.

  • save_sources (bool) – Flag to save re-coarsened synthetic data and true low-res data to qa_fp in addition to the error dataset

  • time_chunk_size (int) – Size of chunks to split time dimension into for parallel data extraction. If running in serial this can be set to the size of the full time index for best performance.

  • cache_pattern (str | None) – Pattern for files for saving feature data. e.g. file_path_{feature}.pkl Each feature will be saved to a file with the feature name replaced in cache_pattern. If not None feature arrays will be saved here and not stored in self.data until load_cached_data is called. The cache_pattern can also include {shape}, {target}, {times} which will help ensure unique cache files for complex problems.

  • overwrite_cache (bool) – Whether to overwrite cache files storing the computed/extracted feature data

  • input_handler (str | None) – Data handler class to use for input data. Provide a string name to match a class in data_handling.py. If None, the correct handler will be guessed based on file type and time series properties.

  • worker_kwargs (dict | None) – Dictionary of worker values. Can include max_workers, extract_workers, compute_workers, load_workers, and ti_workers. Each argument needs to be an integer or None.

    The value of max_workers will set the value of all other worker args. If max_workers == 1 then all processes will be serialized. If max_workers is None then the other worker args will use their own provided values.

    extract_workers is the max number of workers to use for extracting features from source data; if None it will be estimated based on memory limits, and if 1 processes will be serialized. compute_workers is the max number of workers to use for computing derived features from raw features in source data. load_workers is the max number of workers to use for loading cached feature data. ti_workers is the max number of workers to use to get the full time index, which is useful when there are many input files each with a single time step. If ti_workers is greater than one, time indices for input files will be extracted in parallel and then concatenated to get the full time index. If input files do not all have time indices, or if there are few input files, ti_workers should be set to one.
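The temporal_coarsening_method options above can be illustrated with a minimal NumPy sketch. The helper below is hypothetical (not part of sup3r) and only demonstrates the semantics of each method name on a 1D time series:

```python
import numpy as np

def temporal_coarsen(data, t_enhance, method):
    """Sketch of the temporal_coarsening_method options applied to a
    1D time series (hypothetical helper, not sup3r's implementation)."""
    if method == "subsample":
        # take every t_enhance-th time step
        return data[::t_enhance]
    # reshape so each row holds t_enhance consecutive time steps
    chunks = data.reshape(-1, t_enhance)
    ops = {"average": np.mean, "total": np.sum,
           "min": np.min, "max": np.max}
    return ops[method](chunks, axis=1)

ts = np.arange(8.0)  # 8 high-res time steps, t_enhance = 4
print(temporal_coarsen(ts, 4, "subsample"))  # [0. 4.]
print(temporal_coarsen(ts, 4, "average"))    # [1.5 5.5]
print(temporal_coarsen(ts, 4, "total"))      # [ 6. 22.]
```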

Methods

bias_correct_source_data(data, lat_lon, ...)

Bias correct data using a method defined by the bias_correct_method input to ForwardPassStrategy

close()

Close any open file handlers

coarsen_data(idf, feature, data)

Re-coarsen a high-resolution synthetic output dataset

export(qa_fp, data, dset_name[, dset_suffix])

Export error dictionary to h5 file.

get_dset_out(name)

Get an output dataset from the forward pass output file.

get_node_cmd(config)

Get a CLI call to initialize Sup3rQa and execute the Sup3rQa.run() method based on an input config

get_source_dset(feature, source_feature)

Get source low res input data including optional bias correction

run()

Go through all datasets and get the error for the re-coarsened synthetic minus the true low-res source data.

Attributes

features

Get a list of feature names from the output file, excluding meta and time index datasets

lr_shape

Get the shape of the source low-res data raster (rows, cols, time, features)

meta

Get the meta data corresponding to the flattened source low-res data

output_handler_class

Get the output handler class.

output_names

Get a list of output dataset names corresponding to the features list

output_type

Get output data type

source_features

Get a list of feature names from the source input file, excluding meta and time index datasets.

source_features_flat

Get a flat list of source feature names, so for example if (features=["windspeed_100m", "windspeed_200m"], source_features=[["U_100m", "V_100m"], ["U_200m", "V_200m"]]) then this property will return ["U_100m", "V_100m", "U_200m", "V_200m"]

time_index

Get the time index associated with the source low-res data

close()[source]

Close any open file handlers

property meta

Get the meta data corresponding to the flattened source low-res data

Returns:

pd.DataFrame

property lr_shape

Get the shape of the source low-res data raster (rows, cols, time, features)

property time_index

Get the time index associated with the source low-res data

Returns:

pd.DatetimeIndex

property features

Get a list of feature names from the output file, excluding meta and time index datasets

Returns:

list

property source_features

Get a list of feature names from the source input file, excluding meta and time index datasets. This property considers the features input mapping if a dictionary was provided, e.g. if (features='ghi', source_features='rsds'), this property will return ['rsds']

property source_features_flat

Get a flat list of source feature names, so for example if (features=["windspeed_100m", "windspeed_200m"], source_features=[["U_100m", "V_100m"], ["U_200m", "V_200m"]]) then this property will return ["U_100m", "V_100m", "U_200m", "V_200m"]
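The flattening behavior can be sketched in a few lines; this helper is illustrative only (not sup3r's actual implementation) and simply splices nested source feature lists into one flat list:

```python
# Sketch of how source_features_flat could be derived from a nested
# source_features input (hypothetical helper, not sup3r's code).
def flatten_features(source_features):
    flat = []
    for entry in source_features:
        if isinstance(entry, (list, tuple)):
            # a list entry means multiple source dsets map to one feature
            flat.extend(entry)
        else:
            flat.append(entry)
    return flat

print(flatten_features([["U_100m", "V_100m"], ["U_200m", "V_200m"]]))
# ['U_100m', 'V_100m', 'U_200m', 'V_200m']
```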

property output_names

Get a list of output dataset names corresponding to the features list

property output_type

Get output data type

Returns:

output_type – e.g. 'nc' or 'h5'

property output_handler_class

Get the output handler class.

Returns:

HandlerClass (rex.Resource | xr.open_dataset)

bias_correct_source_data(data, lat_lon, source_feature)[source]

Bias correct data using a method defined by the bias_correct_method input to ForwardPassStrategy

Parameters:
  • data (np.ndarray) – Any source data to be bias corrected, with the feature channel in the last axis.

  • lat_lon (np.ndarray) – Latitude longitude array for the given data. Used to get the correct bc factors for the appropriate domain. (n_lats, n_lons, 2)

  • source_feature (str | list) – The source feature name corresponding to the output feature name

Returns:

data (np.ndarray) – Data corrected by the bias_correct_method ready for input to the forward pass through the generative model.

get_source_dset(feature, source_feature)[source]

Get source low res input data including optional bias correction

Parameters:
  • feature (str) – Feature name

  • source_feature (str | list) – The source feature name corresponding to the output feature name

Returns:

data_true (np.ndarray) – Low-res source input data including optional bias correction

get_dset_out(name)[source]

Get an output dataset from the forward pass output file.

Parameters:

name (str) – Name of the output dataset to retrieve. Must be found in the features property and the forward pass output file.

Returns:

out (np.ndarray) – A copy of the high-resolution output data as a numpy array of shape (spatial_1, spatial_2, temporal)

coarsen_data(idf, feature, data)[source]

Re-coarsen a high-resolution synthetic output dataset

Parameters:
  • idf (int) – Feature index

  • feature (str) – Feature name

  • data (np.ndarray) – A copy of the high-resolution output data as a numpy array of shape (spatial_1, spatial_2, temporal)

Returns:

data (np.ndarray) – A spatiotemporally coarsened copy of the input dataset, still with shape (spatial_1, spatial_2, temporal)
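The spatial half of the re-coarsening can be sketched as block-averaging by s_enhance in each spatial dimension. This is an illustrative NumPy sketch, not sup3r's coarsening code (note that coarsen_data above preserves the 3D shape, while this sketch returns the reduced grid):

```python
import numpy as np

def spatial_coarsen(data, s_enhance):
    """Block-average an array of shape (spatial_1, spatial_2, temporal)
    by s_enhance in each spatial dimension (illustrative sketch only)."""
    s1, s2, t = data.shape
    # group the grid into s_enhance x s_enhance blocks, then average
    blocks = data.reshape(s1 // s_enhance, s_enhance,
                          s2 // s_enhance, s_enhance, t)
    return blocks.mean(axis=(1, 3))

hi_res = np.arange(16.0).reshape(4, 4, 1)  # 4x4 grid, 1 time step
lo_res = spatial_coarsen(hi_res, 2)        # -> 2x2 grid of block means
print(lo_res[..., 0])
# [[ 2.5  4.5]
#  [10.5 12.5]]
```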

classmethod get_node_cmd(config)[source]

Get a CLI call to initialize Sup3rQa and execute the Sup3rQa.run() method based on an input config

Parameters:

config (dict) – sup3r QA config with all necessary args and kwargs to initialize Sup3rQa and execute Sup3rQa.run()
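A QA config for get_node_cmd mirrors the Sup3rQa __init__ arguments as dictionary keys. The sketch below is a hypothetical example; the file paths and enhancement factors are placeholders, not values from the source docs:

```python
# Hypothetical QA config for Sup3rQa.get_node_cmd(); keys mirror the
# Sup3rQa __init__ signature, and all paths/values are placeholders.
config = {
    "source_file_paths": "./low_res/*.nc",   # glob passed to glob.glob
    "out_file_path": "./sup3r_out.h5",       # single sup3r output file
    "s_enhance": 4,                          # spatial enhancement factor
    "t_enhance": 24,                         # temporal enhancement factor
    "temporal_coarsening_method": "average",
    "qa_fp": "./qa_out.h5",                  # where run() writes QA output
}
```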

export(qa_fp, data, dset_name, dset_suffix='')[source]

Export error dictionary to h5 file.

Parameters:
  • qa_fp (str | None) – Optional filepath to output QA file (only .h5 is supported)

  • data (np.ndarray) – An array with shape (space1, space2, time) that represents the re-coarsened synthetic data minus the source true low-res data, or another dataset of the same shape to be written to disk

  • dset_name (str) – Base dataset name to save data to

  • dset_suffix (str) – Optional suffix to append to dset_name with an underscore before saving.

run()[source]

Go through all datasets and get the error for the re-coarsened synthetic minus the true low-res source data.

Returns:

errors (dict) – Dictionary of errors, where keys are the feature names, and each value is an array with shape (space1, space2, time) that represents the re-coarsened synthetic data minus the source true low-res data
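The error arrays run() returns can be pictured with a toy NumPy example: for each feature, the re-coarsened synthetic data minus the true low-res source, keyed by feature name. The values below are made up purely for illustration:

```python
import numpy as np

# Toy (2, 2, 1) arrays standing in for one feature's re-coarsened
# synthetic output and true low-res source data (values are made up).
synthetic_coarse = np.array([[[5.0], [3.0]], [[2.0], [4.0]]])
true_low_res = np.array([[[4.5], [3.5]], [[2.0], [5.0]]])

# run() builds a dict like this, one (space1, space2, time) array per feature
errors = {"windspeed_100m": synthetic_coarse - true_low_res}
print(errors["windspeed_100m"][..., 0])
# [[ 0.5 -0.5]
#  [ 0.  -1. ]]
```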