sup3r.qa.stats.Sup3rStatsMulti

class Sup3rStatsMulti(lr_file_paths=None, synth_file_paths=None, hr_file_paths=None, s_enhance=1, t_enhance=1, features=None, lr_t_slice=slice(None, None, None), synth_t_slice=slice(None, None, None), hr_t_slice=slice(None, None, None), target=None, shape=None, raster_file=None, qa_fp=None, time_chunk_size=None, cache_pattern=None, overwrite_cache=False, overwrite_synth_cache=False, overwrite_stats=False, source_handler=None, output_handler=None, worker_kwargs=None, get_interp=False, include_stats=None, max_values=None, smoothing=None, spatial_res=None, temporal_res=None, n_bins=40, max_delta=10, save_fig_data=False)[source]

Bases: Sup3rStatsBase

Class for doing statistical QA on multiple datasets. These datasets are low resolution input to sup3r, the synthetic output, and the true high resolution corresponding to the low resolution input. This class will provide statistics used to compare all these datasets.
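
As a rough illustration of how the three datasets relate, the sketch below assembles constructor keyword arguments and derives the grid and time dimensions the synthetic and true high-resolution data would be expected to have. The file paths, coordinates, and time step count are hypothetical, and the assumption that high-resolution dimensions equal the low-resolution dimensions multiplied by s_enhance and t_enhance is inferred from the parameter descriptions, not confirmed against the implementation.

```python
# Hypothetical inputs for illustration; paths and values are assumptions.
stats_kwargs = {
    "lr_file_paths": ["./lr_data.nc"],
    "synth_file_paths": ["./synth_data.h5"],
    "hr_file_paths": ["./hr_data.h5"],
    "s_enhance": 2,
    "t_enhance": 4,
    "features": ["windspeed_100m"],
    "target": (39.0, -105.0),  # (lat, lon) lower left corner of the raster
    "shape": (20, 20),         # low-resolution grid (rows, cols)
    "qa_fp": "./qa_stats.pkl",
    "n_bins": 40,
}

lr_time_steps = 24  # assumed number of low-resolution time steps

# Dimensions the high-resolution data would need for a one-to-one comparison.
rows, cols = stats_kwargs["shape"]
expected_hr_shape = (
    rows * stats_kwargs["s_enhance"],
    cols * stats_kwargs["s_enhance"],
)
expected_hr_time_steps = lr_time_steps * stats_kwargs["t_enhance"]

print(expected_hr_shape, expected_hr_time_steps)

# With sup3r installed, the documented methods suggest a call pattern like:
# from sup3r.qa.stats import Sup3rStatsMulti
# qa = Sup3rStatsMulti(**stats_kwargs)
# stats = qa.run()
# qa.close()
```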

Parameters:

lr_file_paths (list | str) – A list of low-resolution source files (either .nc or .h5) to extract raster data from.
synth_file_paths (list | str) – Sup3r-resolved output files (either .nc or .h5) with high-resolution data corresponding to the data in lr_file_paths enhanced by s_enhance and t_enhance
hr_file_paths (list | str) – A list of high-resolution source files (either .nc or .h5) corresponding to the low-resolution source files in lr_file_paths
s_enhance (int) – Factor by which the Sup3rGan model will enhance the spatial dimensions of low resolution data
t_enhance (int) – Factor by which the Sup3rGan model will enhance temporal dimension of low resolution data
features (list) – Features for which to compute wind stats. e.g. ['pressure_100m', 'temperature_100m', 'windspeed_100m', 'vorticity_100m']
lr_t_slice (slice | tuple | list) – Slice defining size of temporal domain for the low resolution data.
synth_t_slice (slice | tuple | list) – Slice defining size of temporal domain for the synthetic high resolution data.
hr_t_slice (slice | tuple | list) – Slice defining size of temporal domain for the true high resolution data.
target (tuple) – (lat, lon) lower left corner of raster. You should provide target+shape or raster_file, or if all three are None the full source domain will be used.
shape (tuple) – Shape of the low resolution grid size. (rows, cols). You should provide target+shape or raster_file, or if all three are None the full source domain will be used.
raster_file (str | None) – File for raster_index array for the corresponding target and shape. If specified the raster_index will be loaded from the file if it exists or written to the file if it does not yet exist. If None raster_index will be calculated directly. You should provide target+shape or raster_file, or if all three are None the full source domain will be used.
qa_fp (str | None) – Optional filepath to output QA file when you call Sup3rStatsWind.run() (only .pkl is supported)
time_chunk_size (int) – Size of chunks to split time dimension into for parallel data extraction. If running in serial this can be set to the size of the full time index for best performance.
cache_pattern (str | None) – Pattern for files for saving feature data. e.g. file_path_{feature}.pkl Each feature will be saved to a file with the feature name replaced in cache_pattern. If not None feature arrays will be saved here and not stored in self.data until load_cached_data is called. The cache_pattern can also include {shape}, {target}, {times} which will help ensure unique cache files for complex problems.
overwrite_cache (bool) – Whether to overwrite cache files storing the computed/extracted feature data for low-resolution and high-resolution data
overwrite_synth_cache (bool) – Whether to overwrite cache files storing the computed/extracted data for synthetic output.
overwrite_stats (bool) – Whether to overwrite saved stats
source_handler (str | None) – data handler class to use for input data. Provide a string name to match a class in data_handling.py. If None the correct handler will be guessed based on file type and time series properties.
output_handler (str | None) – data handler class to use for output data. Provide a string name to match a class in data_handling.py. If None the correct handler will be guessed based on file type and time series properties.
worker_kwargs (dict | None) – Dictionary of worker values. Can include max_workers, extract_workers, compute_workers, load_workers, norm_workers, and ti_workers. Each argument needs to be an integer or None.

The value of max_workers will set the value of all other worker args. If max_workers == 1 then all processes will be serialized. If max_workers is None then the other worker args will use their own provided values.

extract_workers is the max number of workers to use for extracting features from source data. If None, it will be estimated based on memory limits. If 1, processes will be serialized.
compute_workers is the max number of workers to use for computing derived features from raw features in source data.
load_workers is the max number of workers to use for loading cached feature data.
norm_workers is the max number of workers to use for normalizing feature data.
ti_workers is the max number of workers to use to get the full time index. This is useful when there are many input files, each with a single time step. If this is greater than one, time indices for input files will be extracted in parallel and then concatenated to get the full time index. If input files do not all have time indices, or if there are few input files, this should be set to one.
get_interp (bool) – Whether to include interpolated baseline stats in output
include_stats (list | None) – List of stats to include in output. e.g. ['time_derivative', 'gradient', 'vorticity', 'avg_spectrum_k', 'avg_spectrum_f', 'direct']. 'direct' means direct distribution, as opposed to a distribution of the gradient or time derivative.
max_values (dict | None) – Dictionary of max values to keep for stats. e.g. {'time_derivative': 10, 'gradient': 14, 'vorticity': 7}
smoothing (float | None) – Value passed to gaussian filter used for smoothing source data
spatial_res (float | None) – Spatial resolution for source data in meters. e.g. 2000. This is used to determine the wavenumber range for spectra calculations.
temporal_res (float | None) – Temporal resolution for source data in seconds. e.g. 60. This is used to determine the frequency range for spectra calculations and to scale temporal derivatives.
max_delta (int, optional) – Optional maximum limit on the raster shape that is retrieved at once. If shape is (20, 20) and max_delta=10, the full raster will be retrieved in four chunks of (10, 10). This helps adapt to non-regular grids that curve over large distances. Defaults to 10.
n_bins (int) – Number of bins to use for constructing probability distributions
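
Two of the parameter descriptions above state simple rules that can be sketched in a few lines: how max_workers interacts with the other worker arguments, and how max_delta splits a raster into chunks. The helpers `resolve_workers` and `raster_chunks` below are hypothetical names written for this illustration and are not part of sup3r; they only restate the documented behavior under the assumption that chunks are taken in row-major order.

```python
WORKER_NAMES = (
    "extract_workers",
    "compute_workers",
    "load_workers",
    "norm_workers",
    "ti_workers",
)


def resolve_workers(worker_kwargs=None):
    """Hypothetical helper: apply the documented max_workers rule."""
    worker_kwargs = worker_kwargs or {}
    max_workers = worker_kwargs.get("max_workers")
    if max_workers is not None:
        # max_workers sets the value of every other worker argument.
        return {name: max_workers for name in WORKER_NAMES}
    # Otherwise each worker argument keeps its own value (or None).
    return {name: worker_kwargs.get(name) for name in WORKER_NAMES}


def raster_chunks(shape, max_delta):
    """Hypothetical helper: split (rows, cols) into chunks of at most
    max_delta along each axis, returned as (row_slice, col_slice) pairs."""
    rows, cols = shape
    chunks = []
    for r0 in range(0, rows, max_delta):
        for c0 in range(0, cols, max_delta):
            row_slice = slice(r0, min(r0 + max_delta, rows))
            col_slice = slice(c0, min(c0 + max_delta, cols))
            chunks.append((row_slice, col_slice))
    return chunks


serial = resolve_workers({"max_workers": 1, "load_workers": 8})
custom = resolve_workers({"max_workers": None, "load_workers": 8})
chunks = raster_chunks((20, 20), max_delta=10)
print(serial["load_workers"], custom["load_workers"], len(chunks))
```

With shape (20, 20) and max_delta=10 the sketch yields four (10, 10) chunks, matching the example in the max_delta description.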

Methods

`close`()	Close any open file handlers
`export`(qa_fp, data)	Export stats dictionary to pkl file.
`export_fig_data`()	Save data fields for data viz comparison
`get_node_cmd`(config)	Get a CLI call to initialize Sup3rStats and execute the Sup3rStats.run() method based on an input config
`load_cache`(file_name)	Load data from cache file
`run`()	Go through all datasets and get the dictionary of statistics.
`save_cache`(array, file_name)	Save data to cache file

export_fig_data()[source]: Save data fields for data viz comparison

close()[source]: Close any open file handlers

export(qa_fp, data)

Export stats dictionary to pkl file.

Parameters:

qa_fp (str | None) – Optional filepath to output QA file (only .pkl is supported)
data (dict) – A dictionary with stats for low and high resolution wind fields
overwrite_stats (bool) – Whether to overwrite saved stats or not

classmethod get_node_cmd(config)

Get a CLI call to initialize Sup3rStats and execute the Sup3rStats.run() method based on an input config

Parameters: config (dict) – sup3r wind stats config with all necessary args and kwargs to initialize Sup3rStats and execute Sup3rStats.run()

classmethod load_cache(file_name)

Load data from cache file

Parameters: file_name (str) – Path to cache file
Returns: array (ndarray) – Wind field data

classmethod save_cache(array, file_name)

Save data to cache file

Parameters:

array (ndarray) – Wind field data
file_name (str) – Path to cache file
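
The cache workflow described by cache_pattern, save_cache, and load_cache can be approximated with the standard library. This is a sketch only: it assumes cache files are plain pickles (suggested by the .pkl example in cache_pattern) and that the pattern is filled with str.format. The helpers `cache_file_name`, `pickle_save`, and `pickle_load` are hypothetical stand-ins, not the sup3r methods, and a nested list stands in for the ndarray of field data.

```python
import os
import pickle
import tempfile


def cache_file_name(cache_pattern, feature, **extra):
    """Hypothetical helper: fill {feature} (and any extra keys such as
    {shape} or {target}) in a cache pattern."""
    return cache_pattern.format(feature=feature, **extra)


def pickle_save(array, file_name):
    """Stand-in for saving data to a cache file, assuming pickle format."""
    with open(file_name, "wb") as handle:
        pickle.dump(array, handle)


def pickle_load(file_name):
    """Stand-in for loading data from a cache file, assuming pickle format."""
    with open(file_name, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp_dir:
    pattern = os.path.join(tmp_dir, "cache_{feature}.pkl")
    file_path = cache_file_name(pattern, "windspeed_100m")

    field = [[1.0, 2.0], [3.0, 4.0]]  # placeholder for an ndarray
    pickle_save(field, file_path)
    restored = pickle_load(file_path)
    cached_name = os.path.basename(file_path)

print(cached_name, restored == field)
```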

run()[source]

Go through all datasets and get the dictionary of statistics.

Returns: stats (dict) – Dictionary of statistics, where keys are lr/hr/interp appended with the feature name. Values are dictionaries of statistics, such as gradient, avg_spectrum, time_derivative, etc.
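
To compare sources feature by feature, the returned dictionary can be regrouped. The sketch below works on a mock dictionary because the exact key format is not specified here: it assumes keys of the form `<source>_<feature>` (for example `lr_windspeed_100m`), and both that format and the `split_key` helper are hypothetical.

```python
# Mock of the returned structure; key format and values are assumptions.
mock_stats = {
    "lr_windspeed_100m": {"gradient": [0.1, 0.2], "time_derivative": [0.3]},
    "hr_windspeed_100m": {"gradient": [0.4, 0.5], "time_derivative": [0.6]},
    "interp_windspeed_100m": {"gradient": [0.7, 0.8]},
}

SOURCES = ("lr", "hr", "interp")


def split_key(key, sources=SOURCES):
    """Hypothetical helper: split '<source>_<feature>' into its parts."""
    for source in sources:
        prefix = source + "_"
        if key.startswith(prefix):
            return source, key[len(prefix):]
    raise ValueError(f"Unrecognized stats key: {key}")


# Regroup as {feature: {source: [stat names]}} for side-by-side comparison.
by_feature = {}
for key, feature_stats in mock_stats.items():
    source, feature = split_key(key)
    by_feature.setdefault(feature, {})[source] = sorted(feature_stats)

for feature, per_source in by_feature.items():
    for source, stat_names in per_source.items():
        print(feature, source, stat_names)
```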