sup3r.qa.stats.Sup3rStatsSingle

class Sup3rStatsSingle(source_file_paths=None, s_enhance=1, t_enhance=1, features=None, temporal_slice=slice(None, None, None), target=None, shape=None, raster_file=None, time_chunk_size=None, cache_pattern=None, overwrite_cache=False, overwrite_stats=False, source_handler=None, worker_kwargs=None, get_interp=False, include_stats=None, max_values=None, smoothing=None, coarsen=False, spatial_res=None, temporal_res=None, n_bins=40, max_delta=10, qa_fp=None)[source]

Bases: Sup3rStatsCompute

Base class for doing statistical QA on a single set of source files.

Parameters:
  • source_file_paths (list | str) – A list of source files to compute statistics on. Either .nc or .h5

  • s_enhance (int) – Factor by which the Sup3rGan model enhanced the spatial dimensions of low resolution data

  • t_enhance (int) – Factor by which the Sup3rGan model enhanced the temporal dimension of low resolution data

  • features (list) – Features for which to compute wind stats. e.g. [‘pressure_100m’, ‘temperature_100m’, ‘windspeed_100m’, ‘vorticity_100m’]

  • temporal_slice (slice | tuple | list) – Slice defining size of full temporal domain. e.g. If we have 5 files each with 5 time steps then temporal_slice = slice(None) will select all 25 time steps. This can also be a tuple / list with length 3 that will be interpreted as slice(*temporal_slice)

  • target (tuple) – (lat, lon) lower left corner of raster. You should provide target+shape or raster_file, or if all three are None the full source domain will be used.

  • shape (tuple) – (rows, cols) grid size. You should provide target+shape or raster_file, or if all three are None the full source domain will be used.

  • raster_file (str | None) – File for raster_index array for the corresponding target and shape. If specified the raster_index will be loaded from the file if it exists or written to the file if it does not yet exist. If None raster_index will be calculated directly. You should provide target+shape or raster_file, or if all three are None the full source domain will be used.

  • time_chunk_size (int) – Size of chunks to split time dimension into for parallel data extraction. If running in serial this can be set to the size of the full time index for best performance.

  • cache_pattern (str | None) – Pattern for files for saving feature data. e.g. file_path_{feature}.pkl Each feature will be saved to a file with the feature name replaced in cache_pattern. If not None feature arrays will be saved here and not stored in self.data until load_cached_data is called. The cache_pattern can also include {shape}, {target}, {times} which will help ensure unique cache files for complex problems.

  • overwrite_cache (bool) – Whether to overwrite cache files storing the computed/extracted feature data

  • overwrite_stats (bool) – Whether to overwrite saved stats

  • source_handler (str | None) – Data handler class to use for source data. Provide a string name to match a class in data_handling.py. If None the correct handler will be guessed based on file type and time series properties.

  • worker_kwargs (dict | None) – Dictionary of worker values. Can include max_workers, extract_workers, compute_workers, load_workers, norm_workers, and ti_workers. Each argument needs to be an integer or None.

    The value of max_workers will set the value of all other worker args. If max_workers == 1 then all processes will be serialized. If max_workers == None then the other worker args will use their own provided values.

    extract_workers is the max number of workers used to extract features from source data; if None it is estimated from memory limits, and if 1 extraction is serialized. compute_workers is the max number of workers used to compute derived features from raw features in source data. load_workers is the max number of workers used to load cached feature data. norm_workers is the max number of workers used to normalize feature data. ti_workers is the max number of workers used to build the full time index, which is useful when there are many input files each with a single time step; if greater than one, the time indices for the input files will be extracted in parallel and then concatenated to get the full time index. If the input files do not all have time indices, or if there are few input files, ti_workers should be set to one.

  • get_interp (bool) – Whether to include interpolated baseline stats in output

  • include_stats (list | None) – List of stats to include in output. e.g. [‘time_derivative’, ‘gradient’, ‘vorticity’, ‘avg_spectrum_k’, ‘avg_spectrum_f’, ‘direct’]. ‘direct’ means direct distribution, as opposed to a distribution of the gradient or time derivative.

  • max_values (dict | None) – Dictionary of max values to keep for stats. e.g. {‘time_derivative’: 10, ‘gradient’: 14, ‘vorticity’: 7}

  • smoothing (float | None) – Value passed to gaussian filter used for smoothing source data

  • spatial_res (float | None) – Spatial resolution for source data in meters. e.g. 2000. This is used to determine the wavenumber range for spectra calculations.

  • temporal_res (float | None) – Temporal resolution for source data in seconds. e.g. 60. This is used to determine the frequency range for spectra calculations and to scale temporal derivatives.

  • coarsen (bool) – Whether to coarsen data or not

  • max_delta (int, optional) – Optional maximum limit on the raster shape that is retrieved at once. If shape is (20, 20) and max_delta=10, the full raster will be retrieved in four chunks of (10, 10). This helps adapt to non-regular grids that curve over large distances. By default 10.

  • n_bins (int) – Number of bins to use for constructing probability distributions

  • qa_fp (str) – File path for saving statistics. Only .pkl supported.

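A minimal usage sketch is shown below. The file path, feature names, resolutions, and worker settings are hypothetical placeholders and should be replaced with values appropriate to your data.

>>> from sup3r.qa.stats import Sup3rStatsSingle
>>> stats = Sup3rStatsSingle(
...     source_file_paths='./wtk_2012_*.h5',            # hypothetical source files (glob pattern)
...     features=['windspeed_100m', 'winddirection_100m'],
...     include_stats=['direct', 'gradient', 'time_derivative'],
...     spatial_res=2000,                                # meters
...     temporal_res=3600,                               # seconds
...     n_bins=40,
...     worker_kwargs={'max_workers': 1},                # serialize all steps
...     qa_fp='./stats.pkl')                             # hypothetical output path
>>> out = stats.run()   # dict keyed by 'source'/'interp' combined with each feature name
>>> stats.close()
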
Methods

check_return_cache(feature, shape)

Check if interpolated data is cached and return data if it is.

close()

Close any open file handlers

coarsen_data(data[, smoothing])

Re-coarsen a high-resolution synthetic output dataset

export(qa_fp, data)

Export stats dictionary to pkl file.

get_feature_data(feature)

Get data for requested feature

get_feature_stats(feature)

Get stats for high and low resolution fields

get_fluctuation(var)

Get difference between array and temporal average of the same array

get_node_cmd(config)

Get a CLI call to initialize Sup3rStats and execute the Sup3rStats.run() method based on an input config

get_source_data(file_paths[, handler_kwargs])

Get source data using provided source file paths

get_stats(var[, interp, period])

Get stats for wind fields

interpolate_data(feature, low_res)

Get interpolated low res field

load_cache(file_name)

Load data from cache file

run()

Go through all requested features and get the dictionary of statistics.

save_cache(array, file_name)

Save data to cache file

Attributes

compute_features

Get list of requested feature names

f_range

Get range of frequencies to use for frequency spectrum calculation

features

Get a list of requested feature names

input_features

Get a list of requested feature names

k_range

Get range of wavenumbers to use for wavenumber spectrum calculation

lat_lon

Get lat/lon for output data

meta

Get the meta data corresponding to the flattened source low-res data

shape

Shape of source data

source_handler

Get source data handler

source_handler_class

Get source handler class

source_type

Get source data file type

time_index

Get the time index associated with the source data

close()[source]

Close any open file handlers

property source_type

Get the source data file type

Returns:

output_type – e.g. ‘nc’ or ‘h5’

property source_handler_class

Get source handler class

property source_handler

Get source data handler

get_source_data(file_paths, handler_kwargs=None)[source]

Get source data using provided source file paths

Parameters:
  • file_paths (list | str) – A list of source files to extract raster data from. Each file must have the same number of timesteps. Can also pass a string with a unix-style file path which will be passed through glob.glob

  • handler_kwargs (dict) – Dictionary of keyword arguments passed to sup3r.preprocessing.data_handling.DataHandler

Returns:

ndarray – Array of data from source file paths (spatial_1, spatial_2, temporal, features)
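
For illustration, a hedged sketch of calling this method directly, reusing the stats instance from the class-level example above; the glob pattern and handler kwargs are hypothetical.

>>> data = stats.get_source_data(
...     './wtk_2012_*.h5',                              # hypothetical glob pattern
...     handler_kwargs={'target': (39.0, -105.0),       # hypothetical lower-left (lat, lon)
...                     'shape': (20, 20)})
>>> data.shape   # (spatial_1, spatial_2, temporal, features)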

property shape

Shape of source data

property lat_lon

Get lat/lon for output data

property meta

Get the meta data corresponding to the flattened source low-res data

Returns:

pd.DataFrame

property time_index

Get the time index associated with the source data

Returns:

pd.DatetimeIndex

property input_features

Get a list of requested feature names

Returns:

list

property compute_features

Get list of requested feature names

coarsen_data(data, smoothing=None)[source]

Re-coarsen a high-resolution synthetic output dataset

Parameters:
  • data (np.ndarray) – A copy of the high-resolution output data as a numpy array of shape (spatial_1, spatial_2, temporal)

  • smoothing (float | None) – Amount of smoothing to apply using a gaussian filter.

Returns:

data (np.ndarray) – A spatiotemporally coarsened copy of the input dataset, still with shape (spatial_1, spatial_2, temporal)
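
Conceptually, spatial coarsening by a factor such as s_enhance amounts to block averaging; a simplified numpy illustration of that idea (not the library's exact implementation) is:

>>> import numpy as np
>>> hr = np.random.rand(20, 20, 24)                     # high-res (spatial_1, spatial_2, temporal)
>>> s = 2                                               # hypothetical spatial coarsening factor
>>> lr = hr.reshape(20 // s, s, 20 // s, s, 24).mean(axis=(1, 3))   # average over s x s blocks
>>> lr.shape
(10, 10, 24)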

check_return_cache(feature, shape)

Check if interpolated data is cached and return the data if it is. Also returns the cache file name if cache_pattern is not None.

Parameters:
  • feature (str) – Name of interpolated feature to check for cache

  • shape (tuple) – Shape of low resolution data. Used to define cache file_name.

Returns:

  • var_itp (ndarray | None) – Array of interpolated data if data exists. Otherwise returns None

  • file_name (str) – Name of cache file for interpolated data. If cache_pattern is None this returns None

export(qa_fp, data)

Export stats dictionary to pkl file.

Parameters:
  • qa_fp (str | None) – Optional filepath to output QA file (only .pkl is supported)

  • data (dict) – A dictionary with stats for low and high resolution wind fields

  • overwrite_stats (bool) – Whether to overwrite saved stats or not
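
Since the stats are exported as a pickle file, the saved dictionary can be read back with the standard library; the path below is hypothetical.

>>> import pickle
>>> with open('./stats.pkl', 'rb') as f:                # hypothetical output path
...     saved = pickle.load(f)
>>> list(saved.keys())   # keys combine 'source'/'interp' with each feature name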

property f_range

Get range of frequencies to use for frequency spectrum calculation

property features

Get a list of requested feature names

Returns:

list

get_feature_data(feature)

Get data for requested feature

Parameters:

feature (str) – Name of feature to get stats for

Returns:

ndarray – Array of data for requested feature

get_feature_stats(feature)

Get stats for high and low resolution fields

Parameters:

feature (str) – Name of feature to get stats for

Returns:

  • source_stats (dict) – Dictionary of stats for input fields

  • interp (dict) – Dictionary of stats for spatiotemporally interpolated fields

static get_fluctuation(var)

Get difference between array and temporal average of the same array

Parameters:

var (ndarray) – Array of data to calculate the fluctuation for (spatial_1, spatial_2, temporal)

Returns:

dvar (ndarray) – Array with fluctuation data (spatial_1, spatial_2, temporal)
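
The fluctuation is simply the deviation from the temporal mean; an equivalent minimal numpy sketch (not the library implementation) is:

>>> import numpy as np
>>> var = np.random.rand(10, 10, 24)                    # (spatial_1, spatial_2, temporal)
>>> dvar = var - var.mean(axis=-1, keepdims=True)       # subtract the temporal average
>>> dvar.shape
(10, 10, 24)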

classmethod get_node_cmd(config)

Get a CLI call to initialize Sup3rStats and execute the Sup3rStats.run() method based on an input config

Parameters:

config (dict) – sup3r wind stats config with all necessary args and kwargs to initialize Sup3rStats and execute Sup3rStats.run()
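
A hedged sketch of such a config: the keys mirror the class init arguments above, and additional execution-control keys may be required depending on the CLI setup. Paths and values are hypothetical.

>>> config = {'source_file_paths': './wtk_2012_*.h5',
...           'features': ['windspeed_100m'],
...           'include_stats': ['direct', 'gradient'],
...           'qa_fp': './stats.pkl',
...           'worker_kwargs': {'max_workers': 1}}
>>> cmd = Sup3rStatsSingle.get_node_cmd(config)   # CLI call string to run Sup3rStats.run()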

get_stats(var, interp=False, period=None)

Get stats for wind fields

Parameters:
  • var (ndarray) – (lat, lon, temporal)

  • interp (bool) – Whether or not this is interpolated data. If True, the spatial_res and temporal_res are different from those of the input data and need to be scaled to get accurate derivatives.

  • period (float | None) – If the variable is periodic this gives the period. e.g. If the variable is winddirection the period is 360 degrees and we need to account for 0 and 360 being close.

Returns:

stats (dict) – Dictionary of stats for wind fields
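
A hedged sketch of a direct call, reusing the stats instance from the class-level example above; the wind direction array here is synthetic and whether this runs standalone depends on how the instance was configured.

>>> import numpy as np
>>> wd = np.random.rand(10, 10, 24) * 360               # synthetic winddirection field in degrees
>>> wd_stats = stats.get_stats(wd, interp=False, period=360)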

interpolate_data(feature, low_res)

Get interpolated low res field

Parameters:
  • feature (str) – Name of feature to interpolate

  • low_res (ndarray) – Array of low resolution data to interpolate (spatial_1, spatial_2, temporal)

Returns:

var_itp (ndarray) – Array of interpolated data (spatial_1, spatial_2, temporal)

property k_range

Get range of wavenumbers to use for wavenumber spectrum calculation

classmethod load_cache(file_name)

Load data from cache file

Parameters:

file_name (str) – Path to cache file

Returns:

array (ndarray) – Wind field data

run()

Go through all requested features and get the dictionary of statistics.

Returns:

stats (dict) – Dictionary of statistics, where keys are source/interp appended with the feature name. Values are dictionaries of statistics, such as gradient, avg_spectrum, time_derivative, etc

classmethod save_cache(array, file_name)

Save data to cache file

Parameters:
  • array (ndarray) – Wind field data

  • file_name (str) – Path to cache file