sup3r.qa.stats.Sup3rStatsSingle

class Sup3rStatsSingle(source_file_paths=None, s_enhance=1, t_enhance=1, features=None, temporal_slice=slice(None, None, None), target=None, shape=None, raster_file=None, time_chunk_size=None, cache_pattern=None, overwrite_cache=False, overwrite_stats=False, source_handler=None, worker_kwargs=None, get_interp=False, include_stats=None, max_values=None, smoothing=None, coarsen=False, spatial_res=None, temporal_res=None, n_bins=40, max_delta=10, qa_fp=None)[source]

Bases: Sup3rStatsCompute

Base class for doing statistical QA on a single set of source files.

Parameters:
  • source_file_paths (list | str) – A list of source files to compute statistics on. Either .nc or .h5

  • s_enhance (int) – Factor by which the Sup3rGan model enhanced the spatial dimensions of low resolution data

  • t_enhance (int) – Factor by which the Sup3rGan model enhanced the temporal dimension of low resolution data

  • features (list) – Features for which to compute wind stats. e.g. [‘pressure_100m’, ‘temperature_100m’, ‘windspeed_100m’, ‘vorticity_100m’]

  • temporal_slice (slice | tuple | list) – Slice defining size of full temporal domain. e.g. If we have 5 files each with 5 time steps then temporal_slice = slice(None) will select all 25 time steps. This can also be a tuple / list with length 3 that will be interpreted as slice(*temporal_slice)

  • target (tuple) – (lat, lon) lower left corner of raster. You should provide target+shape or raster_file, or if all three are None the full source domain will be used.

  • shape (tuple) – (rows, cols) grid size. You should provide target+shape or raster_file, or if all three are None the full source domain will be used.

  • raster_file (str | None) – File for raster_index array for the corresponding target and shape. If specified the raster_index will be loaded from the file if it exists or written to the file if it does not yet exist. If None raster_index will be calculated directly. You should provide target+shape or raster_file, or if all three are None the full source domain will be used.

  • time_chunk_size (int) – Size of chunks to split time dimension into for parallel data extraction. If running in serial this can be set to the size of the full time index for best performance.

  • cache_pattern (str | None) – Pattern for files for saving feature data. e.g. file_path_{feature}.pkl Each feature will be saved to a file with the feature name replaced in cache_pattern. If not None feature arrays will be saved here and not stored in self.data until load_cached_data is called. The cache_pattern can also include {shape}, {target}, {times} which will help ensure unique cache files for complex problems.

  • overwrite_cache (bool) – Whether to overwrite cache files storing the computed/extracted feature data

  • overwrite_stats (bool) – Whether to overwrite saved stats

  • source_handler (str | None) – Data handler class to use for source data. Provide a string name to match a class in data_handling.py. If None the correct handler will be guessed based on file type and time series properties.

  • worker_kwargs (dict | None) – Dictionary of worker values. Can include max_workers, extract_workers, compute_workers, load_workers, norm_workers, and ti_workers. Each argument needs to be an integer or None.

    The value of max_workers will set the value of all other worker args. If max_workers == 1 then all processes will be serialized. If max_workers == None then the other worker args will use their own provided values.

    extract_workers is the max number of workers used to extract features from source data; if None it is estimated from memory limits, and if 1 extraction is serialized. compute_workers is the max number of workers used to compute derived features from raw features in source data. load_workers is the max number of workers used to load cached feature data. norm_workers is the max number of workers used to normalize feature data. ti_workers is the max number of workers used to build the full time index, which is useful when there are many input files each with a single time step; if greater than one, the time indices for the input files will be extracted in parallel and then concatenated to get the full time index. If the input files do not all have time indices, or if there are few input files, ti_workers should be set to one.

  • get_interp (bool) – Whether to include interpolated baseline stats in output

  • include_stats (list | None) – List of stats to include in output. e.g. [‘time_derivative’, ‘gradient’, ‘vorticity’, ‘avg_spectrum_k’, ‘avg_spectrum_f’, ‘direct’]. ‘direct’ means direct distribution, as opposed to a distribution of the gradient or time derivative.

  • max_values (dict | None) – Dictionary of max values to keep for stats. e.g. {‘time_derivative’: 10, ‘gradient’: 14, ‘vorticity’: 7}

  • smoothing (float | None) – Value passed to gaussian filter used for smoothing source data

  • spatial_res (float | None) – Spatial resolution for source data in meters. e.g. 2000. This is used to determine the wavenumber range for spectra calculations.

  • temporal_res (float | None) – Temporal resolution for source data in seconds. e.g. 60. This is used to determine the frequency range for spectra calculations and to scale temporal derivatives.

  • coarsen (bool) – Whether to coarsen data or not

  • max_delta (int, optional) – Optional maximum limit on the raster shape that is retrieved at once. If shape is (20, 20) and max_delta=10, the full raster will be retrieved in four chunks of (10, 10). This helps adapt to non-regular grids that curve over large distances. By default 10.

  • n_bins (int) – Number of bins to use for constructing probability distributions

  • qa_fp (str) – File path for saving statistics. Only .pkl supported.

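A minimal usage sketch is shown below. The file path, feature names, resolutions, and worker settings are hypothetical placeholders and should be replaced with values appropriate to your data.

>>> from sup3r.qa.stats import Sup3rStatsSingle
>>> stats = Sup3rStatsSingle(
...     source_file_paths='./wtk_2012_*.h5',            # hypothetical source files (glob pattern)
...     features=['windspeed_100m', 'winddirection_100m'],
...     include_stats=['direct', 'gradient', 'time_derivative'],
...     spatial_res=2000,                                # meters
...     temporal_res=3600,                               # seconds
...     n_bins=40,
...     worker_kwargs={'max_workers': 1},                # serialize all steps
...     qa_fp='./stats.pkl')                             # hypothetical output path
>>> out = stats.run()   # dict keyed by 'source'/'interp' combined with each feature name
>>> stats.close()
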
Methods

check_return_cache(feature, shape)

Check if interpolated data is cached and return data if it is.

close()

Close any open file handlers

coarsen_data(data[, smoothing])

Re-coarsen a high-resolution synthetic output dataset

export(qa_fp, data)

Export stats dictionary to pkl file.

get_feature_data(feature)

Get data for requested feature

get_feature_stats(feature)

Get stats for high and low resolution fields

get_fluctuation(var)

Get difference between array and temporal average of the same array

get_node_cmd(config)

Get a CLI call to initialize Sup3rStats and execute the Sup3rStats.run() method based on an input config

get_source_data(file_paths[, handler_kwargs])

Get source data using provided source file paths

get_stats(var[, interp, period])

Get stats for wind fields

interpolate_data(feature, low_res)

Get interpolated low res field

load_cache(file_name)

Load data from cache file

run()

Go through all requested features and get the dictionary of statistics.

save_cache(array, file_name)

Save data to cache file

Attributes

compute_features

Get list of requested feature names

f_range

Get range of frequencies to use for frequency spectrum calculation

features

Get a list of requested feature names

input_features

Get a list of requested feature names

k_range

Get range of wavenumbers to use for wavenumber spectrum calculation

lat_lon

Get lat/lon for output data

meta

Get the meta data corresponding to the flattened source low-res data

shape

Shape of source data

source_handler

Get source data handler

source_handler_class

Get source handler class

source_type

Get source data file type

time_index

Get the time index associated with the source data

close()[source]

Close any open file handlers

property source_type

Get the source data file type

Returns:

output_type – e.g. ‘nc’ or ‘h5’

property source_handler_class

Get source handler class

property source_handler

Get source data handler

get_source_data(file_paths, handler_kwargs=None)[source]

Get source data using provided source file paths

Parameters:
  • file_paths (list | str) – A list of source files to extract raster data from. Each file must have the same number of timesteps. Can also pass a string with a unix-style file path which will be passed through glob.glob

  • handler_kwargs (dict) – Dictionary of keyword arguments passed to sup3r.preprocessing.data_handling.DataHandler

Returns:

ndarray – Array of data from source file paths (spatial_1, spatial_2, temporal, features)
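
For illustration, a hedged sketch of calling this method directly, reusing the stats instance from the class-level example above; the glob pattern and handler kwargs are hypothetical.

>>> data = stats.get_source_data(
...     './wtk_2012_*.h5',                              # hypothetical glob pattern
...     handler_kwargs={'target': (39.0, -105.0),       # hypothetical lower-left (lat, lon)
...                     'shape': (20, 20)})
>>> data.shape   # (spatial_1, spatial_2, temporal, features)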

property shape

Shape of source data

property lat_lon

Get lat/lon for output data

property meta

Get the meta data corresponding to the flattened source low-res data

Returns:

pd.DataFrame

property time_index

Get the time index associated with the source data

Returns:

pd.DatetimeIndex

property input_features

Get a list of requested feature names

Returns:

list

property compute_features

Get list of requested feature names

coarsen_data(data, smoothing=None)[source]

Re-coarsen a high-resolution synthetic output dataset

Parameters:
  • data (np.ndarray) – A copy of the high-resolution output data as a numpy array of shape (spatial_1, spatial_2, temporal)

  • smoothing (float | None) – Amount of smoothing to apply using a gaussian filter.

Returns:

data (np.ndarray) – A spatiotemporally coarsened copy of the input dataset, still with shape (spatial_1, spatial_2, temporal)
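
Conceptually, spatial coarsening by a factor such as s_enhance amounts to block averaging; a simplified numpy illustration of that idea (not the library's exact implementation) is:

>>> import numpy as np
>>> hr = np.random.rand(20, 20, 24)                     # high-res (spatial_1, spatial_2, temporal)
>>> s = 2                                               # hypothetical spatial coarsening factor
>>> lr = hr.reshape(20 // s, s, 20 // s, s, 24).mean(axis=(1, 3))   # average over s x s blocks
>>> lr.shape
(10, 10, 24)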

check_return_cache(feature, shape)

Check if interpolated data is cached and return the data if it is. Also returns the cache file name if cache_pattern is not None.

Parameters:
  • feature (str) – Name of interpolated feature to check for cache

  • shape (tuple) – Shape of low resolution data. Used to define cache file_name.

Returns:

  • var_itp (ndarray | None) – Array of interpolated data if data exists. Otherwise returns None

  • file_name (str) – Name of cache file for interpolated data. If cache_pattern is None this returns None

export(qa_fp, data)

Export stats dictionary to pkl file.

Parameters:
  • qa_fp (str | None) – Optional filepath to output QA file (only .pkl is supported)

  • data (dict) – A dictionary with stats for low and high resolution wind fields

  • overwrite_stats (bool) – Whether to overwrite saved stats or not
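
Since the stats are exported as a pickle file, the saved dictionary can be read back with the standard library; the path below is hypothetical.

>>> import pickle
>>> with open('./stats.pkl', 'rb') as f:                # hypothetical output path
...     saved = pickle.load(f)
>>> list(saved.keys())   # keys combine 'source'/'interp' with each feature name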

property f_range

Get range of frequencies to use for frequency spectrum calculation

property features

Get a list of requested feature names

Returns:

list

get_feature_data(feature)

Get data for requested feature

Parameters:

feature (str) – Name of feature to get stats for

Returns:

ndarray – Array of data for requested feature

get_feature_stats(feature)

Get stats for high and low resolution fields

Parameters:

feature (str) – Name of feature to get stats for

Returns:

  • source_stats (dict) – Dictionary of stats for input fields

  • interp (dict) – Dictionary of stats for spatiotemporally interpolated fields

static get_fluctuation(var)

Get difference between array and temporal average of the same array

Parameters:

var (ndarray) – Array of data to calculate the fluctuation for (spatial_1, spatial_2, temporal)

Returns:

dvar (ndarray) – Array with fluctuation data (spatial_1, spatial_2, temporal)
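
The fluctuation is simply the deviation from the temporal mean; an equivalent minimal numpy sketch (not the library implementation) is:

>>> import numpy as np
>>> var = np.random.rand(10, 10, 24)                    # (spatial_1, spatial_2, temporal)
>>> dvar = var - var.mean(axis=-1, keepdims=True)       # subtract the temporal average
>>> dvar.shape
(10, 10, 24)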

classmethod get_node_cmd(config)

Get a CLI call to initialize Sup3rStats and execute the Sup3rStats.run() method based on an input config

Parameters:

config (dict) – sup3r wind stats config with all necessary args and kwargs to initialize Sup3rStats and execute Sup3rStats.run()
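
A hedged sketch of such a config: the keys mirror the class init arguments above, and additional execution-control keys may be required depending on the CLI setup. Paths and values are hypothetical.

>>> config = {'source_file_paths': './wtk_2012_*.h5',
...           'features': ['windspeed_100m'],
...           'include_stats': ['direct', 'gradient'],
...           'qa_fp': './stats.pkl',
...           'worker_kwargs': {'max_workers': 1}}
>>> cmd = Sup3rStatsSingle.get_node_cmd(config)   # CLI call string to run Sup3rStats.run()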

get_stats(var, interp=False, period=None)

Get stats for wind fields

Parameters:
  • var (ndarray) – (lat, lon, temporal)

  • interp (bool) – Whether or not this is interpolated data. If True, the spatial_res and temporal_res are different from those of the input data and need to be scaled to get accurate derivatives.

  • period (float | None) – If the variable is periodic this gives the period. e.g. If the variable is winddirection the period is 360 degrees and we need to account for 0 and 360 being close.

Returns:

stats (dict) – Dictionary of stats for wind fields
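
A hedged sketch of a direct call, reusing the stats instance from the class-level example above; the wind direction array here is synthetic and whether this runs standalone depends on how the instance was configured.

>>> import numpy as np
>>> wd = np.random.rand(10, 10, 24) * 360               # synthetic winddirection field in degrees
>>> wd_stats = stats.get_stats(wd, interp=False, period=360)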

interpolate_data(feature, low_res)

Get interpolated low res field

Parameters:
  • feature (str) – Name of feature to interpolate

  • low_res (ndarray) – Array of low resolution data to interpolate (spatial_1, spatial_2, temporal)

Returns:

var_itp (ndarray) – Array of interpolated data (spatial_1, spatial_2, temporal)

property k_range

Get range of wavenumbers to use for wavenumber spectrum calculation

classmethod load_cache(file_name)

Load data from cache file

Parameters:

file_name (str) – Path to cache file

Returns:

array (ndarray) – Wind field data

run()

Go through all requested features and get the dictionary of statistics.

Returns:

stats (dict) – Dictionary of statistics, where keys are source/interp appended with the feature name. Values are dictionaries of statistics, such as gradient, avg_spectrum, time_derivative, etc

classmethod save_cache(array, file_name)

Save data to cache file

Parameters:
  • array (ndarray) – Wind field data

  • file_name (str) – Path to cache file