sup3r.preprocessing.data_handlers.base.DailyDataHandler

class DailyDataHandler(file_paths, *, features='all', load_features='all', res_kwargs=None, chunks='auto', target=None, shape=None, time_slice=slice(None, None, None), threshold=None, time_roll=0, time_shift=None, hr_spatial_coarsen=1, nan_method_kwargs=None, BaseLoader=None, FeatureRegistry=None, interp_kwargs=None, cache_kwargs=None, **kwargs)

Bases: DataHandler

General data handler class with daily data as an additional attribute. The xr.Dataset.coarsen method is used to compute averages, minimums, and maximums over daily windows. clearsky_ratio receives special treatment, since it must be derived from the total clearsky_ghi and total ghi over each day.

TODO: (1) Not a fan of manually adding cs_ghi / ghi and then removing them. Maybe this could be handled through a derivation instead.

(2) We assume daily and hourly data here, but we could generalize this to go from daily to any time step. This would then enable the CC models to do arbitrary temporal enhancement.
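A minimal pure-Python sketch of the daily aggregation described above, assuming hourly input that contains whole days only. It imitates coarsening the time axis by 24 steps and then taking the mean, min, or max, and it forms clearsky_ratio from daily totals of ghi and clearsky_ghi. The helpers daily_windows, daily_stats, and daily_clearsky_ratio are hypothetical and not part of sup3r; the real class operates on xr.Dataset objects.

```python
from statistics import mean

HOURS_PER_DAY = 24  # assumption: hourly input containing whole days only


def daily_windows(values, window=HOURS_PER_DAY):
    """Split a flat time series into consecutive, non-overlapping windows."""
    if len(values) % window != 0:
        raise ValueError("series length must be a multiple of the window size")
    return [values[i:i + window] for i in range(0, len(values), window)]


def daily_stats(values, window=HOURS_PER_DAY):
    """Per-window mean, min, and max, analogous to coarsening the time axis."""
    days = daily_windows(values, window)
    return {
        "mean": [mean(day) for day in days],
        "min": [min(day) for day in days],
        "max": [max(day) for day in days],
    }


def daily_clearsky_ratio(ghi, clearsky_ghi, window=HOURS_PER_DAY):
    """Daily total ghi divided by daily total clearsky_ghi.

    Hourly ratios are undefined at night (clearsky_ghi == 0), so the ratio is
    formed from the daily totals instead of averaging hourly ratios.
    """
    ghi_days = daily_windows(ghi, window)
    cs_days = daily_windows(clearsky_ghi, window)
    return [sum(g) / sum(cs) for g, cs in zip(ghi_days, cs_days)]


hourly_temperature = list(range(48))  # two days of made-up hourly values
daily_stats(hourly_temperature)["mean"]  # -> [11.5, 35.5]
```

Because window is a parameter, the same sketch aggregates to other coarser time steps, loosely in the spirit of TODO (2).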

Parameters:
  • file_paths (str | list | pathlib.Path) – file_paths input to LoaderClass

  • features (list | str) – Features to derive. If ‘all’ then all available raw features will just be loaded. Specify explicit feature names for derivations.

  • load_features (list | str) – Features to load and make available for derivations. If ‘all’ then all available raw features will be loaded and made available for derivations. This can be used to restrict the features used for derivations. For example, to derive ‘temperature_100m’ only from temperature isobars when the data also includes single level values (like temperature_2m), don’t include ‘temperature_2m’ in the load_features list.

  • res_kwargs (dict) – Additional keyword arguments passed through to the BaseLoader. BaseLoader is usually xr.open_mfdataset for NETCDF files and MultiFileResourceX for H5 files.

  • chunks (dict | str) – Dictionary of chunk sizes to pass through to dask.array.from_array() or xr.Dataset().chunk(). Will be converted to a tuple when used in from_array(). These are the methods for H5 and NETCDF data, respectively. In addition to a dictionary, this argument can be “auto” or None. None will not do any chunking and will load data into memory as an np.array.

  • target (tuple) – (lat, lon) lower left corner of the raster. Requires either target and shape, or raster_file.

  • shape (tuple) – (rows, cols) grid size. Requires either target and shape, or raster_file.

  • time_slice (slice | list) – Slice specifying the extent and step of temporal extraction, e.g. slice(start, stop, step). If equal to slice(None, None, 1), the full time dimension is selected. Can also be a list [start, stop, step].

  • threshold (float) – Nearest neighbor euclidean distance threshold. If the coordinates are more than this value away from the target lat/lon, an error is raised.

  • time_roll (int) – Number of steps to roll along the time axis. Passed to xr.Dataset.roll().

  • time_shift (int | None) – Number of minutes to shift time axis. This can be used, for example, to shift the time index for daily data so that the time stamp for a given day starts at the zeroth minute instead of at noon, as is the case for most GCM data.

  • hr_spatial_coarsen (int) – Spatial coarsening factor. Passed to xr.Dataset.coarsen().

  • nan_method_kwargs (str | dict | None) – Keyword arguments for NaN handling. If ‘mask’, time steps with NaNs will be dropped. Otherwise this should be a dict of kwargs which will be passed to sup3r.preprocessing.accessor.Sup3rX.interpolate_na(), e.g. {‘method’: ‘linear’, ‘dim’: ‘time’}.

  • BaseLoader (Callable) – Base level file loader wrapped by Loader. This is usually xr.open_mfdataset for NETCDF files and MultiFileResourceX for H5 files.

  • FeatureRegistry (dict) – Dictionary of DerivedFeature objects used for derivations

  • interp_kwargs (dict | None) – Dictionary of kwargs for level interpolation. Can include “method” and “run_level_check” keys. The method specifies how to perform height interpolation, e.g. when deriving u_20m from u_10m and u_100m. Options are “linear” and “log”. See sup3r.preprocessing.derivers.Deriver.do_level_interpolation().

  • cache_kwargs (dict | None) – Dictionary with kwargs for caching wrangled data. This should at minimum include a cache_pattern key-value pair. This pattern must have a {feature} format key and either an h5 or nc file extension, based on the desired output type. See Cacher for a description of additional arguments.

  • kwargs (dict) – Dictionary of additional keyword args for Rasterizer, used specifically for rasterizing flattened data.
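The time_slice parameter accepts either a slice or a [start, stop, step] list. A minimal sketch of normalizing the list form and applying it to a time axis; normalize_time_slice is a hypothetical helper, not sup3r's code.

```python
def normalize_time_slice(time_slice):
    """Return a slice, converting a [start, stop, step] list if needed.

    Hypothetical helper for illustration only.
    """
    if isinstance(time_slice, slice):
        return time_slice
    if isinstance(time_slice, (list, tuple)) and len(time_slice) == 3:
        return slice(*time_slice)
    raise TypeError("time_slice must be a slice or a [start, stop, step] list")


time_index = list(range(10))  # stand-in for a time axis
# Every second step from index 2 up to (but excluding) index 8.
subset = time_index[normalize_time_slice([2, 8, 2])]  # -> [2, 4, 6]
```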
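The threshold check on target can be pictured as a brute-force nearest-neighbor search. This sketch assumes Euclidean distance computed directly on (lat, lon) degrees, as the threshold description suggests, and a small made-up grid; nearest_index is hypothetical, and sup3r's own lookup may be implemented differently.

```python
import math


def nearest_index(lat_lon_grid, target, threshold=None):
    """Return (row, col) of the grid point closest to target.

    Hypothetical helper. lat_lon_grid is a 2D list of (lat, lon) tuples.
    If threshold is given and the closest point is farther away, raise.
    """
    best, best_dist = None, math.inf
    for i, row in enumerate(lat_lon_grid):
        for j, (lat, lon) in enumerate(row):
            dist = math.hypot(lat - target[0], lon - target[1])
            if dist < best_dist:
                best, best_dist = (i, j), dist
    if threshold is not None and best_dist > threshold:
        raise ValueError(
            f"Closest point is {best_dist:.3f} away from {target}, "
            f"exceeding threshold {threshold}"
        )
    return best


# Made-up 3x3 grid with 1 degree spacing.
grid = [[(40.0 + r, -105.0 + c) for c in range(3)] for r in range(3)]
nearest_index(grid, (41.1, -103.9))  # -> (1, 1)
```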
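The time_roll and time_shift parameters change different things: time_roll moves data values along the time axis, while time_shift moves the time stamps themselves. A pure-Python sketch of both, assuming time_roll follows the xr.Dataset.roll() convention in which a positive shift moves values toward later indices and wraps them around; roll_values and shift_timestamps are hypothetical helpers.

```python
from datetime import datetime, timedelta


def roll_values(values, steps):
    """Cyclically roll a list; positive steps move values to later indices.

    Hypothetical helper mirroring the usual roll convention.
    """
    if not values:
        return list(values)
    steps %= len(values)
    return values[-steps:] + values[:-steps] if steps else list(values)


def shift_timestamps(timestamps, minutes):
    """Shift every time stamp by a fixed number of minutes (hypothetical)."""
    return [t + timedelta(minutes=minutes) for t in timestamps]


# Daily GCM-style stamps at noon, shifted back 12 hours to midnight.
noon_stamps = [datetime(2020, 1, d, 12) for d in (1, 2, 3)]
midnight_stamps = shift_timestamps(noon_stamps, -720)
# midnight_stamps[0] == datetime(2020, 1, 1, 0, 0)

roll_values([0, 1, 2, 3, 4], 1)  # -> [4, 0, 1, 2, 3]
```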
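A minimal sketch of the two nan_method_kwargs behaviors: ‘mask’ drops every time step that contains a NaN, while a dict like {‘method’: ‘linear’, ‘dim’: ‘time’} fills gaps by interpolation. This pure-Python version interpolates a single series, leaves leading and trailing NaNs unfilled, and is only an illustration, not Sup3rX.interpolate_na(); both helper names are hypothetical.

```python
import math

nan = float("nan")


def mask_nan_steps(steps):
    """Drop any time step (a list of values) that contains a NaN (hypothetical)."""
    return [s for s in steps if not any(math.isnan(v) for v in s)]


def interpolate_nan_linear(series):
    """Fill interior NaNs linearly between the nearest valid neighbors.

    Hypothetical helper; leading or trailing NaNs are left untouched.
    """
    out = list(series)
    valid = [i for i, v in enumerate(out) if not math.isnan(v)]
    for left, right in zip(valid, valid[1:]):
        for i in range(left + 1, right):
            out[i] = out[left] + (i - left) * (out[right] - out[left]) / (right - left)
    return out


mask_nan_steps([[1.0, 2.0], [nan, 3.0], [4.0, 5.0]])  # -> [[1.0, 2.0], [4.0, 5.0]]
interpolate_nan_linear([0.0, nan, nan, 3.0])  # -> [0.0, 1.0, 2.0, 3.0]
```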
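The cache_kwargs parameter must at least provide a cache_pattern with a {feature} key and an h5 or nc extension. A minimal sketch of validating such a pattern and expanding it to one path per feature; expand_cache_pattern and the example path are hypothetical, and Cacher may check and use the pattern differently.

```python
import os


def expand_cache_pattern(cache_pattern, features):
    """Map each feature to its cache file path after validating the pattern.

    Hypothetical helper for illustration only.
    """
    if "{feature}" not in cache_pattern:
        raise ValueError("cache_pattern must contain a {feature} format key")
    ext = os.path.splitext(cache_pattern)[1].lower()
    if ext not in (".h5", ".nc"):
        raise ValueError("cache_pattern must end in .h5 or .nc")
    return {f: cache_pattern.format(feature=f) for f in features}


cache_kwargs = {"cache_pattern": "./cache/daily_{feature}.nc"}  # made-up path
expand_cache_pattern(cache_kwargs["cache_pattern"], ["u_10m", "ghi"])
# -> {'u_10m': './cache/daily_u_10m.nc', 'ghi': './cache/daily_ghi.nc'}
```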

Methods

  • check_registry(feature) – Get compute method from the registry if available.

  • derive(feature) – Routine to derive requested features.

  • do_level_interpolation(feature[, interp_kwargs]) – Interpolate over height or pressure to derive the given feature.

  • get_inputs(feature) – Get inputs for the given feature and inputs for those inputs.

  • get_multi_level_data(feature) – Get data stored in multi-level arrays, like u stored on pressure levels.

  • get_single_level_data(feature) – When doing level interpolation, we should include the single level data available.

  • has_interp_variables(feature) – Check if the given feature can be interpolated from values at nearby heights or from pressure level data.

  • map_new_name(feature, pattern) – If the search for a derivation method first finds an alternative name for the feature we want to derive, by matching a wildcard pattern, we need to replace the wildcard with the specific height or pressure we want and continue the search for a derivation method with this new name.

  • no_overlap(feature) – Check if any of the nested inputs for ‘feature’ contain ‘feature’.

  • post_init_log([args_dict]) – Log additional arguments after initialization.

  • wrap(data) – Return a Sup3rDataset object or tuple of such.

Attributes

  • FEATURE_REGISTRY

  • data – Return underlying data.

  • shape – Get shape of underlying data.

check_registry(feature) → ndarray | Array | str | None

Get compute method from the registry if available. Will check for a pattern feature match in the feature registry, e.g. if u_100m matches a feature registry entry of u_(.*)m.
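A minimal sketch of the pattern lookup, assuming registry keys are regular expressions that must match the whole lower-cased feature name. The toy registry and find_registry_entry are hypothetical stand-ins for FEATURE_REGISTRY and check_registry(), with strings in place of compute methods.

```python
import re

# Hypothetical registry: keys are patterns, values stand in for compute methods.
REGISTRY = {
    "u_(.*)m": "compute_u",
    "clearsky_ratio": "compute_clearsky_ratio",
}


def find_registry_entry(feature, registry=REGISTRY):
    """Return the value whose key pattern fully matches feature, else None."""
    feature = feature.lower()
    for pattern, method in registry.items():
        if re.fullmatch(pattern, feature):
            return method
    return None


find_registry_entry("u_100m")  # -> 'compute_u'
find_registry_entry("v_100m")  # -> None
```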

property data

Return underlying data.

Returns:

Sup3rDataset

See also

wrap()

derive(feature) → ndarray | Array

Routine to derive requested features. Employs a little recursion to locate differently named features with a name map in the feature registry, e.g. if FEATURE_REGISTRY contains a key-value pair like “windspeed”: “wind_speed”, then requesting “windspeed” will ultimately return a compute method (or fetch from raw data) for “wind_speed”.

Note

Features are all saved as lower-case names, and __contains__ checks will use feature.lower().
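The alias resolution in derive() can be sketched as a small recursive lookup. This assumes a hypothetical registry in which a string value names another feature and a callable value is a compute method; resolve_feature, compute_wind_speed, and the raw data dict are illustrative only, and the guard against circular aliases is an addition of this sketch.

```python
def compute_wind_speed(data):
    """Toy compute method: magnitude of the (u, v) wind components."""
    return [(u * u + v * v) ** 0.5 for u, v in zip(data["u"], data["v"])]


# Hypothetical registry: a string value is an alias, a callable computes the feature.
REGISTRY = {
    "windspeed": "wind_speed",
    "wind_speed": compute_wind_speed,
}


def resolve_feature(feature, data, registry=REGISTRY, _seen=None):
    """Follow aliases until raw data or a compute method is found (hypothetical)."""
    feature = feature.lower()  # features are stored as lower-case names
    _seen = set() if _seen is None else _seen
    if feature in _seen:
        raise ValueError(f"circular alias involving {feature!r}")
    _seen.add(feature)
    if feature in data:  # raw feature: fetch directly
        return data[feature]
    entry = registry.get(feature)
    if isinstance(entry, str):  # alias: recurse with the mapped name
        return resolve_feature(entry, data, registry, _seen)
    if callable(entry):  # compute method: derive from the raw data
        return entry(data)
    raise KeyError(f"cannot derive {feature!r}")


raw = {"u": [3.0], "v": [4.0]}
resolve_feature("WindSpeed", raw)  # -> [5.0]
```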

do_level_interpolation(feature, interp_kwargs=None) → DataArray

Interpolate over height or pressure to derive the given feature.
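The interp_kwargs method is either “linear” or “log”. A minimal sketch of both for deriving u_20m from u_10m and u_100m, assuming “log” means interpolating linearly in ln(height); that is one common choice and may differ from sup3r's formula. interpolate_height and the wind values are hypothetical.

```python
import math


def interpolate_height(z, z1, u1, z2, u2, method="linear"):
    """Interpolate a value at height z from values at heights z1 < z2.

    Hypothetical helper. Assumption: "log" = linear interpolation in ln(z).
    """
    if method == "linear":
        x, x1, x2 = z, z1, z2
    elif method == "log":
        x, x1, x2 = math.log(z), math.log(z1), math.log(z2)
    else:
        raise ValueError(f"unknown method {method!r}")
    return u1 + (u2 - u1) * (x - x1) / (x2 - x1)


# Derive u_20m from made-up values u_10m = 4.0 m/s and u_100m = 8.0 m/s.
interpolate_height(20, 10, 4.0, 100, 8.0, "linear")  # ~4.44
interpolate_height(20, 10, 4.0, 100, 8.0, "log")  # ~5.20
```

Here the log variant gives the larger value at 20m, since ln(height) rises fastest at low heights.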

get_inputs(feature)

Get inputs for the given feature and inputs for those inputs.

get_multi_level_data(feature)

Get data stored in multi-level arrays, like u stored on pressure levels.

get_single_level_data(feature)

When doing level interpolation, we should include the single level data available. For example, if we already have u_100m and want to interpolate u_40m from multi-level data U, we should add u_100m at height 100m before doing interpolation, since 100m could be a closer level to 40m than those available in U.
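The u_40m example can be made concrete with a toy profile: merging the single-level u_100m into the multi-level data changes which levels bracket 40m. Heights and values are made up, and bracketing_levels is a hypothetical helper.

```python
def bracketing_levels(profile, target):
    """Return the closest levels at or below and at or above target.

    Hypothetical helper; profile maps height in m to a value.
    """
    below = [h for h in profile if h <= target]
    above = [h for h in profile if h >= target]
    if not below or not above:
        raise ValueError(f"no levels bracket {target}m")
    return max(below), min(above)


# Made-up multi-level data U (height in m -> u in m/s).
multi_level_u = {10: 3.0, 150: 7.5, 300: 9.0}
# Single-level feature already available: u_100m.
single_level = {100: 6.8}

bracketing_levels(multi_level_u, 40)  # -> (10, 150)
merged = {**multi_level_u, **single_level}
bracketing_levels(merged, 40)  # -> (10, 100): 100m is now the upper neighbor
```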

has_interp_variables(feature)

Check if the given feature can be interpolated from values at nearby heights or from pressure level data. For example, if u_10m and u_50m exist, then u_30m can be interpolated from these. If a pressure level array u is available, this can also be used in conjunction with height data.
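A minimal sketch of the height-based part of this check, assuming feature names follow a <name>_<height>m pattern like u_10m; pressure-level arrays are ignored. parse_height_feature and can_interpolate are hypothetical helpers.

```python
import re

HEIGHT_PATTERN = re.compile(r"(.+)_(\d+)m")  # assumed naming convention


def parse_height_feature(feature):
    """Split 'u_30m' into ('u', 30); return None if the name has no height."""
    match = HEIGHT_PATTERN.fullmatch(feature.lower())
    return (match.group(1), int(match.group(2))) if match else None


def can_interpolate(feature, available):
    """True if features with the same base name bracket the target height."""
    parsed = parse_height_feature(feature)
    if parsed is None:
        return False
    name, height = parsed
    heights = [
        p[1] for p in map(parse_height_feature, available) if p and p[0] == name
    ]
    return any(h < height for h in heights) and any(h > height for h in heights)


can_interpolate("u_30m", ["u_10m", "u_50m", "v_10m"])  # -> True
can_interpolate("u_30m", ["u_10m", "v_50m"])  # -> False
```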

map_new_name(feature, pattern)

If the search for a derivation method first finds an alternative name for the feature we want to derive, by matching a wildcard pattern, we need to replace the wildcard with the specific height or pressure we want and continue the search for a derivation method with this new name.
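A minimal sketch of the wildcard substitution, assuming a hypothetical registry entry that maps windspeed_(.*)m to the alternative name wind_speed_(.*)m. substitute_wildcard is illustrative and is not the map_new_name() implementation.

```python
import re


def substitute_wildcard(feature, pattern, alternative):
    """Replace the wildcard in alternative with the value captured from feature.

    Hypothetical helper. pattern is the registry key that feature matched and
    alternative is the name that key maps to.
    """
    match = re.fullmatch(pattern, feature)
    if match is None:
        raise ValueError(f"{feature!r} does not match {pattern!r}")
    wildcard = match.group(1)  # e.g. '100' for windspeed_100m
    return alternative.replace("(.*)", wildcard, 1)


substitute_wildcard("windspeed_100m", "windspeed_(.*)m", "wind_speed_(.*)m")
# -> 'wind_speed_100m'
```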

no_overlap(feature)

Check if any of the nested inputs for ‘feature’ contain ‘feature’.
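This check guards against derivations that depend on themselves. A minimal sketch, assuming a hypothetical mapping from each feature to its direct inputs: collect all nested inputs with a depth-first walk and check whether the feature reappears. nested_inputs and is_acyclic_feature are hypothetical helpers.

```python
def nested_inputs(feature, inputs_map, _seen=None):
    """Collect all direct and indirect inputs of feature (hypothetical)."""
    _seen = set() if _seen is None else _seen
    for inp in inputs_map.get(feature, []):
        if inp not in _seen:
            _seen.add(inp)
            nested_inputs(inp, inputs_map, _seen)
    return _seen


def is_acyclic_feature(feature, inputs_map):
    """True if feature does not appear among its own nested inputs."""
    return feature not in nested_inputs(feature, inputs_map)


# Hypothetical dependency map: feature -> direct inputs.
inputs_map = {
    "clearsky_ratio": ["ghi", "clearsky_ghi"],
    "a": ["b"],
    "b": ["a"],
}
is_acyclic_feature("clearsky_ratio", inputs_map)  # -> True
is_acyclic_feature("a", inputs_map)  # -> False
```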

post_init_log(args_dict=None)

Log additional arguments after initialization.

property shape

Get shape of underlying data.

wrap(data)

Return a Sup3rDataset object or tuple of such. This is a tuple when the .data attribute belongs to a Collection object like BatchHandler. Otherwise, this is a Sup3rDataset object, which is either a wrapped 2-tuple or 1-tuple (e.g. len(data) == 2 or len(data) == 1). This is a 2-tuple when .data belongs to a dual container object like DualSampler and a 1-tuple otherwise.