sup3r.preprocessing.data_handlers.factory.DailyDataHandler
- class DailyDataHandler(file_paths, *, features='all', res_kwargs=None, chunks='auto', target=None, shape=None, time_slice=slice(None, None, None), threshold=None, time_roll=0, time_shift=None, hr_spatial_coarsen=1, nan_method_kwargs=None, BaseLoader=None, FeatureRegistry=None, interp_kwargs=None, cache_kwargs=None, **kwargs)[source]
Bases: DataHandler
General data handler class with daily data as an additional attribute. The xr.Dataset.coarsen() method is employed to compute averages / mins / maxes over daily windows. Special treatment of clearsky_ratio, which requires derivation from total clearsky_ghi and total ghi.
TODO: (1) Not a fan of manually adding cs_ghi / ghi and then removing. Maybe this could be handled through a derivation instead.
(2) We assume daily and hourly data here, but we could generalize this to go from daily to any time step. This would then enable the CC models to do arbitrary temporal enhancement.
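The daily coarsening described above can be pictured with plain xarray. This is a minimal sketch using a synthetic hourly dataset, not the handler's actual implementation:

    import numpy as np
    import pandas as pd
    import xarray as xr

    # Synthetic hourly data: 3 days x 24 hours on a tiny 2x2 grid.
    times = pd.date_range('2015-01-01', periods=72, freq='h')
    ds = xr.Dataset(
        {'temperature': (('time', 'south_north', 'west_east'),
                         np.random.rand(72, 2, 2))},
        coords={'time': times},
    )

    # Coarsen the hourly record into daily windows, then reduce each
    # window to get daily averages / mins / maxes.
    daily_mean = ds.coarsen(time=24).mean()
    daily_min = ds.coarsen(time=24).min()
    daily_max = ds.coarsen(time=24).max()
    print(daily_mean.sizes['time'])  # 3 daily values from 72 hourly steps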
- Parameters:
  - file_paths (str | list | pathlib.Path) – file_paths input to LoaderClass.
  - features (list | str) – Features to load and/or derive. If 'all' then all available raw features will be loaded. Specify explicit feature names for derivations.
  - res_kwargs (dict) – Additional keyword arguments passed through to the BaseLoader. BaseLoader is usually xr.open_mfdataset for NETCDF files and MultiFileResourceX for H5 files.
  - chunks (dict | str) – Dictionary of chunk sizes to pass through to dask.array.from_array() or xr.Dataset().chunk(). Will be converted to a tuple when used in from_array(). These are the methods for H5 and NETCDF data, respectively. This argument can also be "auto" or None. None will not do any chunking and will load data into memory as np.array.
  - target (tuple) – (lat, lon) lower left corner of raster. Either need target+shape or raster_file.
  - shape (tuple) – (rows, cols) grid size. Either need target+shape or raster_file.
  - time_slice (slice | list) – Slice specifying extent and step of temporal extraction, e.g. slice(start, stop, step). If equal to slice(None, None, 1) the full time dimension is selected. Can also be a list [start, stop, step].
  - threshold (float) – Nearest neighbor euclidean distance threshold. If the coordinates are more than this value away from the target lat/lon, an error is raised.
  - time_roll (int) – Number of steps to roll along the time axis. Passed to xr.Dataset.roll().
  - time_shift (int | None) – Number of minutes to shift the time axis. This can be used, for example, to shift the time index for daily data so that the time stamp for a given day starts at the zeroth minute instead of at noon, as is the case for most GCM data.
  - hr_spatial_coarsen (int) – Spatial coarsening factor. Passed to xr.Dataset.coarsen().
  - nan_method_kwargs (str | dict | None) – Keyword arguments for NaN handling. If 'mask', time steps with NaNs will be dropped. Otherwise this should be a dict of kwargs which will be passed to sup3r.preprocessing.accessor.Sup3rX.interpolate_na().
  - BaseLoader (Callable) – Base level file loader wrapped by Loader. This is usually xr.open_mfdataset for NETCDF files and MultiFileResourceX for H5 files.
  - FeatureRegistry (dict) – Dictionary of DerivedFeature objects used for derivations.
  - interp_kwargs (dict | None) – Dictionary of kwargs for level interpolation. Can include "method" and "run_level_check" keys. "method" specifies how to perform height interpolation, e.g. deriving u_20m from u_10m and u_100m. Options are "linear" and "log". See sup3r.preprocessing.derivers.Deriver.do_level_interpolation().
  - cache_kwargs (dict | None) – Dictionary with kwargs for caching wrangled data. This should at minimum include a cache_pattern key/value pair. The pattern must have a {feature} format key and either an h5 or nc file extension, based on the desired output type. See Cacher for a description of more arguments.
  - kwargs (dict) – Dictionary of additional keyword args for Rasterizer, used specifically for rasterizing flattened data. A construction example follows this list.
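Putting the parameters together, construction might look like the following sketch. The file path, feature names, and spatial extent are placeholders, and the import path should be checked against the installed sup3r version:

    from sup3r.preprocessing.data_handlers.factory import DailyDataHandler

    # Hypothetical inputs: adjust the path, features, and extent to your data.
    handler = DailyDataHandler(
        file_paths='/data/wtk_2015.h5',   # placeholder file path
        features=['u_10m', 'v_10m'],      # features to load / derive
        target=(39.0, -105.0),            # (lat, lon) lower left corner
        shape=(20, 20),                    # (rows, cols) grid size
        time_slice=slice(None, None, 1),   # full time dimension
    )
    # The hourly data is available as usual; the daily-coarsened data is
    # exposed as an additional attribute per the class description above.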
Methods

- check_registry(feature) – Get compute method from the registry if available.
- derive(feature) – Routine to derive requested features.
- do_level_interpolation(feature[, interp_kwargs]) – Interpolate over height or pressure to derive the given feature.
- get_inputs(feature) – Get inputs for the given feature and inputs for those inputs.
- get_multi_level_data(feature) – Get data stored in multi-level arrays, like u stored on pressure levels.
- get_single_level_data(feature) – When doing level interpolation we should include the single level data available.
- has_interp_variables(feature) – Check if the given feature can be interpolated from values at nearby heights or from pressure level data.
- map_new_name(feature, pattern) – Replace a wildcard match with the specific height or pressure and continue the search for a derivation method under the new name.
- no_overlap(feature) – Check if any of the nested inputs for 'feature' contain 'feature'.
- post_init_log([args_dict]) – Log additional arguments after initialization.
- wrap(data) – Return a Sup3rDataset object or tuple of such.

Attributes

- data – Return underlying data.
- shape – Get shape of underlying data.
- check_registry(feature) → ndarray | Array | str | None
Get compute method from the registry if available. Will check for a pattern feature match in the feature registry, e.g. u_100m matches a feature registry entry of u_(.*)m.
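The pattern matching can be pictured with a small regex sketch. The registry contents and lookup helper below are hypothetical, not the class's actual FEATURE_REGISTRY:

    import re

    # Hypothetical registry mapping wildcard patterns to compute methods.
    registry = {'u_(.*)m': 'compute_u_at_height'}

    def lookup(feature, registry):
        """Return the registry entry whose pattern matches the feature."""
        for pattern, method in registry.items():
            if re.fullmatch(pattern, feature.lower()):
                return method
        return None

    print(lookup('u_100m', registry))  # 'compute_u_at_height'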
- property data
Return underlying data.
- derive(feature) → ndarray | Array
Routine to derive requested features. Employs a little recursion to locate differently named features with a name map in the feature registry, i.e. if FEATURE_REGISTRY contains a key/value pair like "windspeed": "wind_speed" then requesting "windspeed" will ultimately return a compute method (or fetch from raw data) for "wind_speed".
Note
Features are all saved as lower case names and __contains__ checks will use feature.lower().
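A toy version of this alias resolution, with a made-up registry and helper function:

    # Hypothetical name map; real FEATURE_REGISTRY entries may be
    # compute methods or patterns rather than plain strings.
    FEATURE_REGISTRY = {'windspeed': 'wind_speed'}

    def resolve(feature, registry):
        """Follow string aliases until no further mapping exists."""
        feature = feature.lower()
        seen = set()
        while feature in registry and feature not in seen:
            seen.add(feature)
            feature = registry[feature]
        return feature

    print(resolve('windspeed', FEATURE_REGISTRY))  # 'wind_speed'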
- do_level_interpolation(feature, interp_kwargs=None) → DataArray
Interpolate over height or pressure to derive the given feature.
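The "linear" and "log" methods accepted through interp_kwargs can be illustrated with a small numpy sketch for deriving u_20m from u_10m and u_100m. This mirrors the idea only, not the Deriver's exact implementation:

    import numpy as np

    heights = np.array([10.0, 100.0])  # levels with known values
    u = np.array([5.0, 9.0])           # u_10m and u_100m
    target = 20.0

    # 'linear': interpolate directly in height.
    u_linear = np.interp(target, heights, u)

    # 'log': interpolate in log(height), matching the roughly
    # logarithmic wind profile near the surface.
    u_log = np.interp(np.log(target), np.log(heights), u)

    print(round(u_linear, 3), round(u_log, 3))  # ~5.444 vs ~6.204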
- get_inputs(feature)
Get inputs for the given feature and inputs for those inputs.
- get_multi_level_data(feature)
Get data stored in multi-level arrays, like u stored on pressure levels.
- get_single_level_data(feature)
When doing level interpolation we should include the single level data available, e.g. if we have u_100m already and want to interpolate u_40m from multi-level data U, we should add u_100m at height 100m before doing interpolation, since 100m could be a closer level to 40m than those available in U.
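A small sketch of why this matters, using invented heights, where the single-level u_100m is nearer the 40m target than any pressure-level height:

    # Heights (m) of a hypothetical pressure-level array 'u'.
    multi_level_heights = [250.0, 500.0]
    # Single-level feature already present in the data: u_100m.
    single_level_heights = [100.0]

    target = 40.0
    all_heights = multi_level_heights + single_level_heights
    nearest = min(all_heights, key=lambda h: abs(h - target))
    print(nearest)  # 100.0 -- closer to 40 m than any pressure level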
- has_interp_variables(feature)
Check if the given feature can be interpolated from values at nearby heights or from pressure level data, e.g. if u_10m and u_50m exist then u_30m can be interpolated from these. If a pressure level array u is available this can also be used, in conjunction with height data.
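A simplified version of such a check, parsing heights out of feature names with a regex (not the actual method body):

    import re

    def nearby_heights(feature, available):
        """Heights of same-variable features, e.g. 'u_10m' -> 10.0."""
        base = re.fullmatch(r'(.*)_(\d+)m', feature).group(1)
        return [
            float(m.group(1))
            for name in available
            if (m := re.fullmatch(rf'{base}_(\d+)m', name))
        ]

    # u_30m can be interpolated: u_10m and u_50m supply nearby levels.
    print(nearby_heights('u_30m', ['u_10m', 'u_50m', 'v_10m']))  # [10.0, 50.0]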
- map_new_name(feature, pattern)
If the search for a derivation method first finds an alternative name for the feature we want to derive (by matching a wildcard pattern), we need to replace the wildcard with the specific height or pressure we want and continue the search for a derivation method with this new name.
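A regex sketch of the wildcard substitution. The pattern and alias here are hypothetical:

    import re

    def map_new_name(feature, pattern, alias):
        """Swap the wildcard in an alias for the matched height/pressure."""
        suffix = re.fullmatch(pattern, feature).group(1)
        return alias.replace('(.*)', suffix)

    # 'windspeed_100m' matches 'windspeed_(.*)' and is re-mapped to the
    # alias with the height filled in.
    print(map_new_name('windspeed_100m', r'windspeed_(.*)', 'wind_speed_(.*)'))
    # -> 'wind_speed_100m'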
- no_overlap(feature)
Check if any of the nested inputs for 'feature' contain 'feature'.
- post_init_log(args_dict=None)
Log additional arguments after initialization.
- property shape
Get shape of underlying data.
- wrap(data)
Return a Sup3rDataset object or tuple of such. This is a tuple when the .data attribute belongs to a Collection object like BatchHandler. Otherwise this is a Sup3rDataset object, which is either a wrapped 2-tuple or 1-tuple (e.g. len(data) == 2 or len(data) == 1). This is a 2-tuple when .data belongs to a dual container object like DualSampler and a 1-tuple otherwise.