sup3r.preprocessing.data_handlers.nc_cc.DataHandlerNCforCC

class DataHandlerNCforCC(file_paths, *, features='all', nsrdb_source_fp=None, nsrdb_agg=1, nsrdb_smoothing=0, scale_clearsky_ghi=True, load_features='all', res_kwargs=None, chunks='auto', target=None, shape=None, time_slice=slice(None, None, None), threshold=None, time_roll=0, time_shift=None, hr_spatial_coarsen=1, nan_method_kwargs=None, feature_aliases=None, BaseLoader=None, FeatureRegistry=None, interp_kwargs=None, cache_kwargs=None, **kwargs)

Bases: DataHandler

Extended NETCDF data handler. This implements a rasterizer hook to add “clearsky_ghi” to the rasterized data if “clearsky_ghi” is requested.

Parameters:

file_paths (str | list | pathlib.Path) – file_paths input to LoaderClass
features (list | str) – Features to derive. If ‘all’, all available raw features are simply loaded. Specify explicit feature names for derivations.
nsrdb_source_fp (str | None) – Optional NSRDB source h5 file from which to retrieve clearsky_ghi. The clearsky_ghi is combined with rsds (ghi) from the CC netcdf file to calculate the CC clearsky_ratio.
nsrdb_agg (int) – Optional number of NSRDB source pixels from which to aggregate clearsky_ghi into a single climate change (CC) netcdf pixel. This can be used when the CC .nc data has a much coarser resolution than the source NSRDB data.
nsrdb_smoothing (float) – Optional Gaussian filter smoothing factor used to smooth clearsky_ghi from high-resolution NSRDB source data. This is typically needed because spatially aggregated NSRDB data is usually still rougher than CC irradiance data.
scale_clearsky_ghi (bool) – Flag to scale the NSRDB clearsky_ghi so that its maximum value matches the GCM rsds maximum value per spatial pixel. This is useful when calculating “clearsky_ratio”, so that there is not an unrealistic number of 1 values when the maximum NSRDB clearsky_ghi is much lower than the GCM values.
kwargs (dict) – Dictionary of additional keyword args for Rasterizer, used specifically for rasterizing flattened data.
load_features (list | str) – Features to load and make available for derivations. If ‘all’, all available raw features are loaded and made available for derivations. This can be used to restrict the features used for derivations. For example, to derive ‘temperature_100m’ only from temperature isobars when the data also includes single-level values (like temperature_2m), leave ‘temperature_2m’ out of the load_features list.
res_kwargs (dict) – Additional keyword arguments passed through to the BaseLoader. BaseLoader is usually xr.open_mfdataset for NETCDF files and MultiFileResourceX for H5 files.
chunks (dict | str) – Dictionary of chunk sizes to pass through to dask.array.from_array() or xr.Dataset().chunk(), which are the methods used for H5 and NETCDF data, respectively. The dictionary is converted to a tuple when used in from_array(). This argument can also be “auto” or None. None skips chunking and loads the data into memory as an np.array.
target (tuple) – (lat, lon) lower left corner of the raster. Requires either target+shape or raster_file.
shape (tuple) – (rows, cols) grid size. Requires either target+shape or raster_file.
time_slice (slice | list) – Slice specifying the extent and step of temporal extraction, e.g. slice(start, stop, step). If equal to slice(None, None, 1), the full time dimension is selected. Can also be a list [start, stop, step].
threshold (float) – Nearest neighbor euclidean distance threshold. If the coordinates are more than this value away from the target lat/lon, an error is raised.
time_roll (int) – Number of steps to roll along the time axis. Passed to xr.Dataset.roll()
time_shift (int | None) – Number of minutes to shift time axis. This can be used, for example, to shift the time index for daily data so that the time stamp for a given day starts at the zeroth minute instead of at noon, as is the case for most GCM data.
hr_spatial_coarsen (int) – Spatial coarsening factor. Passed to xr.Dataset.coarsen()
nan_method_kwargs (str | dict | None) – Keyword arguments for nan handling. If ‘mask’, time steps with nans will be dropped. Otherwise this should be a dict of kwargs which will be passed to sup3r.preprocessing.accessor.Sup3rX.interpolate_na(). e.g. {‘method’: ‘linear’, ‘dim’: ‘time’}
feature_aliases (dict) – Optional dictionary of feature aliases to use when loading data. This is useful for renaming features to expected sup3r names. For example, {‘sp’: ‘pressure_0m’, ‘u10’: ‘u_10m’}.
BaseLoader (Callable) – Base level file loader wrapped by Loader. This is usually xr.open_mfdataset for NETCDF files and MultiFileResourceX for H5 files.
FeatureRegistry (dict) – Dictionary of DerivedFeature objects used for derivations
interp_kwargs (dict | None) – Dictionary of kwargs for level interpolation. Can include “method” and “run_level_check” keys. Method specifies how to perform height interpolation, for example when deriving u_20m from u_10m and u_100m. Options are “linear” and “log”. See sup3r.preprocessing.derivers.Deriver.do_level_interpolation()
cache_kwargs (dict | None) – Dictionary with kwargs for caching wrangled data. This should at minimum include a cache_pattern key/value pair. This pattern must have a {feature} format key and either an h5 or nc file extension, based on the desired output type. See Cacher for a description of further arguments.
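The time_shift behavior can be pictured with plain datetime arithmetic. This is an illustrative sketch only: shift_time_index is a hypothetical helper, not part of sup3r, and it assumes daily GCM time stamps sit at 12:00, so a shift of -720 minutes moves them to the start of each day.

```python
from datetime import datetime, timedelta


def shift_time_index(times, time_shift):
    """Shift every timestamp by ``time_shift`` minutes (hypothetical helper)."""
    if time_shift is None:
        # Mirrors time_shift=None: leave the index unchanged
        return list(times)
    delta = timedelta(minutes=time_shift)
    return [t + delta for t in times]


# Assumed daily GCM stamps placed at noon
gcm_days = [datetime(2015, 1, d, 12, 0) for d in range(1, 4)]
# -720 minutes (12 hours) moves each stamp to 00:00 of the same day
shifted = shift_time_index(gcm_days, -720)
print([t.isoformat() for t in shifted])
```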
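hr_spatial_coarsen is passed to xr.Dataset.coarsen(). As a rough picture of what a factor of 2 does, the stdlib sketch below averages non-overlapping 2x2 blocks of a grid. It assumes mean aggregation and trims incomplete edge blocks; coarsen_mean is a hypothetical helper, and sup3r's exact reduction and boundary handling may differ.

```python
def coarsen_mean(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2D grid.

    Hypothetical illustration of coarsen(...).mean(). Rows and columns that
    do not fill a complete block are dropped (an assumption of this sketch).
    """
    n_rows = len(grid) // factor
    n_cols = len(grid[0]) // factor
    out = []
    for i in range(n_rows):
        row = []
        for j in range(n_cols):
            # Gather every value inside the current block
            block = [grid[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


hr = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]
print(coarsen_mean(hr, 2))  # [[3.5, 5.5], [11.5, 13.5]]
```

A factor of 1, the default, leaves the grid values unchanged.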

Methods

`check_registry`(feature)	Get compute method from the registry if available.
`collect_input_attrs`(feature[, inputs])	Collect attributes from the input features for the given feature.
`derive`(feature)	Routine to derive requested features.
`do_level_interpolation`(feature[, interp_kwargs])	Interpolate over height or pressure to derive the given feature.
`get_clearsky_ghi`()	Get clearsky ghi from an exogenous NSRDB source h5 file at the target CC meta data and time index.
`get_inputs`(feature)	Get inputs for the given feature and inputs for those inputs.
`get_multi_level_data`(feature)	Get data stored in multi-level arrays, like u stored on pressure levels.
`get_single_level_data`(feature)	When doing level interpolation we should include the single level data available.
`get_time_slice`(ti_nsrdb)	Get nsrdb data time slice consistent with self.time_index.
`has_interp_variables`(feature)	Check if the given feature can be interpolated from values at nearby heights or from pressure level data.
`map_new_name`(feature, pattern)	If the search for a derivation method first finds an alternative name for the feature we want to derive, by matching a wildcard pattern, we need to replace the wildcard with the specific height or pressure we want and continue the search for a derivation method with this new name.
`no_overlap`(feature)	Check if any of the nested inputs for 'feature' contain 'feature'
`post_init_log`([args_dict])	Log additional arguments after initialization.
`run_input_checks`()	Run checks on the files provided for extracting clearsky_ghi.
`run_wrap_checks`(cs_ghi)	Run check on rasterized data from clearsky_ghi source.
`scale_clearsky_ghi`()	Method to scale the NSRDB clearsky ghi so that the maximum value matches the GCM rsds maximum value per spatial pixel.
`wrap`(data)	Return a `Sup3rDataset` object or tuple of such.
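do_level_interpolation derives a feature at one height from values at bracketing heights, with “linear” and “log” methods. The sketch below is a minimal two-level version under the assumption that “log” means interpolating linearly in ln(height); interp_to_height is hypothetical and is not the sup3r implementation.

```python
import math


def interp_to_height(z, z_lo, v_lo, z_hi, v_hi, method="linear"):
    """Interpolate a value to height z from two bracketing levels (hypothetical).

    method="linear": weight is linear in height.
    method="log": weight is linear in ln(height) (assumed meaning of "log").
    """
    if method == "linear":
        w = (z - z_lo) / (z_hi - z_lo)
    elif method == "log":
        w = (math.log(z) - math.log(z_lo)) / (math.log(z_hi) - math.log(z_lo))
    else:
        raise ValueError(f"Unknown method: {method!r}")
    return v_lo + w * (v_hi - v_lo)


# Derive u_20m from u_10m = 5 m/s and u_100m = 8 m/s
u20_lin = interp_to_height(20, 10, 5.0, 100, 8.0, method="linear")
u20_log = interp_to_height(20, 10, 5.0, 100, 8.0, method="log")
print(round(u20_lin, 3), round(u20_log, 3))
```

Log interpolation gives the larger u_20m here because 20 m lies about 30% of the way from 10 m to 100 m in ln(height), but only about 11% of the way in height.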
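no_overlap guards against circular derivations, such as a wind speed derived from u and v while u is itself derived from wind speed and direction. A minimal sketch of that check as a graph traversal, assuming a hypothetical inputs_map dict from each feature to its direct inputs (raw features have no entry):

```python
def no_overlap(feature, inputs_map):
    """Return True if ``feature`` is absent from its own nested inputs.

    ``inputs_map`` is a hypothetical {feature: [direct inputs]} dict; this is
    an illustration of the idea, not the sup3r method.
    """
    seen = set()
    stack = list(inputs_map.get(feature, []))
    while stack:
        name = stack.pop()
        if name == feature:
            # The feature depends on itself somewhere down the chain
            return False
        if name in seen:
            continue
        seen.add(name)
        stack.extend(inputs_map.get(name, []))
    return True


inputs_map = {
    "windspeed_100m": ["u_100m", "v_100m"],
    "u_100m": ["windspeed_100m", "winddirection_100m"],
    "clearsky_ratio": ["ghi", "clearsky_ghi"],
}
print(no_overlap("windspeed_100m", inputs_map))  # False
print(no_overlap("clearsky_ratio", inputs_map))  # True
```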

Attributes

`FEATURE_REGISTRY`
`data`	Return underlying data.
`shape`	Get shape of underlying data.

run_input_checks(): Run checks on the files provided for extracting clearsky_ghi. Make sure the loaded data is daily data and the step size is one day.
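The daily-data requirement can be sketched as verifying that every consecutive time step is exactly one day. check_daily is a hypothetical helper that operates on a list of datetimes; the real method works on the handler's loaded data and may perform the check differently.

```python
from datetime import datetime, timedelta


def check_daily(time_index):
    """Raise ValueError unless consecutive stamps are exactly one day apart."""
    if len(time_index) < 2:
        raise ValueError("Need at least two time steps to infer the step size")
    # Collect the distinct step sizes present in the index
    steps = {b - a for a, b in zip(time_index[:-1], time_index[1:])}
    if steps != {timedelta(days=1)}:
        raise ValueError(f"Expected a one-day time step, found {sorted(steps)}")


daily = [datetime(2015, 1, 1) + timedelta(days=i) for i in range(5)]
hourly = [datetime(2015, 1, 1) + timedelta(hours=i) for i in range(48)]
check_daily(daily)  # passes silently
try:
    check_daily(hourly)
except ValueError as err:
    print("rejected:", err)
```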

run_wrap_checks(cs_ghi): Run checks on rasterized data from the clearsky_ghi source.

get_time_slice(ti_nsrdb): Get an NSRDB data time slice consistent with self.time_index.
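One way to picture get_time_slice is to select the contiguous block of sub-daily NSRDB time stamps that falls on the CC days. The sketch assumes a sorted, gap-free NSRDB index; nsrdb_time_slice and its cc_days argument (standing in for self.time_index) are hypothetical.

```python
from datetime import datetime, timedelta


def nsrdb_time_slice(ti_nsrdb, cc_days):
    """Slice of ``ti_nsrdb`` covering the first through last CC day.

    Assumes ``ti_nsrdb`` is sorted and contiguous (hypothetical sketch).
    """
    first, last = cc_days[0].date(), cc_days[-1].date()
    idx = [i for i, t in enumerate(ti_nsrdb) if first <= t.date() <= last]
    if not idx:
        raise ValueError("NSRDB time index does not overlap the CC time index")
    return slice(idx[0], idx[-1] + 1)


# Four days of hourly NSRDB stamps and two daily CC stamps at noon
ti_nsrdb = [datetime(2015, 1, 1) + timedelta(hours=h) for h in range(96)]
cc_days = [datetime(2015, 1, 2, 12), datetime(2015, 1, 3, 12)]
sl = nsrdb_time_slice(ti_nsrdb, cc_days)
print(sl)  # slice(24, 72, None)
```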

get_clearsky_ghi()

Get clearsky ghi from an exogenous NSRDB source h5 file at the target CC meta data and time index.

TODO: Replace some of this with call to Regridder? Perform daily means with self.loader.coarsen?

Returns: cs_ghi (Union[np.ndarray, da.core.Array]) – Clearsky ghi (W/m2) from the nsrdb_source_fp h5 source file. Data shape is (lat, lon, time), where time holds daily average values.
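The return value implies two reductions: a spatial one, where nsrdb_agg NSRDB pixels are combined into each CC pixel, and a temporal one to daily averages. The sketch below assumes the spatial step averages the nsrdb_agg nearest NSRDB pixels by plain (lat, lon) distance. That mechanism and the helpers aggregate_nearest and daily_mean are illustrative assumptions, not the sup3r code.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_nearest(cc_coords, src_coords, src_values, k):
    """Average the k nearest source values for each CC pixel (assumed scheme).

    Uses Euclidean distance in degrees, ignoring Earth curvature.
    """
    out = []
    for lat, lon in cc_coords:
        ranked = sorted(
            range(len(src_coords)),
            key=lambda i: math.hypot(src_coords[i][0] - lat, src_coords[i][1] - lon),
        )
        out.append(sum(src_values[i] for i in ranked[:k]) / k)
    return out


def daily_mean(times, values):
    """Average sub-daily values per calendar day, keyed by date."""
    groups = defaultdict(list)
    for t, v in zip(times, values):
        groups[t.date()].append(v)
    return {day: sum(vals) / len(vals) for day, vals in sorted(groups.items())}


src_coords = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
src_values = [100.0, 200.0, 300.0, 400.0, 1000.0]
print(aggregate_nearest([(0.5, 0.5)], src_coords, src_values, k=4))  # [250.0]

hours = [datetime(2015, 1, 1) + timedelta(hours=h) for h in range(48)]
print(daily_mean(hours, list(range(48))))
```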

scale_clearsky_ghi(): Method to scale the NSRDB clearsky ghi so that the maximum value matches the GCM rsds maximum value per spatial pixel. This is useful when calculating “clearsky_ratio”, so that there is not an unrealistic number of 1 values when the maximum NSRDB clearsky_ghi is much lower than the GCM values.
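The effect of this scaling on clearsky_ratio can be shown for a single pixel. The sketch assumes the ratio is rsds / clearsky_ghi capped at 1 (the “number of 1 values” wording suggests a cap) and that scaling multiplies each pixel's series by max(rsds) / max(clearsky_ghi). scale_clearsky and clearsky_ratio here are hypothetical helpers on plain lists, not the sup3r methods.

```python
def scale_clearsky(cs_ghi, rsds):
    """Scale each pixel's clearsky series so its max equals that pixel's rsds max.

    Both inputs map a pixel id to a list of daily values (hypothetical layout).
    """
    scaled = {}
    for pixel, cs in cs_ghi.items():
        factor = max(rsds[pixel]) / max(cs)
        scaled[pixel] = [v * factor for v in cs]
    return scaled


def clearsky_ratio(rsds, cs_ghi):
    """rsds / clearsky_ghi clipped to [0, 1]; NaN where clearsky_ghi is 0 (assumed)."""
    return {
        pixel: [min(max(g / c, 0.0), 1.0) if c > 0 else float("nan")
                for g, c in zip(rsds[pixel], cs_ghi[pixel])]
        for pixel in rsds
    }


cs_ghi = {"p0": [200.0, 250.0, 300.0]}
rsds = {"p0": [150.0, 260.0, 360.0]}

raw = clearsky_ratio(rsds, cs_ghi)["p0"]
scaled = clearsky_ratio(rsds, scale_clearsky(cs_ghi, rsds))["p0"]
# Count values saturated at (or numerically at) the cap of 1
print(sum(r > 0.999 for r in raw), sum(r > 0.999 for r in scaled))  # 2 1
```

Before scaling, two of the three days saturate at 1; after scaling, only the day with the pixel's peak rsds does.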

check_registry(feature) → ndarray | Array | str | None: Get compute method from the registry if available. Will check for a pattern feature match in the feature registry, e.g. u_100m matches a feature registry entry of u_(.*)m.
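The pattern lookup in check_registry, and the wildcard substitution described for map_new_name, can be sketched with Python's re module. lookup_feature and fill_wildcard are hypothetical stand-ins with simplified signatures, and the registry values below are placeholder strings rather than sup3r DerivedFeature objects.

```python
import re


def lookup_feature(feature, registry):
    """Return the registry entry for ``feature``: exact key first, then pattern."""
    if feature in registry:
        return registry[feature]
    for pattern, entry in registry.items():
        # Keys such as "u_(.*)m" act as regular expressions
        if re.fullmatch(pattern, feature):
            return entry
    return None


def fill_wildcard(feature, pattern, alt_name):
    """Replace the "(.*)" wildcard in ``alt_name`` with the value captured from ``feature``."""
    match = re.fullmatch(pattern, feature)
    if match is None:
        raise ValueError(f"{feature!r} does not match {pattern!r}")
    return alt_name.replace("(.*)", match.group(1), 1)


# Placeholder registry values (hypothetical)
registry = {"u_(.*)m": "UWind", "clearsky_ratio": "ClearSkyRatio"}
print(lookup_feature("u_100m", registry))  # UWind
print(lookup_feature("v_100m", registry))  # None
print(fill_wildcard("u_100m", "u_(.*)m", "U_(.*)m"))  # U_100m
```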

collect_input_attrs(feature, inputs=None): Collect attributes from the input features for the given feature.

property data

Return underlying data.

Returns: Sup3rDataset
