sup3r.preprocessing.cachers.base.Cacher
- class Cacher(data: Sup3rX | Sup3rDataset, cache_kwargs: Dict | None = None)
Bases: Container
Base cacher object. Simply writes given data to H5 or NETCDF files. By default every feature will be written to a separate file. To write multiple features to the same file call write_netcdf() or write_h5() directly.
- Parameters:
data (Union[Sup3rX, Sup3rDataset]) – Data to write to file.
cache_kwargs (dict) – Dictionary with kwargs for caching wrangled data. This should at minimum include a 'cache_pattern' key, value. This pattern must have a {feature} format key and either a .h5 or .nc file extension, based on desired output type. Can also include a max_workers key and a chunks key. max_workers is an integer specifying the number of threads to use for writing chunks to output files and chunks is a dictionary of dictionaries for each feature (or a single dictionary to use for all features). e.g.

    {'cache_pattern': …,
     'chunks': {
         'u_10m': {
             'time': 20, 'south_north': 100, 'west_east': 100
         }
     }
    }
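
For example, a minimal caching setup might look like the sketch below; the input file, features, and cache pattern are hypothetical, assuming Cacher and DataHandler are importable from sup3r.preprocessing:

    from sup3r.preprocessing import Cacher, DataHandler

    # Hypothetical inputs: the source file, features, and cache pattern
    # are illustrative assumptions only.
    handler = DataHandler('./wind_2020.nc', features=['u_10m', 'v_10m'])
    cacher = Cacher(
        handler.data,
        cache_kwargs={
            'cache_pattern': './cache_{feature}.h5',  # {feature} key required
            'max_workers': 2,
            'chunks': {'time': 20, 'south_north': 100, 'west_east': 100},
        },
    )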
Note
This is only for saving cached data. If you want to reload the cached files, load them with a Loader object. DataHandler objects can cache and reload from cache automatically.
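
A sketch of that reload step, assuming Loader is importable from sup3r.preprocessing and using the hypothetical cache pattern from above:

    from sup3r.preprocessing import Loader

    # Reload previously cached files; the glob pattern is illustrative.
    loaded = Loader('./cache_*.h5')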
Methods

add_coord_meta(out_file, data[, meta])
    Add flattened coordinate meta to out_file.
cache_data(cache_pattern[, chunks, ...])
    Cache data to file with file type based on user provided cache_pattern.
get_chunk_slices(chunks, shape)
    Get slices used to write xarray data to netcdf file in chunks.
get_chunksizes(dset, data, chunks)
    Get chunksizes after rechunking (could be undetermined beforehand if chunks == 'auto') and return rechunked data.
parse_chunks(feature, chunks, dims)
    Parse chunks input to Cacher.
post_init_log([args_dict])
    Log additional arguments after initialization.
wrap(data)
    Return a Sup3rDataset object or tuple of such.
write_chunk(out_file, dset, chunk_slice, ...)
    Add chunk to netcdf file.
write_h5(out_file, data[, features, chunks, ...])
    Cache data to h5 file using user provided chunks value.
write_netcdf(out_file, data[, features, ...])
    Cache data to a netcdf file.
write_netcdf_chunks(out_file, feature, data)
    Write netcdf chunks with delayed dask tasks.
Attributes

data
    Return underlying data.
shape
    Get shape of underlying data.
- cache_data(cache_pattern, chunks=None, max_workers=None, mode='w', attrs=None, verbose=False)
Cache data to file with file type based on user provided cache_pattern.
- Parameters:
cache_pattern (str) – Cache file pattern. Must have a {feature} format key. The extension (.h5 or .nc) specifies which format to use for caching.
chunks (dict | None) – Chunk sizes for coordinate dimensions. e.g. {'u_10m': {'time': 10, 'south_north': 100, 'west_east': 100}}
max_workers (int | None) – Number of workers to use for parallel writing of chunks.
mode (str) – Write mode for out_file. Defaults to 'w' (write).
attrs (dict | None) – Optional attributes to write to file. Can specify dataset specific attributes by adding a dictionary with the dataset name as a key. e.g. {**global_attrs, dset: {…}}
verbose (bool) – Whether to log progress for each chunk written to output files.
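
As a sketch, a direct call with the hypothetical cacher from above; the pattern and chunk values are illustrative:

    # Illustrative only: write each feature to its own netcdf file with
    # explicit per-dimension chunks and two writer threads.
    cacher.cache_data(
        cache_pattern='./cache_{feature}.nc',
        chunks={'time': 10, 'south_north': 100, 'west_east': 100},
        max_workers=2,
        verbose=True,
    )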
- static parse_chunks(feature, chunks, dims)
Parse chunks input to Cacher. The input needs to be a dictionary of dimensions and chunk values, which is parsed to a tuple for H5 caching.
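
For instance, the dict-to-tuple conversion presumably behaves as in this sketch; the exact return value is an assumption based on the docstring:

    # Assumed behavior: the per-feature dimension dict is ordered by
    # `dims` and returned as an h5py-style chunk tuple.
    chunks = {'u_10m': {'time': 20, 'south_north': 100, 'west_east': 100}}
    dims = ('time', 'south_north', 'west_east')
    # Cacher.parse_chunks('u_10m', chunks, dims) -> (20, 100, 100)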
- classmethod get_chunksizes(dset, data, chunks)
Get chunksizes after rechunking (could be undetermined beforehand if chunks == 'auto') and return rechunked data.
- Parameters:
dset (str) – Name of feature to get chunksizes for.
data (Sup3rX | xr.Dataset) – Sup3rX or xr.Dataset containing data to be cached.
chunks (dict | None | 'auto') – Dictionary of chunksizes either to use for all features or, if the dictionary includes feature keys, feature specific chunksizes. Can also be None or 'auto'.
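
The "undetermined beforehand" caveat comes from dask: with chunks == 'auto' the sizes are only known after rechunking, along the lines of this standalone sketch (not the library internals):

    import dask.array as da
    import numpy as np

    arr = da.from_array(np.zeros((240, 100, 100)))
    rechunked = arr.rechunk('auto')  # dask picks the chunk sizes here
    chunksizes = tuple(c[0] for c in rechunked.chunks)  # read them back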
- classmethod add_coord_meta(out_file, data, meta=None)
Add flattened coordinate meta to out_file. This is used for h5 caching.
- Parameters:
out_file (str) – Name of output file.
data (Sup3rX | xr.Dataset) – Data being written to the given out_file.
meta (pd.DataFrame | None) – Optional additional meta information to be written to the given out_file. If this is None then only coordinate info will be included in the meta written to the out_file.
- classmethod write_h5(out_file, data, features='all', chunks=None, max_workers=None, mode='w', attrs=None, verbose=False)
Cache data to h5 file using user provided chunks value.
- Parameters:
out_file (str) – Name of file to write. Must have a .h5 extension.
data (Sup3rDataset | Sup3rX | xr.Dataset) – Data to write to file. Comes from self.data, so an xr.Dataset-like object with .dims and .coords.
features (str | list) – Name of feature(s) to write to file.
chunks (dict | None) – Chunk sizes for coordinate dimensions. e.g. {'u_10m': {'time': 10, 'south_north': 100, 'west_east': 100}}
max_workers (int | None) – Number of workers to use for parallel writing of chunks.
mode (str) – Write mode for out_file. Defaults to 'w' (write).
attrs (dict | None) – Optional attributes to write to file. Can specify dataset specific attributes by adding a dictionary with the dataset name as a key. e.g. {**global_attrs, dset: {…}}. Can also include a global meta dataframe that will then be added to the coordinate meta.
verbose (bool) – Dummy arg to match write_netcdf signature.
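
A sketch of a direct call; the file name, features, and chunk values are illustrative assumptions:

    # Illustrative only: write two features to a single .h5 file.
    Cacher.write_h5(
        out_file='./cache.h5',
        data=handler.data,
        features=['u_10m', 'v_10m'],
        chunks={'time': 20, 'south_north': 100, 'west_east': 100},
        max_workers=2,
    )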
- static get_chunk_slices(chunks, shape)
Get slices used to write xarray data to netcdf file in chunks.
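
Conceptually this is a cartesian product of per-dimension slices; a self-contained sketch of that logic (not necessarily the exact implementation):

    import itertools

    def chunk_slices(chunks, shape):
        # Split each dimension of `shape` into steps of the matching
        # `chunks` entry, then take one slice per dimension.
        per_dim = [
            [slice(i, min(i + step, size)) for i in range(0, size, step)]
            for size, step in zip(shape, chunks)
        ]
        return list(itertools.product(*per_dim))

    # e.g. chunk_slices((20, 150), (30, 200)) -> 4 slice tuples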
- static write_chunk(out_file, dset, chunk_slice, chunk_data, msg=None)
Add chunk to netcdf file.
- classmethod write_netcdf_chunks(out_file, feature, data, chunks=None, max_workers=None, verbose=False)
Write netcdf chunks with delayed dask tasks.
- classmethod write_netcdf(out_file, data, features='all', chunks=None, max_workers=None, mode='w', attrs=None, verbose=False)
Cache data to a netcdf file.
- Parameters:
out_file (str) – Name of file to write. Must have a .nc extension.
data (Sup3rDataset) – Data to write to file. Comes from self.data, so a Sup3rDataset with coords attributes.
features (str | list) – Names of feature(s) to write to file.
chunks (dict | None) – Chunk sizes for coordinate dimensions. e.g. {'south_north': 100, 'west_east': 100, 'time': 10}. Can also include dataset specific values. e.g. {'windspeed': {'south_north': 100, 'west_east': 100, 'time': 10}}
max_workers (int | None) – Number of workers to use for parallel writing of chunks.
mode (str) – Write mode for out_file. Defaults to 'w' (write).
attrs (dict | None) – Optional attributes to write to file. Can specify dataset specific attributes by adding a dictionary with the dataset name as a key. e.g. {**global_attrs, dset: {…}}
verbose (bool) – Whether to log output after each chunk is written.
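
And a netcdf counterpart as a sketch, again with illustrative values:

    # Illustrative only: cache all features to one .nc file with
    # dataset specific chunking for windspeed.
    Cacher.write_netcdf(
        out_file='./cache.nc',
        data=handler.data,
        chunks={'windspeed': {'south_north': 100, 'west_east': 100, 'time': 10}},
        verbose=True,
    )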
- property data
Return underlying data.
- post_init_log(args_dict=None)
Log additional arguments after initialization.
- property shape
Get shape of underlying data.
- wrap(data)
Return a Sup3rDataset object or tuple of such. This is a tuple when the .data attribute belongs to a Collection object like BatchHandler. Otherwise this is a Sup3rDataset object, which is either a wrapped 2-tuple or 1-tuple (e.g. len(data) == 2 or len(data) == 1). This is a 2-tuple when .data belongs to a dual container object like DualSampler and a 1-tuple otherwise.