sup3r.writers.cachers.Cacher#

class Cacher(data: Sup3rX | Sup3rDataset, cache_kwargs: dict | None = None)[source]#

Bases: Container

Base cacher object. Writes the given data to H5 or NetCDF files. By default, every feature is written to a separate file. To write multiple features to the same file, call write_netcdf() or write_h5() directly.

Parameters:
  • data (Union[Sup3rX, Sup3rDataset]) – Data to write to file

  • cache_kwargs (dict) – Dictionary with kwargs for caching wrangled data. This should at minimum include a 'cache_pattern' key and value. The pattern must have a {feature} format key and either an .h5 or .nc file extension, based on the desired output type.

    Can also include max_workers, chunks, and attrs keys. max_workers is an integer specifying the number of threads to use for writing chunks to output files, chunks is a dictionary of dictionaries for each feature (or a single dictionary to use for all features), and attrs is a dictionary of attributes to add to the output files, which can include scale_factor values. e.g.

        {'cache_pattern': ...,
         'chunks': {
             'u_10m': {
                 'time': 20, 'south_north': 100, 'west_east': 100
             }
         },
         'attrs': {
             'u_10m': {'scale_factor': 10}
         }
        }

Note

This is only for saving cached data. If you want to reload the cached files, load them with a Loader object. DataHandler objects can cache and reload from cache automatically.
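For example, a minimal construction sketch. The handler object, file paths, and chunk values below are hypothetical, and the import path is assumed from this page's module path:

    from sup3r.writers.cachers import Cacher

    # Hypothetical kwargs: one .h5 file per feature, 4 writer threads,
    # per-feature chunking, and a scale_factor attribute for u_10m.
    cache_kwargs = {
        'cache_pattern': './cache_{feature}.h5',  # {feature} key is required
        'max_workers': 4,
        'chunks': {'u_10m': {'time': 20, 'south_north': 100, 'west_east': 100}},
        'attrs': {'u_10m': {'scale_factor': 10}},
    }
    # ``handler`` is assumed to be a DataHandler-like container.
    cacher = Cacher(data=handler.data, cache_kwargs=cache_kwargs)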

Methods

cache_data(cache_pattern[, chunks, ...])

Cache data to file with file type based on user provided cache_pattern.

get_chunksizes(dset, data, chunks)

Get chunksizes after rechunking (could be undetermined beforehand if chunks == 'auto') and return rechunked data.

parse_chunks(feature, chunks, dims)

Parse chunks input to Cacher.

post_init_log([args_dict])

Log additional arguments after initialization.

wrap(data)

Return a Sup3rDataset object or tuple of such.

write_h5(out_file, data[, features, chunks, ...])

Cache data to h5 file using user provided chunks value.

write_netcdf(out_file, data[, features, ...])

Cache data to a netcdf file using xarray.

Attributes

data

Return underlying data.

shape

Get shape of underlying data.

cache_data(cache_pattern, chunks=None, max_workers=None, mode='w', attrs=None, time_last=None, overwrite=False)[source]#

Cache data to file with file type based on user provided cache_pattern.

Parameters:
  • cache_pattern (str) – Cache file pattern. Must have a {feature} format key. The extension (.h5 or .nc) specifies which format to use for caching.

  • chunks (dict | None) – Chunk sizes for coordinate dimensions. e.g. {'u_10m': {'time': 10, 'south_north': 100, 'west_east': 100}}

  • max_workers (int | None) – Number of workers to use for parallel writing of chunks

  • mode (str) – Write mode for out_file. Defaults to 'w' (write).

  • attrs (dict | None) – Optional attributes to write to file. Can specify dataset specific attributes by adding a dictionary with the dataset name as a key. e.g. {**global_attrs, dset: {…}}

  • time_last (bool) – Whether to keep time as the last dimension of the data. If False, the data will be transposed to have the time dimension first. This defaults to False for H5 output and True for NetCDF output.

  • overwrite (bool) – Whether to overwrite existing cache files.
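For example, a hedged sketch of calling cache_data directly; the paths and chunk values are illustrative, and ``cacher`` is assumed to be a Cacher instance as constructed above:

    # Write each feature to its own NetCDF file (per the .nc extension),
    # sharing one chunk layout across features.
    cacher.cache_data(
        cache_pattern='./cache_{feature}.nc',
        chunks={'time': 10, 'south_north': 100, 'west_east': 100},
        max_workers=2,
        overwrite=True,
    )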

static parse_chunks(feature, chunks, dims)[source]#

Parse the chunks input to Cacher. This needs to be a dictionary of dimension names and chunk values, which is parsed to a tuple for H5 caching.
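An illustrative sketch based only on the docstring above; the exact return value is an assumption:

    # Hypothetical inputs: a per-feature chunks dict and the dataset's
    # dimension names in write order.
    dims = ('time', 'south_north', 'west_east')
    chunks = {'u_10m': {'time': 20, 'south_north': 100, 'west_east': 100}}
    parsed = Cacher.parse_chunks('u_10m', chunks, dims)
    # For H5 caching this is expected to be a tuple ordered by ``dims``,
    # e.g. (20, 100, 100).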

classmethod get_chunksizes(dset, data, chunks)[source]#

Get chunksizes after rechunking (could be undetermined beforehand if chunks == 'auto') and return rechunked data.

Parameters:
  • dset (str) – Name of feature to get chunksizes for.

  • data (Sup3rX | xr.Dataset) – Sup3rX or xr.Dataset containing data to be cached.

  • chunks (dict | None | 'auto') – Dictionary of chunksizes, either to use for all features or, if the dictionary includes feature keys, feature-specific chunksizes. Can also be None or 'auto'.
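A hedged sketch of resolving 'auto' chunking before a write; ``ds`` is assumed to be an xr.Dataset containing the feature:

    # Per the docstring, this rechunks the data and resolves chunksizes
    # that may be undetermined beforehand when chunks == 'auto'.
    result = Cacher.get_chunksizes('u_10m', ds, chunks='auto')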

classmethod write_h5(out_file, data, features='all', chunks=None, max_workers=None, mode='w', attrs=None, time_last=False, write_coords=True)[source]#

Cache data to h5 file using user provided chunks value.

Parameters:
  • out_file (str) – Name of file to write. Must have a .h5 extension.

  • data (Sup3rDataset | Sup3rX | xr.Dataset) – Data to write to file. Comes from self.data, so an xr.Dataset-like object with .dims and .coords attributes

  • features (str | list) – Name of feature(s) to write to file.

  • chunks (dict | None) – Chunk sizes for coordinate dimensions. e.g. {'u_10m': {'time': 10, 'south_north': 100, 'west_east': 100}}

  • max_workers (int | None) – Number of workers to use for parallel writing of chunks

  • mode (str) – Write mode for out_file. Defaults to 'w' (write).

  • attrs (dict | None) – Optional attributes to write to file. Can specify dataset specific attributes by adding a dictionary with the dataset name as a key. e.g. {**global_attrs, dset: {…}}. Can also include a global meta dataframe that will then be added to the coordinate meta.

  • time_last (bool) – Whether to keep time as the last dimension. If False then the data will be transposed to have the time dimension first.

  • write_coords (bool) – Whether to write coordinate datasets to file. If this is False only the requested features will be written. Specific coordinates can be written by including them in the features list.
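For example, a hedged sketch of writing several features to a single H5 file; the file path, feature names, and chunk values are illustrative, and ``handler`` is assumed to be a DataHandler-like container:

    Cacher.write_h5(
        out_file='./combined.h5',
        data=handler.data,
        features=['u_10m', 'v_10m'],
        chunks={'u_10m': {'time': 20, 'south_north': 100, 'west_east': 100}},
        max_workers=4,
        attrs={'u_10m': {'scale_factor': 10}},  # scale_factor per the class docs
    )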

classmethod write_netcdf(out_file, data, features='all', chunks=None, max_workers=1, mode='w', attrs=None, time_last=True)[source]#

Cache data to a netcdf file using xarray.

Parameters:
  • out_file (str) – Name of file to write. Must have a .nc extension.

  • data (Sup3rDataset) – Data to write to file. Comes from self.data, so a Sup3rDataset with a coords attribute

  • features (str | list) – Names of feature(s) to write to file.

  • chunks (dict | None) – Chunk sizes for coordinate dimensions. e.g. {'south_north': 100, 'west_east': 100, 'time': 10} Can also include dataset specific values. e.g. {'windspeed': {'south_north': 100, 'west_east': 100, 'time': 10}}

  • max_workers (int | None) – Number of workers to use for parallel writing of chunks

  • mode (str) – Write mode for out_file. Defaults to 'w' (write).

  • attrs (dict | None) – Optional attributes to write to file. Can specify dataset specific attributes by adding a dictionary with the dataset name as a key. e.g. {**global_attrs, dset: {…}}

  • time_last (bool) – Whether to keep time as the last dimension. If False then the data will be transposed to have the time dimension first.
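For example, a hedged sketch of writing all features to a single NetCDF file; the file path and chunk values are illustrative, and ``handler`` is assumed to be a DataHandler-like container:

    # A single chunk layout applied to every dataset.
    Cacher.write_netcdf(
        out_file='./combined.nc',
        data=handler.data,
        chunks={'south_north': 100, 'west_east': 100, 'time': 10},
    )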

property data#

Return underlying data.

Returns:

Sup3rDataset

See also

wrap()

post_init_log(args_dict=None)#

Log additional arguments after initialization.

property shape#

Get shape of underlying data.

wrap(data)#

Return a Sup3rDataset object or a tuple of such objects. This is a tuple when the .data attribute belongs to a Collection object like BatchHandler. Otherwise this is a Sup3rDataset object, which is either a wrapped 3-tuple, 2-tuple, or 1-tuple (i.e. len(data) == 3, 2, or 1). This is a 3-tuple when .data belongs to a container object like DualSamplerWithObs, a 2-tuple when .data belongs to a dual container object like DualSampler, and a 1-tuple otherwise.
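A minimal sketch of the wrapping behavior described above; ``lr`` and ``hr`` are assumed to be xarray-like low- and high-resolution datasets:

    # A 2-tuple wraps to a single Sup3rDataset, matching the dual-container
    # case described above.
    wrapped = cacher.wrap((lr, hr))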