sup3r.writers.cachers.Cacher

- class Cacher(data: Sup3rX | Sup3rDataset, cache_kwargs: dict | None = None)

Bases: Container

Base cacher object. Simply writes given data to H5 or NetCDF files. By default every feature is written to a separate file. To write multiple features to the same file, call write_netcdf() or write_h5() directly.

- Parameters:
data (Union[Sup3rX, Sup3rDataset]) – Data to write to file.
cache_kwargs (dict) – Dictionary with kwargs for caching wrangled data. This should at minimum include a 'cache_pattern' key/value pair. The pattern must have a {feature} format key and either an .h5 or .nc file extension, based on the desired output type.
Can also include max_workers, chunks, and attrs keys. max_workers is an integer specifying the number of threads to use for writing chunks to output files, chunks is a dictionary of dictionaries for each feature (or a single dictionary to use for all features), and attrs is a dictionary of attributes to add to the output files, which can include scale_factor values. e.g.

    {'cache_pattern': ...,
     'chunks': {
         'u_10m': {
             'time': 20, 'south_north': 100, 'west_east': 100
         }
     },
     'attrs': {
         'u_10m': {'scale_factor': 10}
     }
    }
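Putting these keys together, a complete cache_kwargs dictionary might look like the following sketch. The path and feature names here are illustrative placeholders, not sup3r defaults:

```python
# Example cache_kwargs for a Cacher writing u_10m to a chunked .h5 file.
# The cache_pattern path and feature names are placeholders.
cache_kwargs = {
    "cache_pattern": "./cache_{feature}.h5",  # must contain a {feature} key
    "max_workers": 4,  # threads used for writing chunks to output files
    "chunks": {
        "u_10m": {"time": 20, "south_north": 100, "west_east": 100},
    },
    "attrs": {
        "u_10m": {"scale_factor": 10},
    },
}

# The file extension in cache_pattern selects the output format (.h5 or .nc).
assert cache_kwargs["cache_pattern"].endswith(".h5")
```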
Note

This is only for saving cached data. If you want to reload the cached files, load them with a Loader object. DataHandler objects can cache and reload from cache automatically.

Methods
cache_data(cache_pattern[, chunks, ...]) – Cache data to file with file type based on user provided cache_pattern.
get_chunksizes(dset, data, chunks) – Get chunksizes after rechunking (could be undetermined beforehand if chunks == 'auto') and return rechunked data.
parse_chunks(feature, chunks, dims) – Parse chunks input to Cacher.
post_init_log([args_dict]) – Log additional arguments after initialization.
wrap(data) – Return a Sup3rDataset object or tuple of such.
write_h5(out_file, data[, features, chunks, ...]) – Cache data to h5 file using user provided chunks value.
write_netcdf(out_file, data[, features, ...]) – Cache data to a netcdf file using xarray.
Attributes
- cache_data(cache_pattern, chunks=None, max_workers=None, mode='w', attrs=None, time_last=None, overwrite=False)

Cache data to file with file type based on user provided cache_pattern.

- Parameters:
cache_pattern (str) – Cache file pattern. Must have a {feature} format key. The extension (.h5 or .nc) specifies which format to use for caching.
chunks (dict | None) – Chunk sizes for coordinate dimensions. e.g. {'u_10m': {'time': 10, 'south_north': 100, 'west_east': 100}}
max_workers (int | None) – Number of workers to use for parallel writing of chunks.
mode (str) – Write mode for out_file. Defaults to 'w' (write).
attrs (dict | None) – Optional attributes to write to file. Can specify dataset specific attributes by adding a dictionary with the dataset name as a key. e.g. {**global_attrs, dset: {…}}
time_last (bool) – Whether to keep the original dimension order of the data. If False, the data will be transposed to have the time dimension first. This defaults to False for H5 output and True for NetCDF output.
overwrite (bool) – Whether to overwrite existing cache files.
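The two mechanics of cache_pattern, the {feature} format key and the extension-based format dispatch, can be illustrated with plain string operations. This is a sketch using a placeholder pattern, not sup3r's actual resolution code:

```python
import os

def resolve_cache_file(cache_pattern: str, feature: str) -> tuple:
    """Fill the {feature} key and infer the output format from the extension."""
    out_file = cache_pattern.format(feature=feature)
    ext = os.path.splitext(out_file)[1]
    # Only .h5 and .nc extensions are valid cache output types.
    assert ext in (".h5", ".nc"), "cache_pattern must end in .h5 or .nc"
    return out_file, ext

print(resolve_cache_file("./cache_{feature}.h5", "u_10m"))
# -> ('./cache_u_10m.h5', '.h5')
```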
- static parse_chunks(feature, chunks, dims)

Parse chunks input to Cacher. Needs to be a dictionary of dimensions and chunk values but is parsed to a tuple for H5 caching.
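The dict-to-tuple conversion can be sketched as follows. This is a hypothetical re-implementation for illustration, not the actual sup3r code; h5py expects chunk sizes as a tuple ordered like the dataset's dimensions:

```python
def chunks_dict_to_tuple(chunks: dict, dims: tuple) -> tuple:
    """Order a {dim: size} chunks dict by the dataset's dimension order,
    as required by h5py's tuple-based chunks argument."""
    return tuple(chunks[d] for d in dims)

dims = ("south_north", "west_east", "time")
chunks = {"time": 20, "south_north": 100, "west_east": 100}
print(chunks_dict_to_tuple(chunks, dims))  # -> (100, 100, 20)
```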
- classmethod get_chunksizes(dset, data, chunks)

Get chunksizes after rechunking (could be undetermined beforehand if chunks == 'auto') and return rechunked data.

- Parameters:
dset (str) – Name of feature to get chunksizes for.
data (Sup3rX | xr.Dataset) – Sup3rX or xr.Dataset containing data to be cached.
chunks (dict | None | 'auto') – Dictionary of chunksizes, either to use for all features or, if the dictionary includes feature keys, feature specific chunksizes. Can also be None or 'auto'.
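The three accepted forms of the chunks argument (per-feature dict, single global dict, or None/'auto') could be resolved for one feature roughly like this. This is a sketch of the selection logic only, not the library's implementation, which also performs the actual rechunking:

```python
def resolve_chunks(dset, chunks):
    """Pick the chunk spec to use for one feature from a chunks argument
    that may be None, 'auto', a global dict, or a per-feature dict."""
    if chunks in (None, "auto"):
        return chunks  # let the backend (e.g. dask) decide chunksizes
    if dset in chunks:
        return chunks[dset]  # a feature-specific entry wins
    return chunks  # otherwise one dict applies to all features

global_spec = {"time": 10, "south_north": 100, "west_east": 100}
per_feature = {"u_10m": {"time": 20, "south_north": 50, "west_east": 50}}
print(resolve_chunks("u_10m", per_feature))  # feature-specific spec
print(resolve_chunks("v_10m", global_spec))  # global spec
```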
- classmethod write_h5(out_file, data, features='all', chunks=None, max_workers=None, mode='w', attrs=None, time_last=False, write_coords=True)

Cache data to h5 file using user provided chunks value.

- Parameters:
out_file (str) – Name of file to write. Must have a .h5 extension.
data (Sup3rDataset | Sup3rX | xr.Dataset) – Data to write to file. Comes from self.data, so an xr.Dataset-like object with .dims and .coords.
features (str | list) – Name of feature(s) to write to file.
chunks (dict | None) – Chunk sizes for coordinate dimensions. e.g. {'u_10m': {'time': 10, 'south_north': 100, 'west_east': 100}}
max_workers (int | None) – Number of workers to use for parallel writing of chunks.
mode (str) – Write mode for out_file. Defaults to 'w' (write).
attrs (dict | None) – Optional attributes to write to file. Can specify dataset specific attributes by adding a dictionary with the dataset name as a key. e.g. {**global_attrs, dset: {…}}. Can also include a global meta dataframe that will then be added to the coordinate meta.
time_last (bool) – Whether to keep time as the last dimension. If False, the data will be transposed to have the time dimension first.
write_coords (bool) – Whether to write coordinate datasets to file. If this is False, only the requested features will be written. Specific coordinates can be written by including them in the features list.
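The effect of time_last on dimension order can be illustrated with a plain reordering of dimension names. This is a sketch of the ordering rule only; sup3r transposes the actual arrays, and the dimension names here are placeholders:

```python
def order_dims(dims: tuple, time_last: bool, time_dim: str = "time") -> tuple:
    """Move the time dimension to the end (time_last=True) or the front."""
    spatial = tuple(d for d in dims if d != time_dim)
    return spatial + (time_dim,) if time_last else (time_dim,) + spatial

dims = ("time", "south_north", "west_east")
print(order_dims(dims, time_last=False))  # time first, the H5 default
print(order_dims(dims, time_last=True))   # time last, the NetCDF default
```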
- classmethod write_netcdf(out_file, data, features='all', chunks=None, max_workers=1, mode='w', attrs=None, time_last=True)

Cache data to a netcdf file using xarray.

- Parameters:
out_file (str) – Name of file to write. Must have a .nc extension.
data (Sup3rDataset) – Data to write to file. Comes from self.data, so a Sup3rDataset with coords attributes.
features (str | list) – Names of feature(s) to write to file.
chunks (dict | None) – Chunk sizes for coordinate dimensions. e.g. {'south_north': 100, 'west_east': 100, 'time': 10}. Can also include dataset specific values. e.g. {'windspeed': {'south_north': 100, 'west_east': 100, 'time': 10}}
max_workers (int | None) – Number of workers to use for parallel writing of chunks.
mode (str) – Write mode for out_file. Defaults to 'w' (write).
attrs (dict | None) – Optional attributes to write to file. Can specify dataset specific attributes by adding a dictionary with the dataset name as a key. e.g. {**global_attrs, dset: {…}}
time_last (bool) – Whether to keep time as the last dimension. If False, the data will be transposed to have the time dimension first.
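The attrs convention {**global_attrs, dset: {…}} mixes global attributes and per-dataset attribute dicts in a single dictionary; separating them back out might look like the following sketch (a hypothetical helper, not the sup3r implementation):

```python
def split_attrs(attrs: dict, features: list) -> tuple:
    """Separate per-dataset attribute dicts (keyed by feature name)
    from global attributes in a combined attrs dictionary."""
    dset_attrs = {k: v for k, v in attrs.items() if k in features}
    global_attrs = {k: v for k, v in attrs.items() if k not in features}
    return global_attrs, dset_attrs

# 'source' is a global attribute; 'u_10m' keys a dataset-specific dict.
attrs = {"source": "sup3r cache", "u_10m": {"scale_factor": 10}}
print(split_attrs(attrs, ["u_10m"]))
# -> ({'source': 'sup3r cache'}, {'u_10m': {'scale_factor': 10}})
```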
- property data
Return underlying data.
- post_init_log(args_dict=None)
Log additional arguments after initialization.
- property shape
Get shape of underlying data.
- wrap(data)

Return a Sup3rDataset object or tuple of such. This is a tuple when the .data attribute belongs to a Collection object like BatchHandler. Otherwise this is a Sup3rDataset object, which is either a wrapped 3-tuple, 2-tuple, or 1-tuple (e.g. len(data) == 3, len(data) == 2, or len(data) == 1). This is a 3-tuple when .data belongs to a container object like DualSamplerWithObs, a 2-tuple when .data belongs to a dual container object like DualSampler, and a 1-tuple otherwise.