rex.rechunk_h5.rechunk_h5.RechunkH5
- class RechunkH5(h5_src, h5_dst, var_attrs=None, hub_height=None, chunk_size=2, weeks_per_chunk=None, overwrite=True)
Bases: object
Class to create a new .h5 file with new chunking
Warning
This code does not currently support re-chunking H5 files with grouped datasets.
Initialize class object
- Parameters:
h5_src (str) – Source .h5 file path
h5_dst (str) – Destination path for rechunked .h5 file
var_attrs (str | pandas.DataFrame, optional) – DataFrame of variable attributes or .json containing variable attributes, by default None
hub_height (int | None, optional) – Rechunk specific hub_height, by default None
chunk_size (int, optional) – Chunk size in MB, by default 2
weeks_per_chunk (int, optional) – Number of weeks per time chunk. If None, the number of weeks is scaled from a baseline of 8 weeks for hourly data. By default None
overwrite (bool, optional) – Flag to overwrite an existing h5_dst file, by default True
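To make the relationship between chunk_size and weeks_per_chunk concrete, here is a minimal sketch. It is not the rex implementation. It assumes a 2D (time, sites) dataset, that 1 MB means 1024**2 bytes, and that a weeks_per_chunk of None falls back to the 8-week hourly baseline (the scaling rex applies for other resolutions is not modelled). The helper name chunk_shape and its arguments are hypothetical.

```python
def chunk_shape(itemsize, steps_per_hour=1, chunk_size=2, weeks_per_chunk=None):
    """Estimate a (time, sites) chunk shape for a target chunk size in MB.

    Hypothetical helper: the MB definition and the fallback to 8 weeks are
    assumptions for illustration, not taken from the rex source.
    """
    if weeks_per_chunk is None:
        # Assumed fallback: the documented 8-week baseline for hourly data
        weeks_per_chunk = 8
    # Number of time steps covered by one chunk
    time_chunk = weeks_per_chunk * 7 * 24 * steps_per_hour
    # Byte budget for one chunk (assumes 1 MB == 1024**2 bytes)
    budget = chunk_size * 1024 ** 2
    # Number of sites that fit in the budget next to that many time steps
    site_chunk = max(1, budget // (time_chunk * itemsize))
    return time_chunk, site_chunk


# float32 (4 bytes per value), hourly data, default 2 MB chunks
print(chunk_shape(itemsize=4))
```

Under these assumptions, hourly float32 data gets chunks of 1344 time steps by 390 sites, which is just under 2 MB.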
Methods
check_dset_attrs(ds_in, dset_attrs[, ...]) – Check dataset attributes (dtype, scale_factor, units) against source Dataset
close() – Close h5 instance
init_dset(dset_name, dset_shape, dset_attrs) – Create dataset and add attributes and load data if needed
load_coords(attrs) – Create coordinates and add to rechunked .h5
load_data(ds_in, ds_out, shape, dset_attrs) – Load data from ds_in to ds_out
load_dset(dset_name, dset_attrs[, ...]) – Transfer dataset from domain to combined .h5
load_meta(attrs[, meta_path]) – Transfer meta data to rechunked .h5
load_time_index(attrs[, resolution]) – Transfer time_index to rechunked .h5
rechunk([meta, process_size, ...]) – Rechunk all variables in given variable attributes json
run(h5_src, h5_dst[, var_attrs, hub_height, ...]) – Rechunk h5_src to h5_dst using given attributes
Attributes
NON_TS_DSETS
coordinates_attrs – Coordinates attributes
dsets – Datasets available in h5_file
global_attrs – Global attributes
meta_attrs – Meta attributes
rechunk_attrs – Attributes for rechunked files, includes dataset and global attrs
src_dsets – Available dsets in source .h5
time_index_attrs – Time index attributes
time_slice – Time slice or mask to use for rechunking temporal access
variable_attrs – Variable attributes
- property src_dsets
Available dsets in source .h5
- Returns:
list
- property dsets
Datasets available in h5_file
- Returns:
list – List of datasets in h5_file
- property time_slice
Time slice or mask to use for rechunking temporal access
- Returns:
slice
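As an illustration of how a time slice could reduce temporal resolution, here is a small sketch. It assumes a regularly spaced time index and that the reduction keeps every n-th time step. Whether rex derives time_slice this way is an assumption, and reduction_slice is a hypothetical helper.

```python
def reduction_slice(src_minutes, dst_minutes):
    """Return a slice that keeps every n-th step of a regular time index.

    Hypothetical helper: assumes both resolutions are given in minutes and
    that the target resolution is a whole multiple of the source resolution.
    """
    if src_minutes <= 0 or dst_minutes % src_minutes != 0:
        raise ValueError("dst_minutes must be a positive multiple of src_minutes")
    return slice(None, None, dst_minutes // src_minutes)


# 36 five-minute steps (3 hours) reduced to hourly values
steps = list(range(36))
hourly = steps[reduction_slice(5, 60)]
print(hourly)
```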
- property rechunk_attrs
Attributes for rechunked files, includes dataset and global attrs
- Returns:
pandas.DataFrame
- property global_attrs
Global attributes
- Returns:
pandas.Series
- property time_index_attrs
Time index attributes
- Returns:
pandas.Series
- property meta_attrs
Meta attributes
- Returns:
pandas.Series
- property coordinates_attrs
Coordinates attributes
- Returns:
pandas.Series
- property variable_attrs
Variable attributes
- Returns:
pandas.Series
- classmethod check_dset_attrs(ds_in, dset_attrs, check_attrs=False)
Check dataset attributes (dtype, scale_factor, units) against source Dataset
- Parameters:
ds_in (h5py.Dataset) – Source h5 Dataset
dset_attrs (dict) – Dictionary of dataset attributes (dtype, chunks, attrs)
check_attrs (bool, optional) – Flag to compare source and specified dataset attributes, by default False
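The kind of comparison described above can be sketched with plain dictionaries. This is an illustration only: it assumes dset_attrs has the layout {"dtype": ..., "chunks": ..., "attrs": {...}}, and that the source dtype and attributes have already been read from the source dataset. The function find_mismatches is hypothetical and says nothing about how rex reports a mismatch.

```python
def find_mismatches(src_dtype, src_attrs, dset_attrs):
    """List which of dtype, scale_factor and units differ from the source.

    Hypothetical helper operating on plain Python values rather than an
    open h5py.Dataset.
    """
    mismatches = []
    if dset_attrs.get("dtype") != src_dtype:
        mismatches.append("dtype")
    # "attrs" may be missing or None in the requested attributes
    requested = dset_attrs.get("attrs") or {}
    for key in ("scale_factor", "units"):
        if requested.get(key) != src_attrs.get(key):
            mismatches.append(key)
    return mismatches


source_attrs = {"scale_factor": 1, "units": "m/s"}
requested = {"dtype": "int16", "chunks": None,
             "attrs": {"scale_factor": 100, "units": "m/s"}}
print(find_mismatches("float32", source_attrs, requested))
```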
- init_dset(dset_name, dset_shape, dset_attrs)
Create dataset and add attributes and load data if needed
- Parameters:
dset_name (str) – Dataset name to be created
dset_shape (tuple) – Dataset shape
dset_attrs (dict) – Dictionary of dataset attributes (dtype, chunks, attrs, name)
- Returns:
ds (h5py.Dataset) – Initialized h5py Dataset instance
- load_time_index(attrs, resolution=None)
Transfer time_index to rechunked .h5
- Parameters:
attrs (pandas.Series) – Dataset attributes associated with time_index
resolution (str, optional) – New time resolution, by default None
- load_meta(attrs, meta_path=None)
Transfer meta data to rechunked .h5
- Parameters:
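attrs (pandas.Series) – Dataset attributes associated with meta
meta_path (str, optional) – Path to .csv or .npy file containing meta to load into rechunked .h5 file, by default None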
- load_coords(attrs)
Create coordinates and add to rechunked .h5
- Parameters:
attrs (pandas.Series) – Dataset attributes associated with coordinates
- load_data(ds_in, ds_out, shape, dset_attrs, process_size=None, data=None, reduce=False)
Load data from ds_in to ds_out
- Parameters:
ds_in (h5py.Dataset) – Open dataset instance for source data
ds_out (h5py.Dataset) – Open dataset instance for rechunked data
shape (tuple) – Dataset shape
dset_attrs (dict) – Dictionary of dataset attributes (dtype, chunks, attrs)
process_size (int, optional) – Size of each chunk to be processed at a time, by default None
data (ndarray, optional) – Data to load into ds_out, by default None
reduce (bool, optional) – Reduce temporal resolution, by default False
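The role of process_size can be illustrated with a block-wise copy. This sketch assumes process_size counts items along a single axis and that None means "copy everything at once". Which axis rex splits on is not stated here, so plain lists stand in for the datasets, and block_slices and copy_in_blocks are hypothetical names.

```python
def block_slices(n_items, process_size=None):
    """Split range(n_items) into consecutive slices of at most process_size."""
    if process_size is None:
        # Assumed meaning of None: process the whole axis in one pass
        return [slice(0, n_items)]
    if process_size <= 0:
        raise ValueError("process_size must be a positive integer")
    return [slice(start, min(start + process_size, n_items))
            for start in range(0, n_items, process_size)]


def copy_in_blocks(src, dst, process_size=None):
    """Copy src into dst one block at a time to bound memory use."""
    for block in block_slices(len(src), process_size):
        dst[block] = src[block]


src = list(range(10))
dst = [None] * len(src)
copy_in_blocks(src, dst, process_size=4)
print(dst)
```

Smaller blocks lower peak memory at the cost of more read and write calls.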
- load_dset(dset_name, dset_attrs, process_size=None, check_attrs=False)
Transfer dataset from domain to combined .h5
- Parameters:
dset_name (str) – Dataset to transfer
dset_attrs (dict) – Dictionary of dataset attributes (dtype, chunks, attrs)
process_size (int, optional) – Size of each chunk to be processed at a time, by default None
check_attrs (bool, optional) – Flag to compare source and specified dataset attributes, by default False
- rechunk(meta=None, process_size=None, check_dset_attrs=False, resolution=None)
Rechunk all variables in given variable attributes json
- Parameters:
meta (str, optional) – Path to .csv or .npy file containing meta to load into rechunked .h5 file, by default None
process_size (int, optional) – Size of each chunk to be processed at a time, by default None
check_dset_attrs (bool, optional) – Flag to compare source and specified dataset attributes, by default False
resolution (str, optional) – New time resolution, by default None
- classmethod run(h5_src, h5_dst, var_attrs=None, hub_height=None, chunk_size=2, weeks_per_chunk=None, overwrite=True, meta=None, process_size=None, check_dset_attrs=False, resolution=None)
Rechunk h5_src to h5_dst using given attributes
- Parameters:
h5_src (str) – Source .h5 file path
h5_dst (str) – Destination path for rechunked .h5 file
var_attrs (str | pandas.DataFrame, optional) – DataFrame of variable attributes or .json containing variable attributes, by default None
hub_height (int | None, optional) – Rechunk specific hub_height, by default None
chunk_size (int, optional) – Chunk size in MB, by default 2
weeks_per_chunk (int, optional) – Number of weeks per time chunk. If None, the number of weeks is scaled from a baseline of 8 weeks for hourly data. By default None
overwrite (bool, optional) – Flag to overwrite an existing h5_dst file, by default True
meta (str, optional) – Path to .csv or .npy file containing meta to load into rechunked .h5 file, by default None
process_size (int, optional) – Size of each chunk to be processed at a time, by default None
check_dset_attrs (bool, optional) – Flag to compare source and specified dataset attributes, by default False
resolution (str, optional) – New time resolution, by default None
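A call to run might be wrapped as in the following sketch. The keyword names and defaults are copied from the signature above. The file names are placeholders, and run_kwargs and rechunk_file are hypothetical helpers, not part of rex. The rex import sits inside rechunk_file so that the keyword helper can be used without rex installed.

```python
def run_kwargs(h5_src, h5_dst, **overrides):
    """Collect keyword arguments for RechunkH5.run, rejecting unknown names."""
    # Defaults as listed in the documented run signature
    defaults = {
        "var_attrs": None,
        "hub_height": None,
        "chunk_size": 2,
        "weeks_per_chunk": None,
        "overwrite": True,
        "meta": None,
        "process_size": None,
        "check_dset_attrs": False,
        "resolution": None,
    }
    unknown = sorted(set(overrides) - set(defaults))
    if unknown:
        raise TypeError(f"Unexpected keyword(s): {unknown}")
    return {"h5_src": h5_src, "h5_dst": h5_dst, **defaults, **overrides}


def rechunk_file(h5_src, h5_dst, **overrides):
    """Rechunk one file; requires the rex package to be installed."""
    from rex.rechunk_h5.rechunk_h5 import RechunkH5

    RechunkH5.run(**run_kwargs(h5_src, h5_dst, **overrides))


# Placeholder paths; process 1000 items at a time
kwargs = run_kwargs("source.h5", "rechunked.h5", process_size=1000)
print(kwargs["chunk_size"], kwargs["process_size"])
```

Calling rechunk_file("source.h5", "rechunked.h5", process_size=1000) would then pass the same arguments to RechunkH5.run.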