rex.rechunk_h5.rechunk_h5.RechunkH5

class RechunkH5(h5_src, h5_dst, var_attrs=None, hub_height=None, chunk_size=2, weeks_per_chunk=None, overwrite=True)

Bases: object

Class to create a new .h5 file with new chunking

Warning

This code does not currently support re-chunking H5 files with grouped datasets.

Initialize class object

Parameters:
  • h5_src (str) – Source .h5 file path

  • h5_dst (str) – Destination path for rechunked .h5 file

  • var_attrs (str | pandas.DataFrame, optional) – DataFrame of variable attributes or path to a .json file containing variable attributes, by default None

  • hub_height (int | None, optional) – Rechunk specific hub_height, by default None

  • chunk_size (int, optional) – Chunk size in MB, by default 2

  • weeks_per_chunk (int, optional) – Number of weeks per time chunk. If None, the number of weeks is scaled based on 8 weeks for hourly data. By default None

  • overwrite (bool, optional) – Flag to overwrite an existing h5_dst file, by default True
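As a rough illustration of how chunk_size and weeks_per_chunk could together fix a 2D chunk shape, the sketch below does the arithmetic for a (time, site) dataset. The helper name chunk_shape, the steps_per_hour argument, and the convention 1 MB = 10**6 bytes are assumptions made for this example. They are not taken from RechunkH5, which may compute chunks differently.

```python
def chunk_shape(itemsize, chunk_size_mb=2, weeks_per_chunk=8, steps_per_hour=1):
    """Hypothetical helper: (time, site) chunk shape within a byte budget."""
    # Time steps spanned by one chunk along the time axis.
    time_steps = weeks_per_chunk * 7 * 24 * steps_per_hour
    # Assumption: 1 MB is taken as 10**6 bytes.
    chunk_bytes = chunk_size_mb * 10**6
    # Number of sites that fit in the byte budget (at least one).
    sites = max(1, chunk_bytes // (itemsize * time_steps))
    return time_steps, sites


# 2-byte integers, 2 MB chunks, 8 weeks of hourly data
shape = chunk_shape(itemsize=2)
print(shape)  # (1344, 744)
```

Under these assumptions, 8 weeks of hourly data is 1344 time steps, and a 2 MB budget of 2-byte values leaves room for 744 sites per chunk.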

Methods

check_dset_attrs(ds_in, dset_attrs[, ...])

Check dataset attributes (dtype, scale_factor, units) against source Dataset

close()

Close h5 instance

init_dset(dset_name, dset_shape, dset_attrs)

Create dataset, add attributes, and load data if needed

load_coords(attrs)

Create coordinates and add to rechunked .h5

load_data(ds_in, ds_out, shape, dset_attrs)

Load data from ds_in to ds_out

load_dset(dset_name, dset_attrs[, ...])

Transfer dataset from source .h5 to rechunked .h5

load_meta(attrs[, meta_path])

Transfer meta data to rechunked .h5

load_time_index(attrs[, resolution])

Transfer time_index to rechunked .h5

rechunk([meta, process_size, ...])

Rechunk all variables in given variable attributes json

run(h5_src, h5_dst[, var_attrs, hub_height, ...])

Rechunk h5_src to h5_dst using given attributes

Attributes

NON_TS_DSETS

coordinates_attrs

Coordinates attributes

dsets

Datasets available in h5_file

global_attrs

Global attributes

meta_attrs

Meta attributes

rechunk_attrs

Attributes for rechunked files, includes dataset and global attrs

src_dsets

Available dsets in source .h5

time_index_attrs

Time index attributes

time_slice

Time slice or mask to use for rechunking temporal access

variable_attrs

Variable attributes

close()

Close h5 instance

property src_dsets

Available dsets in source .h5

Returns:

list

property dsets

Datasets available in h5_file

Returns:

list – List of datasets in h5_file

property time_slice

Time slice or mask to use for rechunking temporal access

Returns:

slice

property rechunk_attrs

Attributes for rechunked files, includes dataset and global attrs

Returns:

pandas.DataFrame

property global_attrs

Global attributes

Returns:

pandas.Series

property time_index_attrs

Time index attributes

Returns:

pandas.Series

property meta_attrs

Meta attributes

Returns:

pandas.Series

property coordinates_attrs

Coordinates attributes

Returns:

pandas.Series

property variable_attrs

Variable attributes

Returns:

pandas.Series

classmethod check_dset_attrs(ds_in, dset_attrs, check_attrs=False)

Check dataset attributes (dtype, scale_factor, units) against source Dataset

Parameters:
  • ds_in (h5py.Dataset) – Source h5 Dataset

  • dset_attrs (dict) – Dictionary of dataset attributes (dtype, chunks, attrs)

  • check_attrs (bool, optional) – Flag to compare source and specified dataset attributes, by default False
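The kind of comparison described here can be sketched with plain dictionaries. This is an illustrative stand-in, not the rex implementation: the helper name find_attr_mismatches is hypothetical, and this page does not say what RechunkH5 does when a mismatch is found.

```python
def find_attr_mismatches(source_attrs, requested_attrs,
                         keys=("dtype", "scale_factor", "units")):
    """Hypothetical helper: list keys whose values differ between two dicts."""
    mismatches = []
    for key in keys:
        src = source_attrs.get(key)
        req = requested_attrs.get(key)
        # Only flag keys that the requested attributes actually specify.
        if req is not None and src != req:
            mismatches.append((key, src, req))
    return mismatches


# Example values are made up for illustration.
source = {"dtype": "int16", "scale_factor": 100, "units": "m s-1"}
requested = {"dtype": "int16", "scale_factor": 10, "units": "m s-1"}
print(find_attr_mismatches(source, requested))  # [('scale_factor', 100, 10)]
```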

init_dset(dset_name, dset_shape, dset_attrs)

Create dataset, add attributes, and load data if needed

Parameters:
  • dset_name (str) – Dataset name to be created

  • dset_shape (tuple) – Dataset shape

  • dset_attrs (dict) – Dictionary of dataset attributes (dtype, chunks, attrs, name)

Returns:

ds (h5py.Dataset) – Initialized h5py Dataset instance
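For orientation, a dset_attrs dictionary with the keys named above (dtype, chunks, attrs, name) might look like the sketch below. The dataset name, the values, and the nesting by dataset name are assumptions for illustration. The exact layout that RechunkH5 expects in a var_attrs .json file is not specified on this page.

```python
import json

# Hypothetical attributes for a single dataset. The keys follow the
# (dtype, chunks, attrs, name) list given for dset_attrs; values are made up.
dset_attrs = {
    "windspeed_100m": {
        "dtype": "int16",
        "chunks": [1344, 744],  # JSON has no tuples, so a list is used
        "attrs": {"scale_factor": 100, "units": "m s-1"},
        "name": "windspeed_100m",
    }
}

# Round-trip through JSON text, as would happen when saving to a .json file.
text = json.dumps(dset_attrs, indent=2)
round_trip = json.loads(text)
```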

load_time_index(attrs, resolution=None)

Transfer time_index to rechunked .h5

Parameters:
  • attrs (pandas.Series) – Dataset attributes associated with time_index

  • resolution (str, optional) – New time resolution, by default None
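One way a coarser resolution could translate into the slice returned by time_slice is to keep every n-th time step. The sketch below assumes simple sub-sampling and takes resolutions as integer minutes. Both are assumptions: the page only says that resolution is a str, and does not state whether rex sub-samples or aggregates. The helper name reduction_slice is hypothetical.

```python
def reduction_slice(src_minutes, dst_minutes):
    """Hypothetical helper: slice that keeps every n-th time step."""
    if dst_minutes % src_minutes != 0:
        raise ValueError("target resolution must be a multiple of the source")
    step = dst_minutes // src_minutes
    return slice(None, None, step)


five_minute_steps = list(range(0, 60, 5))  # minutes 0, 5, ..., 55
half_hourly = five_minute_steps[reduction_slice(5, 30)]
print(half_hourly)  # [0, 30]
```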

load_meta(attrs, meta_path=None)

Transfer meta data to rechunked .h5

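Parameters:
  • attrs (pandas.Series) – Dataset attributes associated with meta

  • meta_path (str, optional) – Path to .csv or .npy file containing meta to load into rechunked .h5 file, by default None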

load_coords(attrs)

Create coordinates and add to rechunked .h5

Parameters:

attrs (pandas.Series) – Dataset attributes associated with coordinates

load_data(ds_in, ds_out, shape, dset_attrs, process_size=None, data=None, reduce=False)

Load data from ds_in to ds_out

Parameters:
  • ds_in (h5py.Dataset) – Open dataset instance for source data

  • ds_out (h5py.Dataset) – Open dataset instance for rechunked data

  • shape (tuple) – Dataset shape

  • dset_attrs (dict) – Dictionary of dataset attributes (dtype, chunks, attrs)

  • process_size (int, optional) – Size of each chunk to be processed at a time, by default None

  • data (ndarray, optional) – Data to load into ds_out, by default None

  • reduce (bool, optional) – Reduce temporal resolution, by default False
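The role of process_size can be pictured as copying the data block by block instead of all at once. The sketch below uses plain lists in place of h5py datasets. The helper name iter_slices is hypothetical, and the axis and units that rex uses for process_size are not stated on this page, so treat this as a generic pattern only.

```python
def iter_slices(length, process_size=None):
    """Hypothetical helper: yield slices that together cover range(length)."""
    if process_size is None:
        # No limit given: handle everything in a single pass.
        yield slice(0, length)
        return
    for start in range(0, length, process_size):
        yield slice(start, min(start + process_size, length))


src = list(range(10))  # stand-in for ds_in
dst = [None] * 10      # stand-in for ds_out
for s in iter_slices(len(src), process_size=4):
    dst[s] = src[s]    # copy one block at a time
```

Smaller blocks lower peak memory use at the cost of more read and write calls.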

load_dset(dset_name, dset_attrs, process_size=None, check_attrs=False)

Transfer dataset from source .h5 to rechunked .h5

Parameters:
  • dset_name (str) – Dataset to transfer

  • dset_attrs (dict) – Dictionary of dataset attributes (dtype, chunks, attrs)

  • process_size (int, optional) – Size of each chunk to be processed at a time, by default None

  • check_attrs (bool, optional) – Flag to compare source and specified dataset attributes, by default False

rechunk(meta=None, process_size=None, check_dset_attrs=False, resolution=None)

Rechunk all variables in given variable attributes json

Parameters:
  • meta (str, optional) – Path to .csv or .npy file containing meta to load into rechunked .h5 file, by default None

  • process_size (int, optional) – Size of each chunk to be processed at a time, by default None

  • check_dset_attrs (bool, optional) – Flag to compare source and specified dataset attributes, by default False

  • resolution (str, optional) – New time resolution, by default None

classmethod run(h5_src, h5_dst, var_attrs=None, hub_height=None, chunk_size=2, weeks_per_chunk=None, overwrite=True, meta=None, process_size=None, check_dset_attrs=False, resolution=None)

Rechunk h5_src to h5_dst using given attributes

Parameters:
  • h5_src (str) – Source .h5 file path

  • h5_dst (str) – Destination path for rechunked .h5 file

  • var_attrs (str | pandas.DataFrame, optional) – DataFrame of variable attributes or path to a .json file containing variable attributes, by default None

  • hub_height (int | None, optional) – Rechunk specific hub_height, by default None

  • chunk_size (int, optional) – Chunk size in MB, by default 2

  • weeks_per_chunk (int, optional) – Number of weeks per time chunk. If None, the number of weeks is scaled based on 8 weeks for hourly data. By default None

  • overwrite (bool, optional) – Flag to overwrite an existing h5_dst file, by default True

  • meta (str, optional) – Path to .csv or .npy file containing meta to load into rechunked .h5 file, by default None

  • process_size (int, optional) – Size of each chunk to be processed at a time, by default None

  • check_dset_attrs (bool, optional) – Flag to compare source and specified dataset attributes, by default False

  • resolution (str, optional) – New time resolution, by default None
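A minimal usage sketch for the run classmethod follows, based only on the signature documented above. The wrapper name rechunk_file and the file names in the commented call are placeholders, and the import path is inferred from the module name in the page title. The import is done inside the function so that the snippet can be loaded without rex installed.

```python
def rechunk_file(h5_src, h5_dst, var_attrs=None, **kwargs):
    """Hypothetical thin wrapper around RechunkH5.run with a lazy import."""
    # Import path taken from the module name at the top of this page.
    from rex.rechunk_h5.rechunk_h5 import RechunkH5

    # Extra keyword arguments (chunk_size, meta, process_size, ...) are
    # passed through unchanged.
    RechunkH5.run(h5_src, h5_dst, var_attrs=var_attrs, **kwargs)


# Placeholder paths; replace with real files before uncommenting.
# rechunk_file("source.h5", "rechunked.h5", var_attrs="var_attrs.json",
#              chunk_size=2, check_dset_attrs=True)
```

Note that overwrite defaults to True, so an existing h5_dst file would be replaced unless overwrite=False is passed.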