reV.qa_qc.summary.SummarizeH5

class SummarizeH5(h5_file, group=None)

Bases: object

reV Summary data for QA/QC

Parameters:
  • h5_file (str) – Path to .h5 file to summarize data from

  • group (str, optional) – Group within h5_file to summarize datasets for, by default None

Methods

run(h5_file, out_dir[, group, dsets, ...])

Summarize all datasets in h5_file and dump to out_dir

summarize_dset(ds_name[, process_size, ...])

Compute dataset summary.

summarize_means([out_path])

Add means datasets to meta data

Attributes

h5_file

.h5 file path

property h5_file

.h5 file path

Returns:

str

summarize_dset(ds_name, process_size=None, max_workers=None, out_path=None)

Compute dataset summary. If the dataset is 2D, compute temporal statistics for each site.

Parameters:
  • ds_name (str) – Dataset name of interest

  • process_size (int, optional) – Number of sites to process at a time, by default None

  • max_workers (int, optional) – Number of workers to use in parallel. If 1, run in serial; if None, use all available cores. By default None

  • out_path (str, optional) – File path to save summary to, by default None

Returns:

summary (pandas.DataFrame) – Summary statistics for dataset
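To make "temporal statistics for each site" concrete, here is a minimal standard-library sketch. It is not reV's implementation: the function name `summarize_2d`, the row-per-time-step layout, and the choice of statistics (mean, min, max) are all assumptions made for illustration, and the real method returns a pandas.DataFrame rather than a list of dicts. The `process_size` argument only mimics the idea of working through the sites in chunks.

```python
import statistics


def summarize_2d(data, process_size=None):
    """Per-site temporal statistics (hypothetical helper, not part of reV).

    Assumes `data` is a list of rows, one row per time step and one
    column per site.
    """
    n_sites = len(data[0])
    # None means "process every site in a single chunk".
    step = process_size or n_sites
    summary = []
    for start in range(0, n_sites, step):
        stop = min(start + step, n_sites)
        for site in range(start, stop):
            # Collect this site's values across all time steps.
            series = [row[site] for row in data]
            summary.append({
                "site": site,
                "mean": statistics.mean(series),
                "min": min(series),
                "max": max(series),
            })
    return summary


# Three time steps (rows) for three sites (columns).
data = [
    [1.0, 10.0, 5.0],
    [2.0, 20.0, 5.0],
    [3.0, 30.0, 5.0],
]
result = summarize_2d(data, process_size=2)
```

Each entry of `result` describes one site, so a 2D dataset with many time steps collapses to one summary row per site.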

summarize_means(out_path=None)

Add means datasets to meta data

Parameters:

out_path (str, optional) – Path to .csv file to save updated meta data to, by default None

Returns:

meta (pandas.DataFrame) – Meta data with means datasets added
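The idea of attaching means datasets to the meta data can be sketched as follows, assuming a "means" dataset holds one value per site. This is an illustration only: `add_means_to_meta`, the column names, and the values are hypothetical, and reV works with a pandas.DataFrame rather than the plain dicts used here.

```python
def add_means_to_meta(meta, means):
    """Return a copy of `meta` with one extra column per means dataset.

    Hypothetical helper, not part of reV. `meta` is a list of per-site
    dicts; `means` maps a dataset name to a list with one value per site.
    """
    n_sites = len(meta)
    for name, values in means.items():
        if len(values) != n_sites:
            raise ValueError(
                f"{name!r} has {len(values)} values, expected {n_sites}"
            )
    # Build new rows so the caller's meta data is left unchanged.
    return [
        {**row, **{name: values[i] for name, values in means.items()}}
        for i, row in enumerate(meta)
    ]


# Illustrative meta data and a made-up means dataset.
meta = [{"gid": 0, "latitude": 39.7}, {"gid": 1, "latitude": 40.0}]
means = {"cf_mean": [0.31, 0.28]}
updated = add_means_to_meta(meta, means)
```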

classmethod run(h5_file, out_dir, group=None, dsets=None, process_size=None, max_workers=None)

Summarize all datasets in h5_file and dump to out_dir

Parameters:
  • h5_file (str) – Path to .h5 file to summarize data from

  • out_dir (str) – Directory to dump summary .csv files to

  • group (str, optional) – Group within h5_file to summarize datasets for, by default None

  • dsets (str | list, optional) – Datasets to summarize, by default None

  • process_size (int, optional) – Number of sites to process at a time, by default None

  • max_workers (int, optional) – Number of workers to use when summarizing 2D datasets, by default None
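A call to `run` might be wrapped as in the sketch below. The wrapper name `summarize_to_csv` is hypothetical; the import path and keyword arguments follow the signature documented on this page. The import is done inside the function so the snippet can be loaded on a machine where reV is not installed.

```python
def summarize_to_csv(h5_file, out_dir, dsets=None):
    """Summarize datasets in `h5_file` into .csv files under `out_dir`.

    Hypothetical convenience wrapper around SummarizeH5.run.
    """
    # Imported lazily; requires reV to be installed when called.
    from reV.qa_qc.summary import SummarizeH5

    SummarizeH5.run(
        h5_file,
        out_dir,
        group=None,         # summarize datasets at the top level of the file
        dsets=dsets,        # None is the documented default
        process_size=None,  # documented default
        max_workers=1,      # 1 runs in serial per the summarize_dset docs
    )
```

For example, `summarize_to_csv("gen_output.h5", "qa_qc_out", dsets=["cf_mean"])` would be a typical call; the file name, directory, and dataset name here are placeholders, not values taken from this page.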