reVX.utilities.reeds_cols.add_extra_data

add_extra_data(data_frame, extra_data, merge_col='sc_point_gid')

Add extra data columns to a pandas DataFrame from a list of input sources (dictionaries, HDF5 files, or JSON files).

Parameters:
  • data_frame (pandas.DataFrame) – A pandas data frame with the initial data. Must contain the merge_col column if any data is extracted from HDF5 files.

  • extra_data (list of dicts) – A list of dictionaries, each containing two keys: “source” and “dsets”. The value of “source” is either a dictionary of field: value pairs or a path to the file holding the extra data; a path must point to an HDF5 or JSON file (i.e. it must end in “.h5” or “.json”). The value of “dsets” is a list of dataset names to extract from the source. For JSON and dictionary sources, each dataset value must be either a scalar or a sequence whose length matches that of the input data_frame. For HDF5 sources, the datasets must be 1D and are merged with the input data_frame on merge_col, which must be a column in the HDF5 file meta.

  • merge_col (str, optional) – Name of the column used to merge the data in the input data_frame with the data in the HDF5 file. This column must be present in both the data_frame and the HDF5 file meta. By default, 'sc_point_gid'.
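The expected shape of the extra_data argument can be sketched as below. This is a minimal illustration only: the file paths and dataset names are hypothetical placeholders, and check_extra_data is a hypothetical helper (not part of reVX) that checks each entry against the rules listed above and reports which kind of source it is.

```python
# Hypothetical extra_data input; paths and dataset names are placeholders.
EXTRA_DATA = [
    # Dictionary source: only the fields listed in "dsets" are extracted.
    {"source": {"reeds_region": 1, "capacity_factor_scale": 0.95},
     "dsets": ["reeds_region"]},
    # HDF5 source: 1D datasets, merged on merge_col from the file meta.
    {"source": "./path/to/supply_curve_data.h5",
     "dsets": ["dist_to_coast"]},
    # JSON source: values must be scalars or match len(data_frame).
    {"source": "./path/to/extra_fields.json",
     "dsets": ["state"]},
]


def check_extra_data(extra_data):
    """Hypothetical helper: validate entries against the documented rules.

    Returns a list with one source kind per entry: 'dict', 'h5', or 'json'.
    """
    kinds = []
    for entry in extra_data:
        # Each entry is expected to hold the "source" and "dsets" keys.
        if set(entry) != {"source", "dsets"}:
            raise ValueError(f"Expected keys 'source' and 'dsets', got {sorted(entry)}")
        if not isinstance(entry["dsets"], list):
            raise TypeError("'dsets' must be a list of dataset names")
        source = entry["source"]
        if isinstance(source, dict):
            kinds.append("dict")
        elif str(source).endswith(".h5"):
            kinds.append("h5")
        elif str(source).endswith(".json"):
            kinds.append("json")
        else:
            raise ValueError(f"Source must be a dict or a .h5/.json path: {source!r}")
    return kinds


print(check_extra_data(EXTRA_DATA))  # ['dict', 'h5', 'json']
```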

Returns:

pandas.DataFrame – A pandas data frame containing the initial data plus the extra datasets extracted from the input sources.
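The behavior described above can be approximated with pandas alone. The sketch below is not reVX's implementation: add_extra_data_sketch is a hypothetical function, an in-memory DataFrame stands in for the HDF5 file (its meta plus 1D datasets), and the left join used for the merge is an assumption, since the documentation does not state the join type.

```python
import pandas as pd


def add_extra_data_sketch(data_frame, extra_data, merge_col="sc_point_gid"):
    """Hypothetical pandas-only approximation of add_extra_data.

    Dict sources: each value is a scalar (broadcast to every row) or a
    sequence matching len(data_frame). Non-dict sources are DataFrames
    standing in for HDF5 meta + 1D datasets, merged on merge_col.
    """
    out = data_frame.copy()
    for entry in extra_data:
        source, dsets = entry["source"], entry["dsets"]
        if isinstance(source, dict):
            for name in dsets:
                # pandas broadcasts scalars and raises ValueError if a
                # sequence length does not match the number of rows.
                out[name] = source[name]
        else:
            # Assumed left join: keep every row of the input frame.
            cols = [merge_col] + list(dsets)
            out = out.merge(source[cols], on=merge_col, how="left")
    return out


sc = pd.DataFrame({"sc_point_gid": [10, 11, 12], "capacity": [5.0, 7.5, 2.0]})
# Rows deliberately out of order to show that the merge aligns on merge_col.
h5_like = pd.DataFrame({"sc_point_gid": [12, 10, 11],
                        "dist_to_coast": [3.2, 1.1, 8.4]})
extra = [
    {"source": {"reeds_region": 1, "cf": [0.3, 0.4, 0.5]},
     "dsets": ["reeds_region", "cf"]},
    {"source": h5_like, "dsets": ["dist_to_coast"]},
]
result = add_extra_data_sketch(sc, extra)
print(result["dist_to_coast"].tolist())  # [1.1, 8.4, 3.2]
```

Merging on merge_col, rather than assigning by position, is what lets the HDF5 datasets be stored in a different row order than the input data_frame.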