nsrdb.data_model.clouds.CloudVarSingleH5
- class CloudVarSingleH5(fpath, pre_proc_flag=True, index=None, dsets=('cloud_type', 'cld_opd_dcomp', 'cld_reff_dcomp', 'cld_press_acha'), parallax_correct=True, solar_shading=True, remap_pc=True)
Bases: CloudVarSingle
Framework for .h5 single-file/single-timestep cloud data extraction.
- Parameters:
fpath (str) – Full filepath for the cloud data at a single timestep.
pre_proc_flag (bool) – Flag to pre-process and sparsify data.
index (np.ndarray) – Nearest neighbor results array to extract a subset of the data.
dsets (tuple | list) – Source datasets to extract.
parallax_correct (bool) – Flag to adjust cloud coordinates so that clouds are located directly above their assigned coordinates rather than at their apparent location as seen from the sensor.
solar_shading (bool) – Flag to adjust cloud coordinates so that clouds are assigned to the coordinates they shade.
remap_pc (bool) – Flag to remap the parallax-corrected and solar-shading-corrected data back onto the original semi-regular GOES coordinates.
Methods
clean_attrs
() Try to clean unnecessary object attributes to reduce memory usage.
correct_coordinates
(fpath, grid[, ...]) Adjust grid lat/lon values based on solar position.
get_dset
(dset) Get a single dataset from the source cloud data file.
make_sparse
(grid, raw_grid) Make the cloud grid sparse by removing NaN coordinates.
pre_process
(dset, data, attrs[, ...]) Pre-process cloud data by filling missing values and unscaling.
remap_pc_coords
() Remap the parallax/shading corrected coordinates back onto the original "raw" coordinate system and set internal variables to do the same for the cloud data when processed through get_dset() and self.source_data.
remap_pc_data
(data) Perform remapping of parallax/shading corrected data onto the raw/original cloud coordinate system, including overlaying cloud shadow data over clear data.
Attributes
GRID_LABELS
dsets
Get a list of the available datasets in the cloud file.
fpath
Get the full file path for this cloud data timestep.
grid
Return the cloud data grid for the current timestep.
source_data
Get multiple-variable data dictionary from the cloud data file.
tree
Get the KDTree for the cloud data coordinates.
- property dsets
Get a list of the available datasets in the cloud file.
- classmethod correct_coordinates(fpath, grid, parallax_correct=True, solar_shading=True)
Adjust grid lat/lon values based on solar position
- Parameters:
fpath (str) – Filepath to the cloud h5 file containing the datasets required for the solar position (solpo) coordinate adjustment.
grid (dict) – Dictionary with latitude and longitude keys and corresponding numpy array values.
parallax_correct (bool) – Flag to adjust cloud coordinates so that clouds are located directly above their assigned coordinates rather than at their apparent location as seen from the sensor.
solar_shading (bool) – Flag to adjust cloud coordinates so that clouds are assigned to the coordinates they shade.
- Returns:
grid (dict) – Dictionary with latitude and longitude keys and corresponding numpy array values. Coordinates are adjusted for solar position so that clouds are linked to the coordinate that they are shading.
- static pre_process(dset, data, attrs, sparse_mask=None, index=None)
Pre-process cloud data by filling missing values and unscaling.
- Pre-processing steps (different for .nc vs .h5):
flatten (ravel)
convert to float32 (unless dset == cloud_type)
convert filled values to NaN (unless dset == cloud_type)
apply scale factor (multiply)
apply add offset (addition)
sparsify
extract only data at index
- Parameters:
dset (str) – Dataset name.
data (np.ndarray) – Raw data extracted from the dataset in the cloud data source file.
attrs (dict) – Dataset attributes from the dataset in the cloud data source file.
sparse_mask (NoneType | pd.Series) – Optional boolean mask to apply to the data to sparsify.
index (np.ndarray) – Nearest neighbor results array to extract a subset of the data.
- Returns:
data (np.ndarray) – Pre-processed data.
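The steps listed above can be sketched roughly as follows. This is a minimal illustration of the documented order of operations, not the nsrdb implementation: the function name pre_process_sketch and the attribute keys '_FillValue', 'scale_factor' and 'add_offset' are assumptions made for the example, and the real method may handle edge cases differently.

```python
import numpy as np


def pre_process_sketch(dset, data, attrs, sparse_mask=None, index=None):
    """Hypothetical restatement of the documented steps (not nsrdb code)."""
    # 1. Flatten to a 1D array.
    data = np.asarray(data).ravel()

    if dset != 'cloud_type':
        # 2. Convert to float32 so that NaN can be represented.
        data = data.astype(np.float32)
        # 3. Replace filled values with NaN ('_FillValue' is an assumed key).
        fill = attrs.get('_FillValue')
        if fill is not None:
            data[data == np.float32(fill)] = np.nan

    # 4. Apply the scale factor if one is present (assumed key name).
    if 'scale_factor' in attrs:
        data = data * attrs['scale_factor']
    # 5. Apply the additive offset if one is present (assumed key name).
    if 'add_offset' in attrs:
        data = data + attrs['add_offset']

    # 6. Sparsify with the optional boolean mask.
    if sparse_mask is not None:
        data = data[np.asarray(sparse_mask, dtype=bool)]

    # 7. Keep only the data at the requested indices.
    if index is not None:
        data = data[index]

    return data


# Made-up inputs purely for illustration.
raw = np.array([[10, 20], [-999, 40]], dtype=np.int16)
attrs = {'_FillValue': -999, 'scale_factor': 0.5, 'add_offset': 1.0}
mask = np.array([True, True, True, False])
out = pre_process_sketch('cld_opd_dcomp', raw, attrs,
                         sparse_mask=mask, index=np.array([2, 0]))
cloud_type = pre_process_sketch('cloud_type',
                                np.array([[1, 2], [3, 4]], dtype=np.int8), {})
```

In this sketch cloud_type keeps its integer dtype because it skips the float conversion and has no scale or offset attributes in the example input.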
- static make_sparse(grid, raw_grid)
Make the cloud grid sparse by removing NaN coordinates.
- Parameters:
grid (pd.DataFrame) – GOES source coordinates (labels: [‘latitude’, ‘longitude’]).
raw_grid (pd.DataFrame | None) – Raw GOES source coordinates before parallax correction / solar shading or None if those algorithms are disabled.
- Returns:
grid (pd.DataFrame) – Sparse GOES source coordinates with all NaN rows removed.
raw_grid (pd.DataFrame | None) – Raw GOES source coordinates before parallax correction / solar shading or None if those algorithms are disabled.
mask (pd.Series) – Boolean series; the mask to extract sparse data.
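As a rough illustration of the inputs and outputs described above, the sparsification could look like the sketch below. The function name make_sparse_sketch is hypothetical, and building the mask from grid alone and then applying it to raw_grid is an assumption; the actual method may build the mask differently.

```python
import numpy as np
import pandas as pd


def make_sparse_sketch(grid, raw_grid=None):
    """Hypothetical sketch: drop rows whose coordinates contain NaN."""
    # True for rows where both latitude and longitude are valid.
    mask = ~grid[['latitude', 'longitude']].isna().any(axis=1)
    grid = grid[mask]
    # Assumption: the same mask is applied to the raw grid when it exists.
    if raw_grid is not None:
        raw_grid = raw_grid[mask]
    return grid, raw_grid, mask


# Made-up coordinates purely for illustration.
grid = pd.DataFrame({'latitude': [10.0, np.nan, 30.0],
                     'longitude': [-100.0, -101.0, np.nan]})
sparse_grid, sparse_raw, mask = make_sparse_sketch(grid, raw_grid=None)
```

The returned mask has one entry per original row, so it can be reused to sparsify any flattened dataset that shares the original ordering.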
- get_dset(dset)
Get a single dataset from the source cloud data file.
- Parameters:
dset (str) – Variable dataset name to retrieve from the cloud file.
- Returns:
dset (np.ndarray) – 1D array of flattened data that should match the self.grid meta data.
- clean_attrs()
Try to clean unnecessary object attributes to reduce memory usage
- property fpath
Get the full file path for this cloud data timestep.
- property grid
Return the cloud data grid for the current timestep.
- Returns:
self._grid (pd.DataFrame | None) – GOES source coordinates (labels: [‘latitude’, ‘longitude’]). None if the dataset is bad.
- remap_pc_coords()
Remap the parallax/shading corrected coordinates back onto the original “raw” coordinate system and set internal variables to do the same for the cloud data when processed through get_dset() and self.source_data.
- remap_pc_data(data)
Perform remapping of parallax/shading corrected data onto the raw/original cloud coordinate system including overlaying cloud shadow data over clear data.
- Parameters:
data (np.ndarray) – 1D array of flattened data based on the original coordinate system ordering from the cloud file, possibly with sparsification due to pre-processing of NaN data/coordinates.
- Returns:
data (np.ndarray) – 1D array of flattened data that corresponds to the original coordinate system with no parallax/shading corrections but has been re-arranged such that it reflects these coordinate adjustments.
- property source_data
Get multiple-variable data dictionary from the cloud data file.
- Returns:
data (dict) – Dictionary of multiple cloud datasets. Keys are the cloud dataset names. Values are 1D (flattened/raveled) arrays of data.
- property tree
Get the KDTree for the cloud data coordinates, e.g. cKDTree(self.grid[['latitude', 'longitude']])
- Returns:
cKDTree
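To show how such a tree relates to the index argument described earlier, here is a small self-contained sketch using scipy.spatial.cKDTree directly. The grid values, target points and data array are made up for illustration, and the query is a plain Euclidean nearest-neighbor search in degree space, which is only an approximation of geographic distance.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

# Made-up cloud grid standing in for self.grid.
grid = pd.DataFrame({'latitude': [39.0, 39.0, 40.0, 40.0],
                     'longitude': [-105.0, -104.0, -105.0, -104.0]})

# Build the tree on the coordinate columns, as in the docstring example.
tree = cKDTree(grid[['latitude', 'longitude']].values)

# Hypothetical target sites that need cloud data.
targets = np.array([[39.1, -104.9],
                    [39.9, -104.2]])

# Nearest cloud pixel for each target; index holds row positions in grid.
dist, index = tree.query(targets, k=1)

# A flattened dataset ordered like the grid can then be subset by index.
data = np.array([1.0, 2.0, 3.0, 4.0])
subset = data[index]
```

An array like index here is the kind of nearest-neighbor result that could be passed as the index argument to extract only the pixels of interest.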