flasc.data_processing.dataframe_manipulations

Module containing methods for FLASC dataframe manipulations.

Functions

df_drop_nan_rows

Drop all-nan rows.

df_find_and_fill_data_gaps_with_missing

Find and fill data gap and mark as missing data with NaN.

df_reduce_precision

Reduce dataframe precision.

df_sort_and_find_duplicates

This function sorts the dataframe and finds rows with equal time index.

df_sort_and_fix_duplicates

Sort dataframe and fill duplicates.

filter_df_by_ti

Filter a dataframe by turbulence intensity range.

filter_df_by_wd

Filter a dataframe by wind direction range.

filter_df_by_ws

Filter a dataframe by wind speed range.

get_column_mean

Get the mean of a column for a list of turbines.

get_num_turbines

Get the number of turbines in a dataframe.

is_day_or_night

Determine night or day in dataframe.

plot_sun_altitude_with_day_night_color

Plot sun altitude with day-night color differentiation.

set_pow_ref_by_n_closest_upstream_turbines

Add pow_ref column using N-nearest upstream turbines.

set_pow_ref_by_turbines

Add power reference column by list of turbines.

set_pow_ref_by_upstream_turbines

Add pow_ref column using upstream turbines.

set_pow_ref_by_upstream_turbines_in_radius

Add pow_ref column using upstream turbines within a radius.

set_ti_by_all_turbines

Add TI column using all turbines.

set_ti_by_turbines

Add TI column by list of turbines.

set_ti_by_upstream_turbines

Add TI column using upstream turbines.

set_ti_by_upstream_turbines_in_radius

Add TI column by upstream turbines within a radius.

set_wd_by_all_turbines

Add a wind direction column using all turbines.

set_wd_by_radius_from_turbine

Add wind direction column by turbines in radius.

set_wd_by_turbines

Add WD column by list of turbines.

set_ws_by_all_turbines

Add ws column by all turbines.

set_ws_by_n_closest_upstream_turbines

Add wind speed column by N closest upstream turbines.

set_ws_by_turbines

Add ws column by list of turbines.

set_ws_by_upstream_turbines

Add wind speed column using upstream turbines.

set_ws_by_upstream_turbines_in_radius

Add wind speed column using in-radius upstream turbines.

flasc.data_processing.dataframe_manipulations.filter_df_by_ws(df, ws_range)

Filter a dataframe by wind speed range.

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements.

  • ws_range ([float, float]) -- Wind speed range [lower bound, upper bound].

Returns:

Filtered dataframe.

Return type:

pd.DataFrame
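
As a rough illustration of range filtering, the sketch below keeps the rows whose 'ws' value falls inside a range. It is not FLASC's implementation: the helper name filter_by_range, the presence of a 'ws' column, and the half-open convention (lower bound included, upper bound excluded) are all assumptions made for the example.

```python
import pandas as pd


def filter_by_range(df, col, bounds):
    """Keep rows where bounds[0] <= df[col] < bounds[1] (assumed convention)."""
    lower, upper = bounds
    mask = (df[col] >= lower) & (df[col] < upper)
    return df[mask]


df = pd.DataFrame(
    {
        "ws": [3.0, 6.5, 9.0, 12.0],
        "pow_000": [100.0, 600.0, 1500.0, 2000.0],
    }
)

# Keep only the rows with wind speeds between 6 and 10 m/s
df_filtered = filter_by_range(df, "ws", [6.0, 10.0])
print(df_filtered)
```

The same helper would apply to a 'ti' column with a turbulence intensity range.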

flasc.data_processing.dataframe_manipulations.filter_df_by_wd(df, wd_range)

Filter a dataframe by wind direction range.

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements.

  • wd_range ([float, float]) -- Wind direction range [lower bound, upper bound].

Returns:

Filtered dataframe.

Return type:

pd.DataFrame
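
Wind direction is a circular quantity, so a range such as [350, 10] crosses north. The documentation above does not say how filter_df_by_wd treats such a range; the sketch below only shows one possible way to handle it. The helper wd_in_range is hypothetical.

```python
import pandas as pd


def wd_in_range(wd, wd_range):
    """Boolean mask for wind directions inside a range that may wrap past 360 deg."""
    lower, upper = wd_range
    wd = wd % 360.0
    if lower <= upper:
        # Ordinary range, e.g. [90, 180]
        return (wd >= lower) & (wd < upper)
    # Wrapping range, e.g. [350, 10]: accept either side of north
    return (wd >= lower) | (wd < upper)


df = pd.DataFrame({"wd": [5.0, 90.0, 355.0, 180.0]})
df_north = df[wd_in_range(df["wd"], [350.0, 10.0])]
print(df_north)
```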

flasc.data_processing.dataframe_manipulations.filter_df_by_ti(df, ti_range)

Filter a dataframe by turbulence intensity range.

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements.

  • ti_range ([float, float]) -- Turbulence intensity range [lower bound, upper bound].

Returns:

Filtered dataframe.

Return type:

pd.DataFrame

flasc.data_processing.dataframe_manipulations.get_num_turbines(df)

Get the number of turbines in a dataframe.

Parameters:

df (pd.DataFrame) -- Dataframe with turbine data

Returns:

Number of turbines in the dataframe

Return type:

int
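
The documentation does not state how the turbine count is derived. One plausible approach, shown here purely as an assumption, is to count the columns that follow the pow_%03d naming pattern. The helper count_turbines is hypothetical.

```python
import re

import pandas as pd


def count_turbines(df, prefix="pow"):
    """Count columns named like pow_000, pow_001, ... (assumed naming convention)."""
    pattern = re.compile(rf"^{re.escape(prefix)}_\d{{3}}$")
    return sum(1 for col in df.columns if pattern.match(col))


df = pd.DataFrame(
    columns=["time", "pow_000", "pow_001", "ws_000", "ws_001", "pow_ref"]
)
n_turbines = count_turbines(df)
print(n_turbines)
```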

flasc.data_processing.dataframe_manipulations.get_column_mean(df, col_prefix='pow', turbine_list=None, circular_mean=False)

Get the mean of a column for a list of turbines.

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements.

  • col_prefix (str, optional) -- Column prefix to use. Defaults to "pow".

  • turbine_list ([list, array], optional) -- List of turbine numbers to use. If None, all turbines are used. Defaults to None.

  • circular_mean (bool, optional) -- Use circular mean. Defaults to False.

Returns:

Mean of the column for the specified turbines.

Return type:

np.array
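
The circular_mean option matters for angular quantities such as wind direction. The sketch below compares an arithmetic mean with a circular mean over the columns wd_000 and wd_001. The helper circular_mean_deg is written for this example and is not taken from FLASC.

```python
import numpy as np
import pandas as pd


def circular_mean_deg(values, axis=1):
    """Circular mean of angles in degrees, ignoring NaNs, wrapped to [0, 360)."""
    rad = np.deg2rad(values)
    mean_sin = np.nanmean(np.sin(rad), axis=axis)
    mean_cos = np.nanmean(np.cos(rad), axis=axis)
    return np.rad2deg(np.arctan2(mean_sin, mean_cos)) % 360.0


df = pd.DataFrame({"wd_000": [350.0, 90.0], "wd_001": [10.0, 100.0]})
cols = [f"wd_{ti:03d}" for ti in [0, 1]]

# The arithmetic mean of 350 and 10 degrees points south (180), which is wrong
arith = df[cols].mean(axis=1).to_numpy()

# The circular mean of 350 and 10 degrees points north, up to rounding
circ = circular_mean_deg(df[cols].to_numpy())
print(arith, circ)
```

For non-angular columns such as pow_%03d or ws_%03d the arithmetic mean is the appropriate choice.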

flasc.data_processing.dataframe_manipulations._set_col_by_turbines(col_out, col_prefix, df, turbine_numbers, circular_mean)
flasc.data_processing.dataframe_manipulations._set_col_by_n_closest_upstream_turbines(col_out, col_prefix, df, N, df_upstream, circular_mean, turb_no, x_turbs, y_turbs, exclude_turbs=[])
flasc.data_processing.dataframe_manipulations._set_col_by_upstream_turbines(col_out, col_prefix, df, df_upstream, circular_mean, exclude_turbs=[])
flasc.data_processing.dataframe_manipulations._set_col_by_radius_from_turbine(col_out, col_prefix, df, turb_no, x_turbs, y_turbs, max_radius, circular_mean, include_itself=True)
flasc.data_processing.dataframe_manipulations._set_col_by_upstream_turbines_in_radius(col_out, col_prefix, df, df_upstream, turb_no, x_turbs, y_turbs, max_radius, circular_mean, include_itself=True)

Add a column of averaged upstream turbine values.

Add a column called [col_out] to your dataframe, which is the mean of the columns [col_prefix]_%03d for turbines that are upstream and also within radius [max_radius] of the turbine of interest [turb_no].

Parameters:
  • col_out (str) -- Column name to be added to the dataframe.

  • col_prefix (str) -- Column prefix to use.

  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

  • df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).

  • turb_no (int) -- Turbine number from which the radius should be calculated.

  • x_turbs ([list, array]) -- Array containing x locations of turbines.

  • y_turbs ([list, array]) -- Array containing y locations of turbines.

  • max_radius (float) -- Maximum radius for the upstream turbines until which they are still considered as relevant/used for the calculation of the averaged column quantity.

  • circular_mean (bool) -- Use circular mean.

  • include_itself (bool, optional) -- Include the measurements of turbine turb_no in the determination of the averaged column quantity. Defaults to True.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called [col_out].

Return type:

pd.DataFrame
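
The radius-based selection can be pictured as a distance test around turb_no, intersected with the set of upstream turbines. The following is a minimal sketch under that reading. The helpers turbines_in_radius and upstream_in_radius are hypothetical and are not the functions listed above.

```python
import numpy as np


def turbines_in_radius(turb_no, x_turbs, y_turbs, max_radius, include_itself=True):
    """Indices of turbines within max_radius of turbine turb_no."""
    x = np.asarray(x_turbs, dtype=float)
    y = np.asarray(y_turbs, dtype=float)
    dist = np.hypot(x - x[turb_no], y - y[turb_no])
    in_range = dist <= max_radius
    if not include_itself:
        in_range[turb_no] = False
    return np.flatnonzero(in_range).tolist()


def upstream_in_radius(upstream_turbs, in_radius):
    """Turbines that are both upstream and within the radius."""
    return sorted(set(upstream_turbs) & set(in_radius))


x_turbs = [0.0, 500.0, 1000.0, 3000.0]
y_turbs = [0.0, 0.0, 0.0, 0.0]

near = turbines_in_radius(1, x_turbs, y_turbs, max_radius=600.0)
near_excl = turbines_in_radius(1, x_turbs, y_turbs, 600.0, include_itself=False)
selected = upstream_in_radius([0, 3], near)
print(near, near_excl, selected)
```

The averaged column would then be computed over the [col_prefix]_%03d columns of the selected turbines.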

flasc.data_processing.dataframe_manipulations.set_wd_by_turbines(df, turbine_numbers)

Add WD column by list of turbines.

Add a column called 'wd' in your dataframe with value equal to the circular-averaged wind direction measurements of all the turbines in turbine_numbers.

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

  • turbine_numbers ([list, array]) -- List of turbine numbers that should be used to calculate the column average.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'wd'.

Return type:

pd.DataFrame

flasc.data_processing.dataframe_manipulations.set_wd_by_all_turbines(df)

Add a wind direction column using all turbines.

Add a column called 'wd' in your dataframe with value equal to the circular-averaged wind direction measurements of all turbines.

Parameters:

df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'wd'.

Return type:

pd.DataFrame

flasc.data_processing.dataframe_manipulations.set_wd_by_radius_from_turbine(df, turb_no, x_turbs, y_turbs, max_radius, include_itself=True)

Add wind direction column by turbines in radius.

Add a column called 'wd' to your dataframe, which is the mean of the columns wd_%03d for turbines that are within radius [max_radius] of the turbine of interest [turb_no].

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

  • turb_no (int) -- Turbine number from which the radius should be calculated.

  • x_turbs ([list, array]) -- Array containing x locations of turbines.

  • y_turbs ([list, array]) -- Array containing y locations of turbines.

  • max_radius (float) -- Maximum radius within which turbines are still considered relevant and used for the calculation of the averaged column quantity.

  • include_itself (bool, optional) -- Include the measurements of turbine turb_no in the determination of the averaged column quantity. Defaults to True.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'wd'.

Return type:

pd.DataFrame

flasc.data_processing.dataframe_manipulations.set_ws_by_turbines(df, turbine_numbers)

Add ws column by list of turbines.

Add a column called 'ws' in your dataframe with value equal to the mean wind speed measurements of all the turbines in turbine_numbers.

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

  • turbine_numbers ([list, array]) -- List of turbine numbers that should be used to calculate the column average.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'ws'.

Return type:

df (pd.DataFrame)

flasc.data_processing.dataframe_manipulations.set_ws_by_all_turbines(df)

Add ws column by all turbines.

Add a column called 'ws' in your dataframe with value equal to the mean wind speed measurements of all turbines.

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'ws'.

Return type:

pd.DataFrame

flasc.data_processing.dataframe_manipulations.set_ws_by_upstream_turbines(df, df_upstream, exclude_turbs=[])

Add wind speed column using upstream turbines.

Add a column called 'ws' in your dataframe with value equal to the averaged wind speed measurements of all the turbines upstream, excluding the turbines listed in exclude_turbs.

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

  • df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).

  • exclude_turbs ([list, array]) -- array-like variable containing turbine indices that should be excluded in determining the column mean quantity.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'ws'.

Return type:

pd.DataFrame
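
To make the role of df_upstream concrete, the sketch below averages ws_%03d over the upstream turbines for each wind direction bin. It is an illustration only. The column names wd_min, wd_max and turbines in df_upstream, the need for a 'wd' column in df, and the helper mean_over_upstream are assumptions rather than documented behavior.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "wd": [10.0, 200.0, 190.0],
        "ws_000": [8.0, 6.0, 7.0],
        "ws_001": [7.0, 9.0, 8.0],
        "ws_002": [6.0, 10.0, np.nan],
    }
)

# Assumed layout: one row per wind direction bin with its upstream turbines
df_upstream = pd.DataFrame(
    {
        "wd_min": [0.0, 180.0],
        "wd_max": [180.0, 360.0],
        "turbines": [[0, 1], [1, 2]],
    }
)


def mean_over_upstream(df, df_upstream, col_prefix, col_out, exclude_turbs=()):
    """Average col_prefix_%03d over the upstream turbines of each wd bin."""
    out = df.copy()
    out[col_out] = np.nan
    for _, row in df_upstream.iterrows():
        turbs = [t for t in row["turbines"] if t not in exclude_turbs]
        if not turbs:
            continue
        cols = [f"{col_prefix}_{t:03d}" for t in turbs]
        in_bin = (out["wd"] >= row["wd_min"]) & (out["wd"] < row["wd_max"])
        # Row-wise mean; NaN entries are skipped by pandas
        out.loc[in_bin, col_out] = out.loc[in_bin, cols].mean(axis=1)
    return out


result = mean_over_upstream(df, df_upstream, col_prefix="ws", col_out="ws")
print(result["ws"].tolist())
```

The same pattern would apply to 'ti' and 'pow_ref' by changing col_prefix and col_out.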

flasc.data_processing.dataframe_manipulations.set_ws_by_upstream_turbines_in_radius(df, df_upstream, turb_no, x_turbs, y_turbs, max_radius, include_itself=True)

Add wind speed column using in-radius upstream turbines.

Add a column called 'ws' to your dataframe, which is the mean of the columns ws_%03d for turbines that are upstream and also within radius [max_radius] of the turbine of interest [turb_no].

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

  • df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).

  • turb_no (int) -- Turbine number from which the radius should be calculated.

  • x_turbs ([list, array]) -- Array containing x locations of turbines.

  • y_turbs ([list, array]) -- Array containing y locations of turbines.

  • max_radius (float) -- Maximum radius for the upstream turbines until which they are still considered as relevant/used for the calculation of the averaged column quantity.

  • include_itself (bool, optional) -- Include the measurements of turbine turb_no in the determination of the averaged column quantity. Defaults to True.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'ws'.

Return type:

pd.DataFrame

flasc.data_processing.dataframe_manipulations.set_ws_by_n_closest_upstream_turbines(df, df_upstream, turb_no, x_turbs, y_turbs, exclude_turbs=[], N=5)

Add wind speed column by N closest upstream turbines.

Add a column called 'ws' to your dataframe, which is the mean of the columns ws_%03d for the N closest turbines that are upstream of the turbine of interest [turb_no].

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

  • df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).

  • turb_no (int) -- Turbine number from which the radius should be calculated.

  • x_turbs ([list, array]) -- Array containing x locations of turbines.

  • y_turbs ([list, array]) -- Array containing y locations of turbines.

  • exclude_turbs ([list, array]) -- array-like variable containing turbine indices that should be excluded in determining the column mean quantity.

  • N (int) -- Number of closest turbines to consider for the calculation of the averaged column quantity. Defaults to 5.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'ws'.

Return type:

pd.DataFrame
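
Selecting the N closest upstream turbines amounts to sorting the upstream candidates by their distance to turb_no. The sketch below shows that step in isolation. The helper n_closest_upstream is hypothetical, and whether FLASC breaks ties or treats turb_no itself in the same way is not stated in this documentation.

```python
import math


def n_closest_upstream(turb_no, upstream_turbs, x_turbs, y_turbs, N=5, exclude_turbs=()):
    """Return up to N upstream turbine indices, nearest to turb_no first."""
    candidates = [t for t in upstream_turbs if t not in exclude_turbs]

    def distance(t):
        return math.hypot(x_turbs[t] - x_turbs[turb_no], y_turbs[t] - y_turbs[turb_no])

    return sorted(candidates, key=distance)[:N]


x_turbs = [0.0, 400.0, 900.0, 1500.0, 2500.0]
y_turbs = [0.0, 0.0, 0.0, 0.0, 0.0]

closest = n_closest_upstream(4, [0, 1, 2, 3], x_turbs, y_turbs, N=2)
closest_excl = n_closest_upstream(
    4, [0, 1, 2, 3], x_turbs, y_turbs, N=2, exclude_turbs=(3,)
)
print(closest, closest_excl)
```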

flasc.data_processing.dataframe_manipulations.set_ti_by_turbines(df, turbine_numbers)

Add TI column by list of turbines.

Add a column called 'ti' in your dataframe with value equal to the averaged turbulence intensity measurements of all the turbines listed in turbine_numbers.

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

  • turbine_numbers ([list, array]) -- List of turbine numbers that should be used to calculate the column average.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'ti'.

Return type:

pd.DataFrame

flasc.data_processing.dataframe_manipulations.set_ti_by_all_turbines(df)

Add TI column using all turbines.

Add a column called 'ti' in your dataframe with value equal to the averaged turbulence intensity measurements of all turbines.

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'ti'.

Return type:

pd.DataFrame

flasc.data_processing.dataframe_manipulations.set_ti_by_upstream_turbines(df, df_upstream, exclude_turbs=[])

Add TI column using upstream turbines.

Add a column called 'ti' in your dataframe with value equal to the averaged turbulence intensity measurements of all the turbines upstream, excluding the turbines listed in exclude_turbs.

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

  • df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).

  • exclude_turbs ([list, array]) -- array-like variable containing turbine indices that should be excluded in determining the column mean quantity.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'ti'.

Return type:

pd.DataFrame

flasc.data_processing.dataframe_manipulations.set_ti_by_upstream_turbines_in_radius(df, df_upstream, turb_no, x_turbs, y_turbs, max_radius, include_itself=True)

Add TI column by upstream turbines within a radius.

Add a column called 'ti' to your dataframe, which is the mean of the columns ti_%03d for turbines that are upstream and also within radius [max_radius] of the turbine of interest [turb_no].

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

  • df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).

  • turb_no (int) -- Turbine number from which the radius should be calculated.

  • x_turbs ([list, array]) -- Array containing x locations of turbines.

  • y_turbs ([list, array]) -- Array containing y locations of turbines.

  • max_radius (float) -- Maximum radius for the upstream turbines until which they are still considered as relevant/used for the calculation of the averaged column quantity.

  • include_itself (bool, optional) -- Include the measurements of turbine turb_no in the determination of the averaged column quantity. Defaults to True.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'ti'.

Return type:

pd.DataFrame

flasc.data_processing.dataframe_manipulations.set_pow_ref_by_turbines(df, turbine_numbers)

Add power reference column by list of turbines.

Add a column called 'pow_ref' in your dataframe with value equal to the averaged power measurements of all the turbines listed in turbine_numbers.

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

  • turbine_numbers ([list, array]) -- List of turbine numbers that should be used to calculate the column average.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'pow_ref'.

Return type:

pd.DataFrame

flasc.data_processing.dataframe_manipulations.set_pow_ref_by_upstream_turbines(df, df_upstream, exclude_turbs=[])

Add pow_ref column using upstream turbines.

Add a column called 'pow_ref' in your dataframe with value equal to the averaged power measurements of all the turbines upstream, excluding the turbines listed in exclude_turbs.

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

  • df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).

  • exclude_turbs ([list, array]) -- array-like variable containing turbine indices that should be excluded in determining the column mean quantity.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'pow_ref'.

Return type:

pd.DataFrame

flasc.data_processing.dataframe_manipulations.set_pow_ref_by_upstream_turbines_in_radius(df, df_upstream, turb_no, x_turbs, y_turbs, max_radius, include_itself=False)

Add pow_ref column using upstream turbines within a radius.

Add a column called 'pow_ref' to your dataframe, which is the mean of the columns pow_%03d for turbines that are upstream and also within radius [max_radius] of the turbine of interest [turb_no].

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

  • df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).

  • turb_no (int) -- Turbine number from which the radius should be calculated.

  • x_turbs ([list, array]) -- Array containing x locations of turbines.

  • y_turbs ([list, array]) -- Array containing y locations of turbines.

  • max_radius (float) -- Maximum radius for the upstream turbines until which they are still considered as relevant/used for the calculation of the averaged column quantity.

  • include_itself (bool, optional) -- Include the measurements of turbine turb_no in the determination of the averaged column quantity. Defaults to False.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'pow_ref'.

Return type:

pd.DataFrame

flasc.data_processing.dataframe_manipulations.set_pow_ref_by_n_closest_upstream_turbines(df, df_upstream, turb_no, x_turbs, y_turbs, exclude_turbs=[], N=5)

Add pow_ref column using N-nearest upstream turbines.

Add a column called 'pow_ref' to your dataframe, which is the mean of the columns pow_%03d for the N closest turbines that are upstream of the turbine of interest [turb_no].

Parameters:
  • df (pd.DataFrame) -- Dataframe with measurements. This dataframe typically consists of wd_%03d, ws_%03d, ti_%03d, pow_%03d, and potentially additional measurements.

  • df_upstream (pd.DataFrame) -- Dataframe containing rows indicating wind direction ranges and the corresponding upstream turbines for that wind direction range. This variable can be generated with flasc.utilities.floris_tools.get_upstream_turbs_floris(...).

  • turb_no (int) -- Turbine number from which the radius should be calculated.

  • x_turbs ([list, array]) -- Array containing x locations of turbines.

  • y_turbs ([list, array]) -- Array containing y locations of turbines.

  • exclude_turbs ([list, array]) -- array-like variable containing turbine indices that should be excluded in determining the column mean quantity.

  • N (int) -- Number of closest turbines to consider for the calculation of the averaged column quantity. Defaults to 5.

Returns:

Dataframe which equals the inserted dataframe plus the additional column called 'pow_ref'.

Return type:

pd.DataFrame

flasc.data_processing.dataframe_manipulations.df_reduce_precision(df_in, verbose=False, allow_convert_to_integer=True)

Reduce dataframe precision.

Reduce the precision in dataframes from float64 to float32, or possibly even further to int32, int16, int8 or even bool. This operation typically reduces the size of the dataframe by a factor 2 without any real loss in precision. This can make particular operations and data storage much more efficient. This can also bring about speed-ups doing calculations with these variables.

Parameters:
  • df_in (pd.DataFrame) -- Dataframe that needs to be reduced.

  • verbose (bool, optional) -- Print progress. Defaults to False.

  • allow_convert_to_integer (bool, optional) -- Allow reduction to integer type if possible. Defaults to True.

Returns:

Reduced dataframe

Return type:

pd.DataFrame
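
The factor-of-two saving mentioned above follows directly from the storage size of float64 (8 bytes) versus float32 (4 bytes). The sketch below shows only the float downcast and does not attempt the integer or bool conversions. The helper reduce_float_precision is hypothetical and is not FLASC's implementation.

```python
import numpy as np
import pandas as pd


def reduce_float_precision(df):
    """Return a copy of df with every float64 column cast to float32."""
    out = df.copy()
    for col in out.select_dtypes(include="float64").columns:
        out[col] = out[col].astype(np.float32)
    return out


df = pd.DataFrame(
    {
        "ws_000": np.linspace(0.0, 25.0, 1000),
        "pow_000": np.linspace(0.0, 5000.0, 1000),
    }
)
df_small = reduce_float_precision(df)

# Memory used by the data columns before and after the downcast
bytes_before = df.memory_usage(index=False).sum()
bytes_after = df_small.memory_usage(index=False).sum()
print(bytes_before, bytes_after)
```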

flasc.data_processing.dataframe_manipulations.df_drop_nan_rows(df, verbose=False)

Drop all-nan rows.

Remove rows from the dataframe in which all columns (besides 'time') have nan values.

Parameters:
  • df (pd.DataFrame) -- Input pandas dataframe

  • verbose (bool, optional) -- Print progress. Defaults to False.

Returns:

Dataframe with all-nan rows removed

Return type:

pd.DataFrame
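
In plain pandas, dropping rows that are NaN in every column except 'time' might look like the sketch below. It assumes that 'time' is a regular column rather than the index, and it is not taken from the FLASC source.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:01",
                "2024-01-01 00:00:02",
                "2024-01-01 00:00:03",
            ]
        ),
        "ws_000": [8.0, np.nan, 7.5, np.nan],
        "pow_000": [900.0, np.nan, np.nan, np.nan],
    }
)

# Only the measurement columns decide whether a row is empty
data_cols = [c for c in df.columns if c != "time"]
df_clean = df.dropna(subset=data_cols, how="all")
print(df_clean)
```

The row with a valid ws_000 but a missing pow_000 is kept, because only rows where every measurement is NaN are dropped.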

flasc.data_processing.dataframe_manipulations.df_find_and_fill_data_gaps_with_missing(df, missing_data_buffer=5.0)

Find and fill data gap and mark as missing data with NaN.

This function takes a pd.DataFrame object and looks for large jumps in the 'time' column. Such a jump indicates that measurements are missing, so the gap should not simply be interpolated with a zero-order hold (ZOH). Hence, this function finds these time gaps and inserts an additional row, 1 second after the start of the time gap, with all 'nan' values. This way, the data gap becomes populated with 'nan' values and the data will be ignored in any further analysis.

Parameters:
  • df (pd.DataFrame) -- Merged dataframe for all imported files

  • missing_data_buffer (float, optional) -- If the time gaps are equal to or larger than this limit [s], then it will consider the data as corrupted or missing. Defaults to 5.0.

Returns:

The postprocessed dataframe in which all data within large time gaps are marked as missing ('nan').

Return type:

pd.DataFrame
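
The gap-marking idea can be sketched in a few lines of pandas: find rows that are followed by a jump of at least missing_data_buffer seconds and add a row, 1 second later, that carries only a timestamp. The helper fill_gaps_with_nan is an illustration under the assumption that 'time' is a datetime column. It is not the FLASC implementation.

```python
import pandas as pd


def fill_gaps_with_nan(df, missing_data_buffer=5.0):
    """Insert an all-NaN row 1 s after the start of every large time gap."""
    df = df.sort_values("time").reset_index(drop=True)
    # Seconds between each row and the next one (NaN for the last row)
    dt_next = df["time"].diff().dt.total_seconds().shift(-1)
    gap_starts = df.loc[dt_next >= missing_data_buffer, "time"]
    # Rows that only carry a timestamp; other columns become NaN on concat
    filler = pd.DataFrame({"time": gap_starts + pd.Timedelta(seconds=1)})
    out = pd.concat([df, filler], ignore_index=True)
    return out.sort_values("time").reset_index(drop=True)


t0 = pd.Timestamp("2024-01-01 00:00:00")
df = pd.DataFrame(
    {
        "time": [t0 + pd.Timedelta(seconds=s) for s in [0, 1, 2, 60, 61]],
        "ws_000": [8.0, 8.1, 8.2, 7.0, 7.1],
    }
)
df_filled = fill_gaps_with_nan(df)
print(df_filled)
```

With the inserted row in place, a later forward fill or zero-order-hold resampling carries NaN through the gap instead of the last valid measurement.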

flasc.data_processing.dataframe_manipulations.df_sort_and_find_duplicates(df)

This function sorts the dataframe and finds rows with equal time index.

Parameters:

df (pd.DataFrame) -- An (unsorted) dataframe

Returns:

  • df (pd.DataFrame) -- Dataframe sorted by time.

  • duplicate_entries_idx ([list of int]) -- List with indices of the former of two duplicate rows. The indices correspond to the time-sorted df.

Return type:

(pd.DataFrame, [list of int])
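
A minimal pandas sketch of the sort-and-detect step is given below. It assumes 'time' is a column and marks, for each pair of rows with equal timestamps, the index of the former row in the time-sorted dataframe. The variable names mirror the description above, but the code is not taken from FLASC.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2024-01-01 00:00:02",
                "2024-01-01 00:00:00",
                "2024-01-01 00:00:01",
                "2024-01-01 00:00:01",
            ]
        ),
        "ws_000": [7.0, 8.0, 8.5, 8.5],
    }
)

# Sort by time and renumber the rows
df_sorted = df.sort_values("time").reset_index(drop=True)

# A row is the "former" duplicate when the next row has the same timestamp
same_as_next = df_sorted["time"].eq(df_sorted["time"].shift(-1))
duplicate_entries_idx = df_sorted.index[same_as_next].tolist()
print(duplicate_entries_idx)
```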

flasc.data_processing.dataframe_manipulations.is_day_or_night(df, latitude, longitude, sunrise_altitude=0, sunset_altitude=0, lag_hours=0, datetime_column='time')

Determine night or day in dataframe.

Determine whether it's day or night for a given set of coordinates and UTC timestamp in a DataFrame.

Parameters:
  • df (pd.DataFrame) -- A Pandas DataFrame containing the time in UTC and other relevant data.

  • latitude (float) -- The latitude of the location for which to determine day or night.

  • longitude (float) -- The longitude of the location for which to determine day or night.

  • sunrise_altitude (float) -- The altitude of the sun to denote that sunrise has occurred [degrees]

  • sunset_altitude (float) -- The altitude of the sun to denote that sunset has occurred [degrees]

  • lag_hours (float, optional) -- The number of hours to lag behind the timestamp for the daylight determination. Default is 0.

  • datetime_column (str, optional) -- The name of the DataFrame column containing the timestamp in UTC. Default is 'time'.

Returns:

The input DataFrame with two additional columns: 'sun_altitude' (the sun's altitude at the given timestamp) and 'is_day' (a boolean indicating whether it's daytime at the given timestamp).

Return type:

pd.DataFrame

flasc.data_processing.dataframe_manipulations.plot_sun_altitude_with_day_night_color(df, ax=None)

Plot sun altitude with day-night color differentiation.

This function creates a plot of Sun Altitude over time, distinguishing between day and night periods with different background colors. The input DataFrame 'df' should contain time and sun_altitude columns, as well as a boolean 'is_day' column to indicate day and night periods.

Parameters:
  • df (pd.DataFrame) -- A DataFrame containing time, sun_altitude, and is_day columns.

  • ax (plt.axis, optional) -- An optional Matplotlib axis to use for the plot. If not provided, a new axis will be created.

Returns:

The Matplotlib axis plotted on.

Return type:

ax (plt.axis)

flasc.data_processing.dataframe_manipulations.df_sort_and_fix_duplicates(df)

Sort dataframe and fill duplicates.

This function sorts the dataframe and addresses duplicate rows (i.e., rows in which the time index is equal). It does this by merging the two rows, replacing the 'nan' entries of one row with the non-'nan' entries of the other row. If both rows have different values for the same column, then an exception is thrown.

Parameters:

df (pd.DataFrame) -- An (unsorted) dataframe

Returns:

A time-sorted Dataframe in which its duplicate rows have been merged.

Return type:

pd.DataFrame
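
The merge rule described above (fill the NaN entries of one duplicate row from the other, and fail on conflicting values) can be sketched with a pandas groupby. The helper merge_duplicate_times and the choice of ValueError are assumptions for illustration. The documentation only states that an exception is thrown.

```python
import numpy as np
import pandas as pd


def merge_duplicate_times(df):
    """Sort by time and merge rows that share a timestamp."""
    grouped = df.sort_values("time").groupby("time")
    # nunique ignores NaN, so a count above 1 means two different
    # non-NaN values were recorded for the same timestamp and column
    if (grouped.nunique() > 1).to_numpy().any():
        raise ValueError("Duplicate rows hold conflicting values.")
    # first() picks the first non-NaN entry per column within each group
    return grouped.first().reset_index()


df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2024-01-01 00:00:01", "2024-01-01 00:00:00", "2024-01-01 00:00:00"]
        ),
        "ws_000": [7.0, 8.0, np.nan],
        "pow_000": [800.0, np.nan, 900.0],
    }
)
df_merged = merge_duplicate_times(df)
print(df_merged)
```

In this example the two rows stamped 00:00:00 complement each other, so they collapse into a single row holding both the wind speed and the power value.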