phygnn.model_interfaces.random_forest_model.RandomForestModel

class RandomForestModel(model, feature_names=None, label_name=None, norm_params=None, normalize=True, one_hot_categories=None)

Bases: ModelBase

scikit-learn Random Forest Regression model interface.

Parameters:
  • model (sklearn.ensemble.RandomForestRegressor) – Sklearn Random Forest Model

  • feature_names (list) – Ordered list of feature names.

  • label_name (str) – label (output) variable name.

  • norm_params (dict, optional) – Dictionary mapping feature and label names (keys) to normalization parameters (mean, stdev), by default None

  • normalize (bool | tuple, optional) – Boolean flag(s) controlling whether features and labels are normalized. True normalizes both, False normalizes neither, and a tuple of flags (normalize_feature, normalize_label) sets each one separately. By default True.

  • one_hot_categories (dict, optional) – Features to one-hot encode using the given categories. If None, one-hot encoding is not run. By default None.
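The normalize argument accepts either a single flag or a pair of flags. As a rough illustration of that convention, the value can be expanded into separate feature and label flags. This is a stand-alone sketch, not phygnn's own code, and the helper name split_normalize_flags is hypothetical.

```python
def split_normalize_flags(normalize=True):
    """Expand a bool or a 2-tuple into (normalize_features, normalize_labels).

    Hypothetical helper for illustration only; not part of phygnn.
    """
    if isinstance(normalize, bool):
        # A single flag applies to both features and labels.
        return normalize, normalize
    # Otherwise expect a pair: (normalize_feature, normalize_label).
    normalize_features, normalize_labels = normalize
    return bool(normalize_features), bool(normalize_labels)


print(split_normalize_flags(True))
print(split_normalize_flags((True, False)))
```

Passing (True, False) would then correspond to normalizing the features while leaving the label in its native units.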

Methods

build_trained(features, label[, normalize, ...])

Build Random Forest Model with given kwargs and then train with given features, labels, and kwargs

compile_model(**kwargs)

Build sklearn random forest model

dict_json_convert(inp)

Recursively convert numeric values in dict to work with json dump

get_mean(name)

Get feature | label mean

get_norm_params(names)

Get means and stdevs for given feature/label names

get_stdev(name)

Get feature | label stdev

load(path)

Load model from model path.

make_one_hot_feature_names(feature_names, ...)

Update feature_names after one-hot encoding

normalize(data[, names])

Normalize given data

parse_features(features[, names])

Parse features - preprocessing of feature data before training or prediction.

parse_labels(label[, name])

Parse labels and normalize if desired

predict(features[, table, parse_kwargs, ...])

Use model to predict label from given features

save_model(path)

Save Random Forest Model to path.

seed([s])

Set the random seed for reproducible results.

train_model(features, label[, shuffle, ...])

Train the model with the provided features and label

unnormalize(data[, names])

Un-normalize given data

unnormalize_prediction(prediction)

Unnormalize prediction if needed

Attributes

feature_dims

Number of features

feature_means

Feature means, used for (un)normalization

feature_names

List of the feature variable names.

feature_stdevs

Feature stdevs, used for (un)normalization

input_feature_names

Input feature names

label_dims

Number of labels

label_means

label means, used for (un)normalization

label_names

label variable names

label_stdevs

label stdevs, used for (un)normalization

means

Mapping feature/label names to the mean values for (un)normalization

model

Trained model

model_summary

Tensorflow model summary

normalization_parameters

Features and label (un)normalization parameters

normalize_features

Flag to normalize features

normalize_labels

Flag to normalize labels

one_hot_categories

categories to use for one-hot encoding

one_hot_feature_names

One-hot encoded feature names

one_hot_input_feature_names

Input feature names to be one-hot encoded

stdevs

Mapping feature/label names to the stdev values for (un)normalization

version_record

A record of important versions that this model was built with.

static compile_model(**kwargs)

Build sklearn random forest model

Parameters:

kwargs (dict) – kwargs for sklearn.ensemble.RandomForestRegressor

Returns:

sklearn.ensemble.RandomForestRegressor – sklearn random forest model

unnormalize_prediction(prediction)

Unnormalize prediction if needed

Parameters:

prediction (ndarray) – Model prediction

Returns:

prediction (ndarray) – Native prediction
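To make the un-normalization step concrete, the sketch below maps a normalized prediction back to native units. It assumes normalization is the usual standard score, (x - mean) / stdev, which the (mean, stdev) normalization parameters suggest but this page does not state outright. The helper name and the numbers are hypothetical.

```python
def unnormalize_values(values, mean, stdev):
    """Map normalized values back to native units.

    Assumes normalization was (x - mean) / stdev, so the inverse is
    x * stdev + mean. Hypothetical helper, not phygnn's implementation.
    """
    return [v * stdev + mean for v in values]


# Hypothetical label statistics and a normalized model prediction.
label_mean, label_stdev = 10.0, 2.0
normalized_prediction = [-1.0, 0.0, 1.5]

native = unnormalize_values(normalized_prediction, label_mean, label_stdev)
print(native)  # [8.0, 10.0, 13.0]
```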

parse_labels(label, name=None)

Parse labels and normalize if desired

Parameters:
  • label (pandas.DataFrame | dict | ndarray) – Label data to parse

  • name (list, optional) – List of label names, by default None

Returns:

label (ndarray) – Parsed labels array, normalized if desired

train_model(features, label, shuffle=True, parse_kwargs=None, fit_kwargs=None)

Train the model with the provided features and label

Parameters:
  • features (dict | pandas.DataFrame) – Input features to train on

  • label (dict | pandas.DataFrame) – label to train on

  • shuffle (bool) – Flag to randomly subset the validation data and batch selection from features and labels.

  • parse_kwargs (dict) – kwargs for cls.parse_features

  • fit_kwargs (dict) – kwargs for sklearn.ensemble.RandomForestRegressor.fit

save_model(path)

Save Random Forest Model to path.

Parameters:

path (str) – Path to save model to

classmethod build_trained(features, label, normalize=True, one_hot_categories=None, shuffle=True, save_path=None, compile_kwargs=None, parse_kwargs=None, fit_kwargs=None)

Build Random Forest Model with given kwargs and then train with given features, labels, and kwargs

Parameters:
  • features (pandas.DataFrame) – Model features

  • label (pandas.DataFrame) – label to train on

  • normalize (bool | tuple, optional) – Boolean flag(s) controlling whether features and labels are normalized. True normalizes both, False normalizes neither, and a tuple of flags (normalize_feature, normalize_label) sets each one separately. By default True.

  • one_hot_categories (dict, optional) – Features to one-hot encode using the given categories. If None, one-hot encoding is not run. By default None.

  • shuffle (bool) – Flag to randomly subset the validation data and batch selection from features and labels.

  • save_path (str) – Directory path to save model to. The RandomForest Model will be saved to the directory while the framework parameters will be saved in json.

  • compile_kwargs (dict) – kwargs for sklearn.ensemble.RandomForestRegressor

  • parse_kwargs (dict) – kwargs for cls.parse_features

  • fit_kwargs (dict) – kwargs for sklearn.ensemble.RandomForestRegressor.fit

Returns:

model (RandomForestModel) – Initialized and trained RandomForestModel obj

classmethod load(path)

Load model from model path.

Parameters:

path (str) – Directory path to load the RandomForestModel pickle file from.

Returns:

model (RandomForestModel) – Loaded RandomForestModel from disk.

static dict_json_convert(inp)

Recursively convert numeric values in dict to work with json dump

Parameters:

inp (dict) – Dictionary to convert.

Returns:

out (dict) – Copy of dict input with all nested numeric values converted to base python int or float and all arrays converted to lists.
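The recursive conversion described here can be sketched with the standard library alone. In the example below, Fraction and array.array stand in for non-native numeric values and array-like containers. The function name to_json_safe is hypothetical, and phygnn's actual conversion rules may differ.

```python
import json
from array import array
from fractions import Fraction
from numbers import Integral, Real


def to_json_safe(inp):
    """Recursively copy `inp`, converting values json.dumps cannot handle.

    Hypothetical stand-in for illustration; not phygnn's implementation.
    """
    if isinstance(inp, dict):
        return {key: to_json_safe(value) for key, value in inp.items()}
    if isinstance(inp, (list, tuple, array)):
        # Arrays and tuples become plain lists.
        return [to_json_safe(value) for value in inp]
    if isinstance(inp, bool):
        # bool is an Integral subclass; keep it as a bool.
        return inp
    if isinstance(inp, Integral):
        return int(inp)
    if isinstance(inp, Real):
        return float(inp)
    return inp


params = {"mean": Fraction(1, 2), "dims": {"counts": array("i", [1, 2])}}
safe = to_json_safe(params)
print(json.dumps(safe))
```

Calling json.dumps on params directly would raise a TypeError, while the converted copy serializes cleanly.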

property feature_dims

Number of features

Returns:

int

property feature_means

Feature means, used for (un)normalization

Returns:

list

property feature_names

List of the feature variable names.

Returns:

list

property feature_stdevs

Feature stdevs, used for (un)normalization

Returns:

list

get_mean(name)

Get feature | label mean

Parameters:

name (str) – feature | label name

Returns:

mean (float) – Mean value used for normalization

get_norm_params(names)

Get means and stdevs for given feature/label names

Parameters:

names (list) – list of feature/label names to get normalization params for

Returns:

  • means (list) – List of means to use for (un)normalization

  • stdevs (list) – List of stdevs to use for (un)normalization
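As an illustration of the return shape, the sketch below collects means and stdevs in the same order as the requested names. It assumes each name maps to a (mean, stdev) pair, which is one reading of the constructor's norm_params description. The real storage layout may differ, and the helper name and feature names are hypothetical.

```python
def lookup_norm_params(norm_params, names):
    """Return (means, stdevs) lists ordered like `names`.

    Assumes `norm_params` maps each name to a (mean, stdev) pair.
    Hypothetical helper, not phygnn's implementation.
    """
    means = [norm_params[name][0] for name in names]
    stdevs = [norm_params[name][1] for name in names]
    return means, stdevs


# Hypothetical feature and label statistics.
norm_params = {
    "temperature": (15.0, 5.0),
    "wind_speed": (7.0, 3.0),
    "power": (100.0, 40.0),
}
means, stdevs = lookup_norm_params(norm_params, ["wind_speed", "temperature"])
print(means, stdevs)  # [7.0, 15.0] [3.0, 5.0]
```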

get_stdev(name)

Get feature | label stdev

Parameters:

name (str) – feature | label name

Returns:

stdev (float) – Stdev value used for normalization

property input_feature_names

Input feature names

Returns:

list

property label_dims

Number of labels

Returns:

int

property label_means

label means, used for (un)normalization

Returns:

list

property label_names

label variable names

Returns:

list

property label_stdevs

label stdevs, used for (un)normalization

Returns:

list

static make_one_hot_feature_names(feature_names, one_hot_categories)

Update feature_names after one-hot encoding

Parameters:
  • feature_names (list) – Input feature names

  • one_hot_categories (dict) – Features to one-hot encode using given categories

Returns:

one_hot_feature_names (list) – Updated list of feature names with one_hot categories
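One plausible way to picture the name update: the categorical feature name is dropped and one name per category is added. The sketch below encodes that assumption. The function name one_hot_names_sketch and the feature names are hypothetical, and phygnn's actual ordering of the result may differ.

```python
def one_hot_names_sketch(feature_names, one_hot_categories):
    """Replace each categorical feature name with its category names.

    Assumption for illustration: the encoded column is dropped and one
    column per category is appended. Not phygnn's implementation.
    """
    # Keep every feature that is not being one-hot encoded.
    names = [n for n in feature_names if n not in one_hot_categories]
    for categories in one_hot_categories.values():
        for category in categories:
            if category not in names:
                names.append(category)
    return names


feature_names = ["temperature", "season"]
one_hot_categories = {"season": ["winter", "summer"]}
print(one_hot_names_sketch(feature_names, one_hot_categories))
# ['temperature', 'winter', 'summer']
```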

property means

Mapping feature/label names to the mean values for (un)normalization

Returns:

dict

property model

Trained model

Returns:

sklearn.ensemble.RandomForestRegressor

property model_summary

Tensorflow model summary

Returns:

str

property normalization_parameters

Features and label (un)normalization parameters

Returns:

dict

normalize(data, names=None)

Normalize given data

Parameters:
  • data (dict | pandas.DataFrame | ndarray) – Data to normalize

  • names (list, optional) – List of data item names, needed to normalize ndarrays, by default None

Returns:

data (dict | pandas.DataFrame | ndarray) – Normalized data in same format as input
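For dict input, normalization can be pictured as scaling each named column with its own mean and stdev and returning a dict of the same shape. The sketch below assumes standard-score scaling, (x - mean) / stdev, and (mean, stdev) pairs keyed by name. Both are assumptions for illustration, and the helper name is hypothetical.

```python
def normalize_columns(data, norm_params):
    """Return a new dict with each column scaled to (x - mean) / stdev.

    Assumes `data` maps names to lists of floats and `norm_params` maps
    the same names to (mean, stdev) pairs. Illustration only, not phygnn
    code; a stdev of zero is not handled here.
    """
    out = {}
    for name, values in data.items():
        mean, stdev = norm_params[name]
        out[name] = [(v - mean) / stdev for v in values]
    return out


data = {"temperature": [10.0, 15.0, 25.0]}
norm_params = {"temperature": (15.0, 5.0)}
normalized = normalize_columns(data, norm_params)
print(normalized)  # {'temperature': [-1.0, 0.0, 2.0]}
```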

property normalize_features

Flag to normalize features

Returns:

bool

property normalize_labels

Flag to normalize labels

Returns:

bool

property one_hot_categories

categories to use for one-hot encoding

Returns:

dict

property one_hot_feature_names

One-hot encoded feature names

Returns:

list

property one_hot_input_feature_names

Input feature names to be one-hot encoded

Returns:

list

parse_features(features, names=None, **kwargs)

Parse features - preprocessing of feature data before training or prediction. This will do one-hot encoding based on self.one_hot_categories, and feature normalization based on self.normalize_features

Parameters:
  • features (pandas.DataFrame | dict | ndarray) – Features to train on or predict from

  • names (list, optional) – List of feature names, by default None

  • kwargs (dict, optional) – kwargs for PreProcess.one_hot

Returns:

features (ndarray) – Parsed features array normalized and with str columns converted to one hot vectors if desired

predict(features, table=True, parse_kwargs=None, predict_kwargs=None)

Use model to predict label from given features

Parameters:
  • features (dict | pandas.DataFrame) – features to predict from

  • table (bool, optional) – Flag to return the prediction as a pandas DataFrame, by default True

  • parse_kwargs (dict) – kwargs for cls.parse_features

  • predict_kwargs (dict) – kwargs for the underlying model's predict method

Returns:

prediction (ndarray | pandas.DataFrame) – label prediction

static seed(s=0)

Set the random seed for reproducible results.

Parameters:

s (int) – Random number generator seed

property stdevs

Mapping feature/label names to the stdev values for (un)normalization

Returns:

dict

unnormalize(data, names=None)

Un-normalize given data

Parameters:
  • data (dict | pandas.DataFrame | ndarray) – Data to un-normalize

  • names (list, optional) – List of data item names, needed to un-normalize ndarrays, by default None

Returns:

data (dict | pandas.DataFrame | ndarray) – Native data in same format as input

property version_record

A record of important versions that this model was built with.

Returns:

dict