phygnn.model_interfaces.random_forest_model.RandomForestModel

class RandomForestModel(model, feature_names=None, label_name=None, norm_params=None, normalize=True, one_hot_categories=None)

Bases: ModelBase

scikit-learn Random Forest Regression model interface.

Parameters:

model (sklearn.ensemble.RandomForestRegressor) – Sklearn Random Forest Model
feature_names (list) – Ordered list of feature names.
label_name (str) – label (output) variable name.
norm_params (dict, optional) – Dictionary mapping feature and label names (keys) to normalization parameters (mean, stdev), by default None
normalize (bool | tuple, optional) – Boolean flag(s) controlling whether features and labels are normalized. True normalizes both, False normalizes neither, and a tuple of flags (normalize_feature, normalize_label) sets each one independently, by default True
one_hot_categories (dict, optional) – Features to one-hot encode using given categories, if None do not run one-hot encoding, by default None
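
The bool-or-tuple convention for normalize can be illustrated with a minimal sketch. The helper name split_normalize_flag is hypothetical and is not part of phygnn; it only shows how a single flag or a pair of flags could be expanded into separate feature and label flags.

```python
def split_normalize_flag(normalize=True):
    """Expand a bool or 2-item tuple into (normalize_features, normalize_labels).

    Hypothetical helper for illustration only; not the phygnn implementation.
    """
    if isinstance(normalize, bool):
        # A single flag applies to both features and labels.
        return normalize, normalize
    if isinstance(normalize, (tuple, list)) and len(normalize) == 2:
        normalize_features, normalize_labels = normalize
        return bool(normalize_features), bool(normalize_labels)
    raise TypeError("normalize must be a bool or a 2-item tuple of bools")


both = split_normalize_flag(True)                    # (True, True)
features_only = split_normalize_flag((True, False))  # (True, False)
print(both, features_only)
```

Passing (True, False) in this sketch corresponds to normalizing the features while leaving the label in its native units.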

Methods

`build_trained`(features, label[, normalize, ...])	Build Random Forest Model with given kwargs and then train with given features, labels, and kwargs
`compile_model`(**kwargs)	Build sklearn random forest model
`dict_json_convert`(inp)	Recursively convert numeric values in dict to work with json dump
`get_mean`(name)	Get feature | label mean
`get_norm_params`(names)	Get means and stdevs for given feature/label names
`get_stdev`(name)	Get feature | label stdev
`load`(path)	Load model from model path.
`make_one_hot_feature_names`(feature_names, ...)	Update feature_names after one-hot encoding
`normalize`(data[, names])	Normalize given data
`parse_features`(features[, names])	Parse features - preprocessing of feature data before training or prediction.
`parse_labels`(label[, name])	Parse labels and normalize if desired
`predict`(features[, table, parse_kwargs, ...])	Use model to predict label from given features
`save_model`(path)	Save Random Forest Model to path.
`seed`([s])	Set the random seed for reproducible results.
`train_model`(features, label[, shuffle, ...])	Train the model with the provided features and label
`unnormalize`(data[, names])	Un-normalize given data
`unnormalize_prediction`(prediction)	Unnormalize prediction if needed

Attributes

`feature_dims`	Number of features
`feature_means`	Feature means, used for (un)normalization
`feature_names`	List of the feature variable names.
`feature_stdevs`	Feature stdevs, used for (un)normalization
`input_feature_names`	Input feature names
`label_dims`	Number of labels
`label_means`	label means, used for (un)normalization
`label_names`	label variable names
`label_stdevs`	label stdevs, used for (un)normalization
`means`	Mapping feature/label names to the mean values for (un)normalization
`model`	Trained model
`model_summary`	Tensorflow model summary
`normalization_parameters`	Features and label (un)normalization parameters
`normalize_features`	Flag to normalize features
`normalize_labels`	Flag to normalize labels
`one_hot_categories`	categories to use for one-hot encoding
`one_hot_feature_names`	One-hot encoded feature names
`one_hot_input_feature_names`	Input feature names to be one-hot encoded
`stdevs`	Mapping feature/label names to the stdev values for (un)normalization
`version_record`	A record of important versions that this model was built with.

static compile_model(**kwargs)

Build sklearn random forest model

Parameters:: kwargs (dict) – kwargs for sklearn.ensemble.RandomForestRegressor
Returns:: sklearn.ensemble.RandomForestRegressor – sklearn random forest model

unnormalize_prediction(prediction)

Unnormalize prediction if needed

Parameters:: prediction (ndarray) – Model prediction
Returns:: prediction (ndarray) – Native prediction

parse_labels(label, name=None)

Parse labels and normalize if desired

Parameters:

label (pandas.DataFrame | dict | ndarray) – Labels to train on
name (list, optional) – List of label names, by default None

Returns:

label (ndarray) – Parsed labels array, normalized if desired

train_model(features, label, shuffle=True, parse_kwargs=None, fit_kwargs=None)

Train the model with the provided features and label

Parameters:

features (dict | pandas.DataFrame) – Input features to train on
label (dict | pandas.DataFrame) – label to train on
shuffle (bool) – Flag to randomly subset the validation data and batch selection from features and labels.
parse_kwargs (dict) – kwargs for cls.parse_features
fit_kwargs (dict) – kwargs for sklearn.ensemble.RandomForestRegressor.fit

save_model(path)

Save Random Forest Model to path.

Parameters:: path (str) – Path to save model to

classmethod build_trained(features, label, normalize=True, one_hot_categories=None, shuffle=True, save_path=None, compile_kwargs=None, parse_kwargs=None, fit_kwargs=None)

Build Random Forest Model with given kwargs and then train with given features, labels, and kwargs

Parameters:

features (pandas.DataFrame) – Model features
label (pandas.DataFrame) – label to train on
normalize (bool | tuple, optional) – Boolean flag(s) controlling whether features and labels are normalized. True normalizes both, False normalizes neither, and a tuple of flags (normalize_feature, normalize_label) sets each one independently, by default True
one_hot_categories (dict, optional) – Features to one-hot encode using given categories, if None do not run one-hot encoding, by default None
shuffle (bool) – Flag to randomly subset the validation data and batch selection from features and labels.
save_path (str) – Directory path to save model to. The RandomForest Model will be saved to the directory while the framework parameters will be saved in json.
compile_kwargs (dict) – kwargs for sklearn.ensemble.RandomForestRegressor
parse_kwargs (dict) – kwargs for cls.parse_features
fit_kwargs (dict) – kwargs for sklearn.ensemble.RandomForestRegressor.fit

Returns:

model (RandomForestModel) – Initialized and trained RandomForestModel obj

classmethod load(path)

Load model from model path.

Parameters:: path (str) – Directory path to load the RandomForestModel pickle file from.
Returns:: model (RandomForestModel) – Loaded RandomForestModel from disk.

static dict_json_convert(inp)

Recursively convert numeric values in dict to work with json dump

Parameters:: inp (dict) – Dictionary to convert.
Returns:: out (dict) – Copy of dict input with all nested numeric values converted to base python int or float and all arrays converted to lists.
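
The idea behind this conversion can be sketched with the standard library alone. The function below is an illustrative stand-in, not the phygnn implementation: the name to_json_safe is hypothetical, and it relies on duck typing (any object with a tolist method, such as a NumPy array or array.array, is treated as an array) rather than on NumPy-specific checks.

```python
import json
from array import array
from fractions import Fraction
from numbers import Integral, Real


def to_json_safe(inp):
    """Return a copy of ``inp`` built only from JSON-serializable types.

    Hypothetical sketch: nested numeric values become base Python int or
    float, and array-like values become lists.
    """
    if isinstance(inp, dict):
        return {key: to_json_safe(value) for key, value in inp.items()}
    if isinstance(inp, (list, tuple)):
        return [to_json_safe(value) for value in inp]
    if hasattr(inp, "tolist"):
        # Array-like objects expose tolist(); convert and recurse.
        return to_json_safe(inp.tolist())
    if inp is None or isinstance(inp, (bool, str)):
        return inp
    if isinstance(inp, Integral):
        return int(inp)
    if isinstance(inp, Real):
        return float(inp)
    return inp


params = {
    "n_estimators": 100,
    "means": array("d", [0.5, 1.5]),
    "nested": {"ratio": Fraction(1, 4), "dims": (2, 3)},
}
converted = to_json_safe(params)
print(json.dumps(converted, sort_keys=True))
```

Without the conversion, json.dumps would raise a TypeError on the array and Fraction values in this example.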

property feature_dims

Number of features

Returns:: int

property feature_means

Feature means, used for (un)normalization

Returns:: list

property feature_names

List of the feature variable names.

Returns:: list

property feature_stdevs

Feature stdevs, used for (un)normalization

Returns:: list

get_mean(name)

Get feature | label mean

Parameters:: name (str) – feature | label name
Returns:: mean (float) – Mean value used for normalization

get_norm_params(names)

Get means and stdevs for given feature/label names

Parameters:

names (list) – list of feature/label names to get normalization params for

Returns:

means (list) – List of means to use for (un)normalization
stdevs (list) – List of stdevs to use for (un)normalization
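
As a rough illustration of the lookup, the sketch below assumes that normalization parameters are stored as a dictionary of the form {name: {"mean": ..., "stdev": ...}} and that the standard deviation is the population standard deviation. Both are assumptions made for this example, and the helper names fit_norm_params and lookup_norm_params are hypothetical.

```python
from statistics import mean, pstdev


def fit_norm_params(columns):
    """Compute {name: {"mean": m, "stdev": s}} from {name: list of values}.

    The dictionary layout is an assumption for this sketch.
    """
    return {
        name: {"mean": mean(values), "stdev": pstdev(values)}
        for name, values in columns.items()
    }


def lookup_norm_params(norm_params, names):
    """Return (means, stdevs) lists ordered like ``names``."""
    means = [norm_params[name]["mean"] for name in names]
    stdevs = [norm_params[name]["stdev"] for name in names]
    return means, stdevs


columns = {
    "temperature": [10.0, 20.0, 30.0],
    "pressure": [2.0, 4.0],
}
norm_params = fit_norm_params(columns)
means, stdevs = lookup_norm_params(norm_params, ["pressure", "temperature"])
print(means, stdevs)
```

The returned lists follow the order of the requested names, not the order in which the parameters were stored.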

get_stdev(name)

Get feature | label stdev

Parameters:: name (str) – feature | label name
Returns:: stdev (float) – Stdev value used for normalization

property input_feature_names

Input feature names

Returns:: list

property label_dims

Number of labels

Returns:: int

property label_means

label means, used for (un)normalization

Returns:: list

property label_names

label variable names

Returns:: list

property label_stdevs

label stdevs, used for (un)normalization

Returns:: list

static make_one_hot_feature_names(feature_names, one_hot_categories)

Update feature_names after one-hot encoding

Parameters:

feature_names (list) – Input feature names
one_hot_categories (dict) – Features to one-hot encode using given categories

Returns:

one_hot_feature_names (list) – Updated list of feature names with one_hot categories
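
One plausible way to update the names is sketched below. It assumes that each one-hot input feature is dropped from the list and its category names are appended at the end. The ordering and the helper name expand_one_hot_names are assumptions for illustration and may differ from what phygnn does.

```python
def expand_one_hot_names(feature_names, one_hot_categories):
    """Replace one-hot input features with their category names.

    Hypothetical sketch: categories are appended after the remaining
    feature names, skipping any name that is already present.
    """
    expanded = [name for name in feature_names if name not in one_hot_categories]
    for categories in one_hot_categories.values():
        for category in categories:
            if category not in expanded:
                expanded.append(category)
    return expanded


feature_names = ["wind_speed", "season", "temperature"]
one_hot_categories = {"season": ["winter", "spring", "summer", "fall"]}
one_hot_feature_names = expand_one_hot_names(feature_names, one_hot_categories)
print(one_hot_feature_names)
```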

property means

Mapping feature/label names to the mean values for (un)normalization

Returns:: dict

property model

Trained model

Returns:: tensorflow.keras.models

property model_summary

Tensorflow model summary

Returns:: str

property normalization_parameters

Features and label (un)normalization parameters

Returns:: dict

normalize(data, names=None)

Normalize given data

Parameters:

data (dict | pandas.DataFrame | ndarray) – Data to normalize
names (list, optional) – List of data item names, needed to normalize ndarrays, by default None

Returns:

data (dict | pandas.DataFrame | ndarray) – Normalized data in same format as input
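
Normalization with a mean and stdev is commonly a z-score transform, with un-normalization as its inverse. The sketch below assumes that convention; the function names z_score and inverse_z_score are hypothetical and operate on plain lists rather than the dict, DataFrame, or ndarray inputs the method accepts.

```python
def z_score(values, mean, stdev):
    """Scale native values to zero mean and unit stdev (assumed convention)."""
    return [(value - mean) / stdev for value in values]


def inverse_z_score(values, mean, stdev):
    """Map normalized values back to native units."""
    return [value * stdev + mean for value in values]


native = [10.0, 20.0, 30.0]
mean, stdev = 20.0, 10.0

scaled = z_score(native, mean, stdev)
restored = inverse_z_score(scaled, mean, stdev)
print(scaled, restored)
```

The round trip shows why the same mean and stdev must be kept with the model: a prediction made in normalized space can only be returned to native units with the parameters used during training.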

property normalize_features

Flag to normalize features

Returns:: bool

property normalize_labels

Flag to normalize labels

Returns:: bool

property one_hot_categories

categories to use for one-hot encoding

Returns:: dict

property one_hot_feature_names

One-hot encoded feature names

Returns:: list

property one_hot_input_feature_names

Input feature names to be one-hot encoded

Returns:: list

parse_features(features, names=None, **kwargs)

Parse features: preprocess feature data before training or prediction. This applies one-hot encoding based on self.one_hot_categories and feature normalization based on self.normalize_features.

Parameters:

features (pandas.DataFrame | dict | ndarray) – Features to train on or predict from
names (list, optional) – List of feature names, by default None
kwargs (dict, optional) – kwargs for PreProcess.one_hot

Returns:

features (ndarray) – Parsed features array, normalized and with str columns converted to one-hot vectors if desired
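
The one-hot step can be pictured with a small standalone sketch. It is not the PreProcess.one_hot implementation; the function name one_hot_encode is hypothetical, and the sketch assumes that each categorical value maps to a row with a 1.0 in the column of its category and 0.0 elsewhere.

```python
def one_hot_encode(values, categories):
    """Encode each value as a one-hot row ordered like ``categories``."""
    rows = []
    for value in values:
        if value not in categories:
            raise ValueError(f"unknown category: {value!r}")
        rows.append([1.0 if value == category else 0.0 for category in categories])
    return rows


seasons = ["summer", "winter", "summer"]
encoded = one_hot_encode(seasons, ["winter", "summer"])
print(encoded)
```

Fixing the category list up front, as one_hot_categories does, keeps the column order stable between training and prediction even when a batch is missing some categories.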

predict(features, table=True, parse_kwargs=None, predict_kwargs=None)

Use model to predict label from given features

Parameters:

features (dict | pandas.DataFrame) – features to predict from
table (bool, optional) – Return pandas DataFrame
parse_kwargs (dict) – kwargs for cls.parse_features
predict_kwargs (dict) – kwargs for tensorflow.*.predict

Returns:

prediction (ndarray | pandas.DataFrame) – label prediction

static seed(s=0)

Set the random seed for reproducible results.

Parameters:: s (int) – Random number generator seed

property stdevs

Mapping feature/label names to the stdev values for (un)normalization

Returns:: dict

unnormalize(data, names=None)

Un-normalize given data

Parameters:

data (dict | pandas.DataFrame | ndarray) – Data to un-normalize
names (list, optional) – List of data item names, needed to un-normalize ndarrays, by default None

Returns:

data (dict | pandas.DataFrame | ndarray) – Native data in same format as input

property version_record

A record of important versions that this model was built with.

Returns:: dict