ROM Generators

Model Generator Base

class rom.generators.model_generator_base.ModelGeneratorBase(analysis_id, random_seed=None, **kwargs)

Bases: object

anova_plots(y_data, yhat, model_name)

build(data_file, metamodel, **kwargs)

evaluate(model, model_name, model_moniker, x_data, y_data, downsample, build_time, cv_time, covariates=None, scaler=None)

Generic base function to evaluate the performance of the models.

Parameters:	model – model_name – x_data – y_data – downsample – build_time –
Returns:	Ordered dict
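As an illustration of what an evaluation routine returning an ordered dict might look like, here is a minimal sketch using only the Python standard library. The function name evaluate_sketch, the choice of metrics (RMSE and R²), and the key names are assumptions made for this example; the source does not state which metrics the real evaluate method computes.

```python
import math
from collections import OrderedDict


def evaluate_sketch(y_true, y_pred, model_name, build_time, cv_time):
    """Hypothetical stand-in for evaluate(); metric names are assumptions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    # Residual and total sums of squares for RMSE and R^2.
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)

    # An OrderedDict keeps the metrics in insertion order.
    performance = OrderedDict()
    performance["name"] = model_name
    performance["n_samples"] = n
    performance["rmse"] = math.sqrt(ss_res / n)
    performance["r2"] = 1.0 - ss_res / ss_tot
    performance["build_time"] = build_time
    performance["cv_time"] = cv_time
    return performance


result = evaluate_sketch(
    y_true=[1.0, 2.0, 3.0, 4.0],
    y_pred=[1.0, 2.0, 3.0, 5.0],
    model_name="example_response",  # illustrative name only
    build_time=0.5,
    cv_time=0.0,
)
for key, value in result.items():
    print(key, value)
```

Keeping the timing values next to the accuracy metrics makes it straightforward to write one row per model when comparing generators.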

save_dataframe(dataframe, path)

train_test_validate_split(dataset, metamodel, downsample=None, scale=False)

Use the built-in method to generate the train and test data, and add a further dataset for validation. The validation dataset is identified by a unique ID and is pulled out of the dataset before the train/test split is performed.

Parameters:
	dataset – dataframe, data to process
	covariates – list, dict of covariates and information
	responses – list, of responses to keep in the dataset
	validation_id – str, unique ID of model to extract
	kwargs – downsample: fraction of dataframe to keep (after validation data extraction)
Returns:	dataframes: 1) dataset with the validation data removed, 2) validation data

(The dataset, covariates, responses, and validation_id entries are commented out in the source docstring.)
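The order of operations described for this method (extract the validation ID, optionally downsample, then split into train and test) can be sketched as follows. This is a standard-library illustration, not the library's implementation: the function split_with_validation, the id_key and test_fraction arguments, and the sample rows are all hypothetical, and the real method works on dataframes and uses a built-in splitting routine instead of the manual shuffle shown here.

```python
import random


def split_with_validation(rows, validation_id, id_key="id",
                          downsample=None, test_fraction=0.2, seed=0):
    """Hold out one ID for validation, then split the rest into train/test."""
    # 1) Pull out every row that belongs to the validation ID first.
    validation = [r for r in rows if r[id_key] == validation_id]
    remaining = [r for r in rows if r[id_key] != validation_id]

    rng = random.Random(seed)
    # 2) Optionally keep only a fraction of the remaining rows.
    if downsample is not None:
        keep = max(1, int(len(remaining) * downsample))
        remaining = rng.sample(remaining, keep)

    # 3) Shuffle what is left and split it into test and train sets.
    rng.shuffle(remaining)
    n_test = int(round(len(remaining) * test_fraction))
    test, train = remaining[:n_test], remaining[n_test:]
    return train, test, validation


# Illustrative data: 50 rows spread evenly over 5 made-up IDs.
rows = [{"id": f"bldg_{i % 5}", "x": i, "y": 2 * i} for i in range(50)]
train, test, validation = split_with_validation(rows, "bldg_0", downsample=0.5)
print(len(train), len(test), len(validation))
```

Extracting the validation rows before downsampling and splitting means none of the held-out ID's data can end up in the training or test sets.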

yy_plots(y_data, yhat, model_name)

Plot the yy-plots

Parameters:	y_data – yhat – model_name –
Returns:

Linear Model

class rom.generators.linear_model.LinearModel(analysis_id, random_seed=None, **kwargs)

Bases: rom.generators.model_generator_base.ModelGeneratorBase

build(data_file, metamodel, **kwargs)

evaluate(model, model_name, model_type, x_data, y_data, downsample, build_time, cv_time, covariates=None, scaler=None)

Evaluate the performance of the linear model based on known x_data and y_data. If the model was built with scaled data, the test data will already be scaled.

Random Forest Model¶

class rom.generators.random_forest.RandomForest(analysis_id, random_seed=None, **kwargs)

Bases: rom.generators.model_generator_base.ModelGeneratorBase

build(data_file, metamodel, **kwargs)

evaluate(model, model_name, model_type, x_data, y_data, downsample, build_time, cv_time, covariates=None, scaler=None)

Evaluate the performance of the forest based on known x_data and y_data.

Parameters:	model – model_name – model_type – x_data – y_data – downsample – build_time – cv_time – covariates –
Returns:

export_tree_png(tree, covariates, filename)

save_cv_results(cv_results, response, downsample, filename)

Save the cv_results to a CSV file.

The cv_results are the output of the GridSearch k-fold cross validation and take the following form:

{
    'param_kernel': masked_array(data=['poly', 'poly', 'rbf', 'rbf'],
                                 mask=[False False False False]...)
    'param_gamma': masked_array(data=[-- -- 0.1 0.2],
                                mask=[True  True False False]...),
    'param_degree': masked_array(data=[2.0 3.0 -- --],
                                 mask=[False False  True  True]...),
    'split0_test_score': [0.8, 0.7, 0.8, 0.9],
    'split1_test_score': [0.82, 0.5, 0.7, 0.78],
    'mean_test_score': [0.81, 0.60, 0.75, 0.82],
    'std_test_score': [0.02, 0.01, 0.03, 0.03],
    'rank_test_score': [2, 4, 3, 1],
    'split0_train_score': [0.8, 0.9, 0.7],
    'split1_train_score': [0.82, 0.5, 0.7],
    'mean_train_score': [0.81, 0.7, 0.7],
    'std_train_score': [0.03, 0.03, 0.04],
    'mean_fit_time': [0.73, 0.63, 0.43, 0.49],
    'std_fit_time': [0.01, 0.02, 0.01, 0.01],
    'mean_score_time': [0.007, 0.06, 0.04, 0.04],
    'std_score_time': [0.001, 0.002, 0.003, 0.005],
    'params': [{'kernel': 'poly', 'degree': 2}, ...],
}

Parameters:	cv_results – filename –
Returns:
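One way a cv_results dictionary of this shape could be written to CSV is sketched below, with one row per parameter combination. This uses only the standard csv module and is an assumption about the approach, not the library's code: the helper cv_results_to_csv, its stream argument, the trimmed-down example dictionary, and the response name "heating" are all made up for the illustration.

```python
import csv
import io


def cv_results_to_csv(cv_results, response, downsample, stream):
    """Write one CSV row per parameter combination in cv_results."""
    columns = list(cv_results.keys())
    # 'params' holds one dict per candidate, so its length is the row count.
    n_candidates = len(cv_results["params"])

    writer = csv.writer(stream)
    writer.writerow(["response", "downsample"] + columns)
    for i in range(n_candidates):
        # Take the i-th entry of every column; dict entries are written
        # using their string representation.
        row = [cv_results[column][i] for column in columns]
        writer.writerow([response, downsample] + row)


# Trimmed, illustrative cv_results with equal-length columns.
cv_results = {
    "mean_test_score": [0.81, 0.60, 0.75, 0.82],
    "rank_test_score": [2, 4, 3, 1],
    "params": [
        {"kernel": "poly", "degree": 2},
        {"kernel": "poly", "degree": 3},
        {"kernel": "rbf", "gamma": 0.1},
        {"kernel": "rbf", "gamma": 0.2},
    ],
}

buffer = io.StringIO()
cv_results_to_csv(cv_results, "heating", 0.5, buffer)
print(buffer.getvalue())
```

Prefixing each row with the response and downsample values lets results from several runs be concatenated into one file and still be told apart.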

Support Vector Regression

class rom.generators.svr.SVR(analysis_id, random_seed=None, **kwargs)

Bases: rom.generators.model_generator_base.ModelGeneratorBase

build(data_file, metamodel, **kwargs)

evaluate(model, model_name, model_moniker, x_data, y_data, downsample, build_time, cv_time, covariates=None, scaler=None)

Evaluate the performance of the SVR model based on known x_data and y_data.

save_cv_results(cv_results, response, downsample, filename)

Save the cv_results to a CSV file. Data in the cv_results file looks like the following.

{
    'param_kernel': masked_array(data=['poly', 'poly', 'rbf', 'rbf'],
                                 mask=[False False False False]...)
    'param_gamma': masked_array(data=[-- -- 0.1 0.2],
                                mask=[True  True False False]...),
    'param_degree': masked_array(data=[2.0 3.0 -- --],
                                 mask=[False False  True  True]...),
    'split0_test_score': [0.8, 0.7, 0.8, 0.9],
    'split1_test_score': [0.82, 0.5, 0.7, 0.78],
    'mean_test_score': [0.81, 0.60, 0.75, 0.82],
    'std_test_score': [0.02, 0.01, 0.03, 0.03],
    'rank_test_score': [2, 4, 3, 1],
    'split0_train_score': [0.8, 0.9, 0.7],
    'split1_train_score': [0.82, 0.5, 0.7],
    'mean_train_score': [0.81, 0.7, 0.7],
    'std_train_score': [0.03, 0.03, 0.04],
    'mean_fit_time': [0.73, 0.63, 0.43, 0.49],
    'std_fit_time': [0.01, 0.02, 0.01, 0.01],
    'mean_score_time': [0.007, 0.06, 0.04, 0.04],
    'std_score_time': [0.001, 0.002, 0.003, 0.005],
    'params': [{'kernel': 'poly', 'degree': 2}, ...],
}

Parameters:	cv_results – filename –
Returns: