Runs a grid search cross-validation scheme to find the best model training parameters.

Details

Grid search CV trains a machine learning model on multiple combinations of training hyperparameters and finds the combination that optimizes the evaluation metric. It creates an exhaustive set of hyperparameter combinations and trains the model on each combination.
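As a rough illustration of what an exhaustive set of combinations looks like, the sketch below enumerates a grid with base R's expand.grid(). This is only an assumption about the general idea, not necessarily how superml builds its grid internally; the `parameters` list simply mirrors the one used in the examples on this page.

```r
# Hypothetical parameter list, mirroring the examples on this page
parameters <- list(n_estimators = c(100),
                   max_depth = c(5, 2, 10))

# expand.grid() (base R) returns one row per combination of the supplied values
grid <- expand.grid(parameters, stringsAsFactors = FALSE)

# Under this sketch, one model would be trained per row of the grid
print(grid)
nrow(grid)
```

With one value for n_estimators and three for max_depth, the grid has 1 x 3 = 3 rows, so three combinations would be evaluated.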

Public fields

trainer

a superml trainer object, for example XGBTrainer, RFTrainer or NBTrainer

parameters

a list of parameters to tune

n_folds

number of folds to split the training data into

scoring

scoring metric used to evaluate the best model; multiple values can be provided. Currently supports: auc, accuracy, mse, rmse, logloss, mae, f1, precision, recall

evaluation_scores

parameter for internal use

Methods


Method new()

Usage

GridSearchCV$new(trainer = NA, parameters = NA, n_folds = NA, scoring = NA)

Arguments

trainer

a superml trainer object, for example XGBTrainer, RFTrainer or NBTrainer

parameters

list, a list of parameters to tune

n_folds

integer, number of folds to split the training data into

scoring

character, scoring metric used to evaluate the best model; multiple values can be provided. Currently supports: auc, accuracy, mse, rmse, logloss, mae, f1, precision, recall

Details

Create a new `GridSearchCV` object.

Returns

A `GridSearchCV` object.

Examples

rf <- RFTrainer$new()
gst <- GridSearchCV$new(trainer = rf,
                        parameters = list(n_estimators = c(100),
                                          max_depth = c(5, 2, 10)),
                        n_folds = 3,
                        scoring = c('accuracy', 'auc'))


Method fit()

Usage

GridSearchCV$fit(X, y)

Arguments

X

data.frame or data.table containing the features and the target column

y

character, name of the target variable in X

Details

Trains the model on every parameter combination using grid search with cross-validation.

Returns

NULL

Examples

rf <- RFTrainer$new()
gst <- GridSearchCV$new(trainer = rf,
                        parameters = list(n_estimators = c(100),
                                          max_depth = c(5, 2, 10)),
                        n_folds = 3,
                        scoring = c('accuracy', 'auc'))
data("iris")
gst$fit(iris, "Species")
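To make the n_folds argument more concrete, here is a minimal base R sketch of a k-fold split on the same data. It is an illustration only: the random fold assignment and the names `fold_id`, `train` and `valid` are assumptions, and superml may split the data differently.

```r
set.seed(42)          # only so the illustrative split is reproducible
data("iris")
n_folds <- 3

# Randomly assign each row to one of n_folds folds of (near) equal size
fold_id <- sample(rep(seq_len(n_folds), length.out = nrow(iris)))

for (k in seq_len(n_folds)) {
  train <- iris[fold_id != k, ]   # rows that would be used to fit the model
  valid <- iris[fold_id == k, ]   # held-out rows that would be used for scoring
  cat("fold", k, ":", nrow(train), "train rows,",
      nrow(valid), "validation rows\n")
}
```

Each parameter combination would then be scored once per fold, and the per-fold scores summarized by their average and standard deviation.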


Method best_iteration()

Usage

GridSearchCV$best_iteration(metric = NULL)

Arguments

metric

character, which metric to use for evaluation

Details

Returns the parameter combination that scored best on the evaluation metric.

Returns

a list with the best parameter values, together with the average and standard deviation of each scoring metric

Examples

rf <- RFTrainer$new()
gst <- GridSearchCV$new(trainer = rf,
                        parameters = list(n_estimators = c(100),
                                          max_depth = c(5, 2, 10)),
                        n_folds = 3,
                        scoring = c('accuracy', 'auc'))
data("iris")
gst$fit(iris, "Species")
gst$best_iteration()
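The selection step can be pictured as picking the row with the best average score from a table of per-combination results. The sketch below assumes a higher-is-better metric such as accuracy and uses a made-up `scores` table; the numbers are placeholders, not real results, and the actual selection logic inside superml may differ.

```r
# Made-up per-combination scores, for illustration only (not real results)
scores <- data.frame(
  n_estimators = c(100, 100, 100),
  max_depth    = c(5, 2, 10),
  accuracy_avg = c(0.90, 0.85, 0.88)
)

metric <- "accuracy"   # hypothetical choice of metric

# Pick the row whose average score is highest (assumes higher is better)
best_row <- which.max(scores[[paste0(metric, "_avg")]])

# Return that row as a named list
best <- as.list(scores[best_row, ])
str(best)
```

For error-style metrics such as mse, rmse, mae or logloss, a lower value is better, so a sketch like this would need which.min() instead.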


Method clone()

The objects of this class are cloneable with this method.

Usage

GridSearchCV$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples


## ------------------------------------------------
## Method `GridSearchCV$new`
## ------------------------------------------------

rf <- RFTrainer$new()
gst <- GridSearchCV$new(trainer = rf,
                        parameters = list(n_estimators = c(100),
                                          max_depth = c(5, 2, 10)),
                        n_folds = 3,
                        scoring = c('accuracy', 'auc'))

## ------------------------------------------------
## Method `GridSearchCV$fit`
## ------------------------------------------------

rf <- RFTrainer$new()
gst <- GridSearchCV$new(trainer = rf,
                        parameters = list(n_estimators = c(100),
                                          max_depth = c(5, 2, 10)),
                        n_folds = 3,
                        scoring = c('accuracy', 'auc'))
data("iris")
gst$fit(iris, "Species")
#> [1] "entering grid search"
#> [1] "In total, 3 models will be trained"

## ------------------------------------------------
## Method `GridSearchCV$best_iteration`
## ------------------------------------------------

rf <- RFTrainer$new()
gst <- GridSearchCV$new(trainer = rf,
                        parameters = list(n_estimators = c(100),
                                          max_depth = c(5, 2, 10)),
                        n_folds = 3,
                        scoring = c('accuracy', 'auc'))
data("iris")
gst$fit(iris, "Species")
#> [1] "entering grid search"
#> [1] "In total, 3 models will be trained"
gst$best_iteration()
#> $n_estimators
#> [1] 100
#> 
#> $max_depth
#> [1] 5
#> 
#> $accuracy_avg
#> [1] 0.9533333
#> 
#> $accuracy_sd
#> [1] 0.0305505
#> 
#> $auc_avg
#> [1] 0.5319473
#> 
#> $auc_sd
#> [1] 0.05977745
#>