Trains a random forest model.
Trains a Random Forest model. A random forest is a meta estimator that fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. This implementation uses the ranger R package, which provides faster model training.
n_estimators
the number of trees in the forest, default = 100
max_features
the number of features to consider when looking for the best split.
Possible values are:
auto (default): takes sqrt(num_of_features),
sqrt: same as auto,
log: takes log(num_of_features),
none: takes all features
max_depth
the maximum depth of each tree
min_node_size
the minimum number of samples required to split an internal node
criterion
the function used to measure the quality of a split. For classification, gini is used, which measures the Gini index. For regression, the variance of the responses is used.
classification
whether to train for classification (1) or regression (0)
verbose
show computation status and estimated runtime
seed
seed value
class_weights
weights associated with the classes for sampling of training observations
always_split
vector of feature names to be always used for splitting
importance
Variable importance mode, one of 'none', 'impurity', 'impurity_corrected', 'permutation'. The 'impurity' measure is the Gini index for classification and the variance of the responses for regression. Defaults to 'impurity'.
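The two split-quality measures named above can be illustrated in base R. This is only a minimal sketch of the textbook definitions, not the code ranger runs internally; the helper names `gini_impurity` and `response_variance` are hypothetical and are not part of this package.

```r
# Hypothetical helpers (not part of RFTrainer or ranger), for illustration only.

# Gini impurity of a set of class labels: 1 - sum(p_k^2),
# where p_k is the share of observations in class k.
gini_impurity <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Variance of the responses in a node (population form, dividing by n).
response_variance <- function(y) {
  mean((y - mean(y))^2)
}

data("iris")
gini_impurity(iris$Species)          # three balanced classes: 1 - 3 * (1/3)^2
gini_impurity(iris$Species[1:50])    # a pure node (one class only) has impurity 0
response_variance(iris$Sepal.Length) # regression counterpart on a numeric response
```

A split is considered good when it lowers these quantities in the child nodes relative to the parent node.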
new()
RFTrainer$new(
n_estimators,
max_depth,
max_features,
min_node_size,
classification,
class_weights,
always_split,
verbose,
save_model,
seed,
importance
)
n_estimators
integer, the number of trees in the forest, default = 100
max_depth
integer, the maximum depth of each tree
max_features
integer, the number of features to consider when looking for the best split.
Possible values are:
auto (default): takes sqrt(num_of_features),
sqrt: same as auto,
log: takes log(num_of_features),
none: takes all features
min_node_size
integer, the minimum number of samples required to split an internal node
classification
integer, whether to train for classification (1) or regression (0)
class_weights
weights associated with the classes for sampling of training observations
always_split
vector of feature names to be always used for splitting
verbose
logical, show computation status and estimated runtime
save_model
logical, whether to save model
seed
integer, seed value
importance
Variable importance mode, one of 'none', 'impurity', 'impurity_corrected', 'permutation'. The 'impurity' measure is the Gini index for classification and the variance of the responses for regression. Defaults to 'impurity'.
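As a rough illustration of the max_features options listed above, the following base R sketch maps each option to a number of candidate features per split. The function name `resolve_max_features` is hypothetical, and the use of the natural logarithm, rounding down, and a lower bound of 1 are assumptions made for this example; the package may round differently.

```r
# Hypothetical helper (not part of RFTrainer), for illustration only.
# Assumptions: natural log, rounding down, and at least one feature per split.
resolve_max_features <- function(max_features, num_of_features) {
  n <- switch(
    as.character(max_features),
    auto = sqrt(num_of_features),  # default
    sqrt = sqrt(num_of_features),  # same as auto
    log  = log(num_of_features),
    none = num_of_features,        # use all features
    stop("unknown max_features value: ", max_features)
  )
  max(1L, as.integer(floor(n)))
}

# iris has 4 predictor columns once Species is set aside as the target
sapply(c("auto", "sqrt", "log", "none"),
       resolve_max_features, num_of_features = 4)
#> auto sqrt  log none
#>    2    2    1    4
```

Smaller values make individual trees less correlated with each other, while none lets every split consider all features.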
predict()
## ------------------------------------------------
## Method `RFTrainer$new`
## ------------------------------------------------
data("iris")
bst <- RFTrainer$new(n_estimators=10,
                     max_depth=4,
                     classification=1,
                     seed=42,
                     verbose=TRUE)
## ------------------------------------------------
## Method `RFTrainer$fit`
## ------------------------------------------------
data("iris")
bst <- RFTrainer$new(n_estimators=10,
                     max_depth=4,
                     classification=1,
                     seed=42,
                     verbose=TRUE)
bst$fit(iris, 'Species')
## ------------------------------------------------
## Method `RFTrainer$predict`
## ------------------------------------------------
data("iris")
bst <- RFTrainer$new(n_estimators=10,
                     max_depth=4,
                     classification=1,
                     seed=42,
                     verbose=TRUE)
bst$fit(iris, 'Species')
predictions <- bst$predict(iris)
## ------------------------------------------------
## Method `RFTrainer$get_importance`
## ------------------------------------------------
data("iris")
bst <- RFTrainer$new(n_estimators=50,
                     max_depth=4,
                     classification=1,
                     seed=42,
                     verbose=TRUE)
bst$fit(iris, 'Species')
predictions <- bst$predict(iris)
bst$get_importance()
#> tmp.order.tmp..decreasing...TRUE..
#> Petal.Width 49.401250
#> Petal.Length 39.462297
#> Sepal.Length 7.850913
#> Sepal.Width 2.474873