Trains a random forest model.
Trains a Random Forest model. A random forest is a meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting. This implementation uses the ranger R package, which provides fast model training.
n_estimators: the number of trees in the forest. Default: 100.
max_features: the number of features to consider when looking for the best split.
Possible values are "auto" (default), which takes sqrt(num_of_features);
"sqrt", same as "auto";
"log", which takes log(num_of_features);
"none", which takes all features.
max_depth: the maximum depth of each tree.
min_node_size: the minimum number of samples required to split an internal node.
criterion: the function used to measure the quality of a split. For classification, the Gini
index is used. For regression, the variance of the responses is used.
classification: whether to train for classification (1) or regression (0).
verbose: show computation status and estimated runtime.
seed: seed value.
class_weights: weights associated with the classes, used for sampling of training observations.
always_split: vector of feature names to always use for splitting.
importance: variable importance mode, one of 'none', 'impurity', 'impurity_corrected', 'permutation'. The 'impurity' measure is the Gini index for classification and the variance of the responses for regression. Defaults to 'impurity'.
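A hedged sketch of how the sampling and split-control parameters above can be combined (assumes the superml package is installed; the class weights shown are illustrative, not recommended values):

```r
library(superml)

data("iris")
# Weight the three iris classes for sampling, force Petal.Width to be
# considered at every split, and request permutation importance.
rf <- RFTrainer$new(n_estimators = 50,
                    classification = 1,
                    class_weights = c(0.2, 0.4, 0.4),
                    always_split = c("Petal.Width"),
                    importance = "permutation",
                    seed = 42)
rf$fit(iris, "Species")
rf$get_importance()
```

With `importance = "permutation"`, the reported scores come from shuffling each feature out-of-bag rather than from impurity decrease, which is generally less biased toward high-cardinality features.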
new()
Usage:
RFTrainer$new(
  n_estimators,
  max_depth,
  max_features,
  min_node_size,
  classification,
  class_weights,
  always_split,
  verbose,
  save_model,
  seed,
  importance
)
Arguments:
n_estimators: integer, the number of trees in the forest. Default: 100.
max_depth: integer, the maximum depth of each tree.
max_features: integer, the number of features to consider when looking for the best split.
Possible values are "auto" (default), which takes sqrt(num_of_features);
"sqrt", same as "auto";
"log", which takes log(num_of_features);
"none", which takes all features.
min_node_size: integer, the minimum number of samples required to split an internal node.
classification: integer, whether to train for classification (1) or regression (0).
class_weights: weights associated with the classes, used for sampling of training observations.
always_split: vector of feature names to always use for splitting.
verbose: logical, show computation status and estimated runtime.
save_model: logical, whether to save the model.
seed: integer, seed value.
importance: variable importance mode, one of 'none', 'impurity', 'impurity_corrected', 'permutation'. The 'impurity' measure is the Gini index for classification and the variance of the responses for regression. Defaults to 'impurity'.
predict()
## ------------------------------------------------
## Method `RFTrainer$new`
## ------------------------------------------------
data("iris")
bst <- RFTrainer$new(n_estimators=10,
                     max_depth=4,
                     classification=1,
                     seed=42,
                     verbose=TRUE)
## ------------------------------------------------
## Method `RFTrainer$fit`
## ------------------------------------------------
data("iris")
bst <- RFTrainer$new(n_estimators=10,
                     max_depth=4,
                     classification=1,
                     seed=42,
                     verbose=TRUE)
bst$fit(iris, 'Species')
## ------------------------------------------------
## Method `RFTrainer$predict`
## ------------------------------------------------
data("iris")
bst <- RFTrainer$new(n_estimators=10,
                     max_depth=4,
                     classification=1,
                     seed=42,
                     verbose=TRUE)
bst$fit(iris, 'Species')
predictions <- bst$predict(iris)
## ------------------------------------------------
## Method `RFTrainer$get_importance`
## ------------------------------------------------
data("iris")
bst <- RFTrainer$new(n_estimators=50,
                     max_depth=4,
                     classification=1,
                     seed=42,
                     verbose=TRUE)
bst$fit(iris, 'Species')
predictions <- bst$predict(iris)
bst$get_importance()
#> tmp.order.tmp..decreasing...TRUE..
#> Petal.Width 49.401250
#> Petal.Length 39.462297
#> Sepal.Length 7.850913
#> Sepal.Width 2.474873
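The examples above all train classifiers. Setting classification = 0 switches the trainer to regression, where splits minimize the variance of the responses (as noted under criterion). A minimal sketch, predicting a numeric column of iris (assumes the superml package is installed):

```r
library(superml)

data("iris")
# Regression: predict Sepal.Length from the remaining numeric columns.
reg <- RFTrainer$new(n_estimators = 50,
                     classification = 0,
                     seed = 42)
reg$fit(iris[, -5], "Sepal.Length")
preds <- reg$predict(iris[, -5])
head(preds)
```

Note that the target column here is numeric; with classification = 0, predict() returns numeric predictions rather than class labels.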