The goal of SuperML is to provide scikit-learn's standard fit, predict, transform interface for building machine learning models in R. It is built on top of the latest R packages, which provide optimized ways of training machine learning models.

Installation

You can install the latest stable CRAN version (recommended) using:

install.packages("superml")
install.packages("superml", dependencies=TRUE) # to install all dependencies at once

You can also install superml from GitHub with:

# install.packages("devtools")
devtools::install_github("saraswatmks/superml")

Description

In superml, every machine learning algorithm is called a trainer. The following trainers are currently available:

  • LMTrainer: used to train linear, logistic, ridge, lasso models
  • KNNTrainer: K-Nearest Neighbour Models
  • KMeansTrainer: KMeans Model
  • NBTrainer: Naive Bayes Model
  • SVMTrainer: SVM Model
  • RFTrainer: Random Forest Model
  • XGBTrainer: XGBoost Model

In addition, there are other useful functions to support modeling tasks such as:

  • CountVectorizer: Create Bag of Words model
  • TfidfVectorizer: Create TF-IDF feature model
  • LabelEncoder: Convert categorical features to numeric
  • GridSearchCV: For hyperparameter optimization
  • RandomSearchCV: For hyperparameter optimization
  • kFoldMean: Target encoding
  • smoothMean: Target encoding
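
As a rough illustration of two of these utilities, the sketch below encodes a categorical vector with LabelEncoder and builds a bag-of-words matrix with CountVectorizer. The toy data (colours, sents) is made up for this example. The method names fit_transform and inverse_transform and the min_df argument are assumed from the package documentation, so check them against your installed version.

```r
library(superml)

# Hypothetical toy data, made up for this example
colours <- c("red", "green", "blue", "green", "red")
sents <- c("i like machine learning",
           "machine learning in r is fun",
           "i like r")

# LabelEncoder: map each distinct category to a numeric code
lbl <- LabelEncoder$new()
codes <- lbl$fit_transform(colours)
decoded <- lbl$inverse_transform(codes)  # map the codes back to labels

# CountVectorizer: one row per sentence, one column per retained token
# (min_df = 0.1 is an assumed setting: keep tokens found in >= 10% of sentences)
cv <- CountVectorizer$new(min_df = 0.1)
bow <- cv$fit_transform(sents)

print(codes)
print(decoded)
print(dim(bow))
```

The encoder is fitted once and reused, which mirrors the fit/transform split described at the top of this README.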

To compute text similarity, the following functions are available:

  • bm_25: Computes bm25 distance
  • dot: Computes dot product between two vectors
  • dotmat: Computes dot product between a vector and a matrix
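
To make these quantities concrete, the sketch below computes a raw and a length-normalised (cosine) dot product by hand in base R and then calls dot from superml on the same vectors. The vectors are made up. Whether dot normalises its inputs by default is something to verify in the function's help page, so no particular result is claimed for the last call.

```r
library(superml)

# Hypothetical example vectors
a <- c(1, 2, 3)
b <- c(4, 5, 6)

# Raw dot product: sum of element-wise products
raw_dot <- sum(a * b)

# Cosine similarity: dot product divided by the product of the vector lengths
cosine <- raw_dot / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# The package function applied to the same inputs (default arguments assumed)
pkg_dot <- superml::dot(a, b)

print(raw_dot)
print(cosine)
print(pkg_dot)
```

Comparing pkg_dot with raw_dot and cosine is a quick way to see which of the two conventions your installed version follows.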

Usage

Any machine learning model can be trained using the following steps:

data(iris)
library(superml)

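# Random forest: create the trainer, fit it on the data, then predict
rf <- RFTrainer$new(n_estimators = 100)
rf$fit(iris, "Species")  # second argument is the name of the target column
pred <- rf$predict(iris)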
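
The example above predicts on the same rows it was trained on. A minimal sketch of a hold-out evaluation follows. It assumes the random forest backend is installed (see the dependencies = TRUE install option) and that predict returns class labels for a factor target. The split size and seed are arbitrary choices for illustration.

```r
library(superml)
data(iris)

set.seed(42)                               # make the random split reproducible
test_idx <- sample(nrow(iris), size = 30)  # hold out 30 of the 150 rows
train <- iris[-test_idx, ]
test <- iris[test_idx, ]

# Fit on the training rows only, then predict the held-out rows
rf <- RFTrainer$new(n_estimators = 100)
rf$fit(train, "Species")
pred <- rf$predict(test)

# Share of held-out rows whose predicted class matches the true class
accuracy <- mean(as.character(pred) == as.character(test$Species))
print(accuracy)
```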

Documentation

The documentation can be found here: SuperML Documentation

Contributions & Support

SuperML is my ambitious effort to help people train machine learning models in R as easily as they do in Python. I encourage you to use this library and to post bug reports and feature suggestions on the project's GitHub issues page.