smoothMean Calculator — smoothMean • SuperML

Calculates target encodings using a smoothing parameter and the number of observations in each category. This approach is more robust to target leakage and helps avoid overfitting.

smoothMean(
  train_df,
  test_df,
  colname,
  target,
  min_samples_leaf = 1,
  smoothing = 1,
  noise_level = 0
)

Arguments

train_df: training dataset
test_df: test dataset
colname: name of the categorical column to encode
target: name of the target column
min_samples_leaf: minimum number of samples a category needs before its average is taken into account
smoothing: smoothing strength that balances the category average against the prior
noise_level: amount of random noise to add to the encodings (optional)
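
To make the roles of min_samples_leaf and smoothing more concrete, here is a minimal base R sketch of one common smoothed target-encoding scheme. It is an illustration only: the helper name smooth_encode_sketch, the sigmoid weighting, and the use of the overall target mean as the prior are assumptions, and smoothMean may compute its encodings differently.

```r
# Hypothetical helper, not part of SuperML. It sketches one common way to
# blend a per-category mean with a global prior; smoothMean() may differ.
smooth_encode_sketch <- function(x, y, min_samples_leaf = 1, smoothing = 1) {
  prior    <- mean(y)               # assumed prior: overall target mean
  cat_mean <- tapply(y, x, mean)    # target mean within each category
  cat_n    <- tapply(y, x, length)  # number of rows in each category
  # Sigmoid weight: approaches 1 for frequent categories, 0 for rare ones
  w <- 1 / (1 + exp(-(cat_n - min_samples_leaf) / smoothing))
  prior * (1 - w) + cat_mean * w    # weighted blend of prior and category mean
}

region <- c('del', 'csk', 'rcb', 'del', 'csk', 'pune', 'guj', 'del')
win    <- c(0, 1, 1, 0, 0, 1, 0, 1)

enc <- smooth_encode_sketch(region, win)
# 'guj' appears once with win = 0; with the defaults its weight is 0.5,
# so its encoding lands halfway between 0 and the prior of 0.5, i.e. 0.25
print(enc)
```

Under this scheme, raising min_samples_leaf or smoothing pulls rare categories closer to the prior, which is what limits overfitting to categories seen only a few times.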

Value

A list with train and test elements, each holding the mean encodings of the target for the given categorical variable.

Examples

library(superml)

# toy data: 'kol' appears in test but not in train
train <- data.frame(region = c('del','csk','rcb','del','csk','pune','guj','del'),
                    win = c(0,1,1,0,0,1,0,1))
test <- data.frame(region = c('rcb','csk','rcb','del','guj','pune','csk','kol'))

# calculate encodings
all_means <- smoothMean(train_df = train,
                         test_df = test,
                         colname = 'region',
                         target = 'win')
train_mean <- all_means$train
test_mean <- all_means$test
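
The test set above contains a category ('kol') that never occurs in the training data. This page does not state how smoothMean handles that case, so the following base R sketch only illustrates one plausible convention: falling back to the overall target mean. Plain per-category means stand in for the learned encodings, and the variable names are hypothetical.

```r
# Hypothetical illustration in base R; this is not SuperML code.
train <- data.frame(region = c('del','csk','rcb','del','csk','pune','guj','del'),
                    win = c(0,1,1,0,0,1,0,1))
test <- data.frame(region = c('rcb','csk','rcb','del','guj','pune','csk','kol'))

prior <- mean(train$win)  # overall target mean, used as the fallback value

# Unsmoothed per-category means stand in for any learned encoding
enc <- tapply(train$win, train$region, mean)

# Look up each test row's category; categories unseen in train yield NA
idx <- match(as.character(test$region), names(enc))
test_enc <- as.vector(enc)[idx]

# Assumed convention: unseen categories receive the prior
test_enc[is.na(test_enc)] <- prior
print(data.frame(region = test$region, encoding = test_enc))
```

Whatever convention is used, it is worth checking the returned test encodings for missing values before passing them to a model.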