Overview

Mixins are additive likelihood functions that allow a learning algorithm to be tailored. The most familiar examples are L1 and L2 regularization. When used with a gradient-based optimizer, the L1 and L2 gradients are simply multiples of the signs of the model coefficients (L1) or of the coefficients themselves (L2). The L1 gradient is discontinuous at zero, which makes it necessary to integrate numerically along constraint surfaces to reach an exact optimum. But in a stochastic gradient framework, simply adding the L1 gradient and allowing the model coefficients to hunt around zero works quite well. This is what BIDMach implements now. Mixins must implement two methods:

def compute(mats:Array[Mat], step:Float)
def score(mats:Array[Mat], step:Float):FMat

The first computes a gradient matrix for each model matrix and adds it to that matrix's update (the total gradient). The second computes the score, i.e. the likelihood, for the mixin. If there are k model matrices being updated, the score is broken into k parts, one per matrix.
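
For concreteness, here is a minimal sketch of a mixin written against this interface. It is not the actual BIDMach source: the constructor arguments (the model matrices, their matching update matrices, and a penalty weight) are assumptions standing in for however the learner wires a real mixin into the update loop.

import BIDMat.{FMat, Mat}
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._

// Minimal L2-style mixin sketch (not the BIDMach source). The learner is
// assumed to supply the model matrices, their update (total gradient)
// accumulators, and a penalty weight.
class L2MixinSketch(modelmats:Array[FMat], updatemats:Array[FMat], weight:Float) {

  // Add this mixin's gradient to the update for each model matrix.
  def compute(mats:Array[Mat], step:Float) = {
    for (i <- 0 until modelmats.length) {
      updatemats(i) = updatemats(i) + modelmats(i) * (-weight)
    }
  }

  // Return the mixin's likelihood score, one entry per model matrix
  // (the k-part breakdown described above).
  def score(mats:Array[Mat], step:Float):FMat = {
    val scores = zeros(modelmats.length, 1)
    for (i <- 0 until modelmats.length) {
      val m = modelmats(i)
      scores(i) = -weight * sum(sum(m *@ m))(0)   // negated L2 penalty for matrix i
    }
    scores
  }
}

Note that compute adds to the existing update matrices rather than overwriting them; this is what makes mixins additive, so several mixins can contribute to the same total gradient.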

Regularizers

As mentioned above, regularizers implement a simple additive gradient based on the current model. Both L1 and L2 regularizers are implemented, and each has its own weight option: reg1weight for L1 and reg2weight for L2.
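
As a hedged sketch of the two gradient forms (using BIDMat's FMat; reg1weight and reg2weight are the option names from the text, while the weight values and the random model matrix are stand-ins for illustration):

import BIDMat.FMat
import BIDMat.MatFunctions._
import BIDMat.SciFunctions._

val reg1weight = 1e-4f          // L1 penalty weight (illustrative value)
val reg2weight = 1e-4f          // L2 penalty weight (illustrative value)
val model:FMat = rand(10, 5)    // stand-in for one model matrix

// L1 gradient: a multiple of the signs of the coefficients.
val l1grad = sign(model) * (-reg1weight)

// L2 gradient: a multiple of the coefficients themselves.
val l2grad = model * (-reg2weight)

Because the sign changes abruptly at zero, the L1 gradient keeps pushing small coefficients back and forth across zero, which is the hunting behavior described above.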

Clustering/Topic Model Mixins

There are three mixins designed to improve the quality of clustering and topic models.
