Overview

Mixins are additive likelihood functions that allow tailoring of a learning algorithm. The most familiar examples are L1 and L2 regularization. When used with a gradient-based optimizer, the L2 gradient is simply a multiple of the model coefficients, and the L1 gradient is a multiple of their signs. The L1 gradient is discontinuous at zero, which in principle makes it necessary to integrate numerically along constraint surfaces to reach an exact optimum. But in a stochastic gradient framework, simply adding the L1 gradient and allowing the model coefficients to hunt around zero works well enough. This is what BIDMach implements.
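
To make the gradient forms concrete, here is a minimal sketch in plain Scala (not BIDMach's actual code; the function name and the subtraction convention for the penalty are assumptions):

```scala
// Sketch only: adds L1/L2 penalty gradients to a data-likelihood
// gradient before an SGD update. reg1weight/reg2weight follow the
// option names described on this page.
object RegSketch {
  def addRegGradients(model: Array[Float], grad: Array[Float],
                      reg1weight: Float, reg2weight: Float): Unit = {
    var i = 0
    while (i < model.length) {
      grad(i) -= reg2weight * model(i)              // L2: a multiple of the coefficient
      grad(i) -= reg1weight * math.signum(model(i)) // L1: a multiple of its sign; zero at zero
      i += 1
    }
  }
}
```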

Regularizers

As mentioned, regularizers contribute a simple additive gradient computed from the current model. Both L1 and L2 regularizers are implemented, each with its own weight option: reg1weight and reg2weight respectively.
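
A hedged usage sketch follows. The reg1weight and reg2weight option fields come from this page; the learner constructor, the matrix-loading calls, and the data file names are illustrative assumptions, not a definitive recipe:

```scala
import BIDMat.MatFunctions._
import BIDMach.models.GLM

// Sketch: build a GLM learner and set the regularizer weights through
// its options object. File paths are hypothetical placeholders.
val traindata = loadSMat("data.smat.lz4")       // feature matrix (hypothetical file)
val traintargets = loadFMat("targets.fmat.lz4") // target matrix (hypothetical file)
val (nn, opts) = GLM.learner(traindata, traintargets)
opts.reg1weight = 1e-4f   // L1 regularizer weight
opts.reg2weight = 1e-4f   // L2 regularizer weight
nn.train
```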

Clustering/Topic Model Mixins
