Mixins
Mixins are additive likelihood functions that allow tailoring of a learning algorithm. The most familiar examples are L1 and L2 regularization. When used with a gradient-based optimizer, L1 and L2 gradients are simply multiples of either the model coefficients or their signs. L1 gradients are discontinuous, which makes it necessary to numerically integrate along constraint surfaces to get an exact optimum. But in a stochastic gradient framework, simply adding the L1 gradient and allowing the model coefficients to hunt around zero works quite well. This is what BIDMach implements now. Mixins must implement two methods:
    def compute(mats:Array[Mat], step:Float)
    def score(mats:Array[Mat], step:Float):FMat
The first computes and applies a gradient matrix which is added to the update (total gradient) for each model matrix. The second computes the score, or likelihood, for that mixin. The score is broken into k parts if there are k model matrices being updated. A sketch of this two-method contract follows.
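The sketch below is only illustrative: it is written against plain Scala arrays rather than BIDMach's Mat classes, and the object name and field names are invented here. It shows the idea from the text: an L1 mixin adds a gradient proportional to the sign of each coefficient into the running update, and reports the corresponding (negative) penalty as its score contribution.

```scala
// Illustrative sketch only; not BIDMach's actual Mixin class or Mat API.
object L1MixinSketch {
  val reg1weight = 1e-4f   // hypothetical L1 regularizer weight

  // Add the L1 gradient, -reg1weight * sign(coefficient), into the
  // accumulated update for one model matrix.
  def compute(model: Array[Float], update: Array[Float], step: Float): Unit = {
    var i = 0
    while (i < model.length) {
      update(i) -= reg1weight * math.signum(model(i))
      i += 1
    }
  }

  // Score (negative penalty) this mixin contributes for one model matrix:
  // -reg1weight * sum(|coefficient|).
  def score(model: Array[Float], step: Float): Float =
    -reg1weight * model.map(x => math.abs(x)).sum
}
```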
As mentioned, regularizers implement a simple additive gradient based on the current model values. Both L1 and L2 regularizers are implemented, and each has its own weight: reg1weight and reg2weight respectively.
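By analogy with the L1 sketch above, here is a minimal L2 counterpart, again using plain Scala arrays rather than BIDMach's Mat API, with names invented for illustration. As the text notes, the L2 gradient is simply a multiple of the coefficients themselves, scaled by reg2weight.

```scala
// Illustrative sketch only; not BIDMach's actual regularizer code.
object L2MixinSketch {
  val reg2weight = 1e-4f   // hypothetical L2 regularizer weight

  // Add the L2 gradient, -reg2weight * coefficient, into the accumulated
  // update for one model matrix.
  def compute(model: Array[Float], update: Array[Float], step: Float): Unit = {
    var i = 0
    while (i < model.length) {
      update(i) -= reg2weight * model(i)   // gradient of (reg2weight/2) * w^2
      i += 1
    }
  }

  // Score (negative penalty): -(reg2weight/2) * sum(coefficient^2).
  def score(model: Array[Float], step: Float): Float =
    -0.5f * reg2weight * model.map(x => x * x).sum
}
```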
There are three mixins designed to