Mixins
Mixins are additive likelihood functions that allow tailoring of a learning algorithm. The most familiar examples are L1 and L2 regularization. When used with a gradient-based optimizer, the L2 gradient is simply a multiple of the model coefficients, and the L1 gradient a multiple of their signs. The L1 gradient is discontinuous at zero, which makes it necessary to integrate numerically along constraint surfaces to reach an exact optimum. But in a stochastic gradient framework, simply adding the L1 gradient and allowing the model coefficients to hunt around zero works well enough. This is what BIDMach implements.
As mentioned, regularizers implement a simple additive gradient based on the current model. Both L1 and L2 regularizers are provided, and each has its own regularizer weight: reg1weight and reg2weight respectively.
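To make the additive-gradient idea concrete, here is a minimal, self-contained Scala sketch, not BIDMach's actual implementation. It assumes plain arrays of coefficients and a gradient-ascent step on the likelihood; the names reg1weight and reg2weight follow the option names above, while everything else (function names, the lr parameter) is illustrative.

```scala
// Sketch only: additive L1/L2 regularizer gradients in a stochastic
// gradient step. Not BIDMach's actual classes or API.
object RegularizerSketch {
  // L1 gradient contribution: a (negative) multiple of the coefficient signs.
  def l1Grad(coeffs: Array[Double], reg1weight: Double): Array[Double] =
    coeffs.map(c => -reg1weight * math.signum(c))

  // L2 gradient contribution: a (negative) multiple of the coefficients.
  def l2Grad(coeffs: Array[Double], reg2weight: Double): Array[Double] =
    coeffs.map(c => -reg2weight * c)

  // One stochastic gradient step: sum the data-likelihood gradient and both
  // regularizer gradients, then move the coefficients. With L1, coefficients
  // hunt around zero rather than being pinned to it exactly.
  def sgdStep(coeffs: Array[Double], dataGrad: Array[Double],
              lr: Double, reg1weight: Double, reg2weight: Double): Array[Double] = {
    val g1 = l1Grad(coeffs, reg1weight)
    val g2 = l2Grad(coeffs, reg2weight)
    coeffs.indices.map(i => coeffs(i) + lr * (dataGrad(i) + g1(i) + g2(i))).toArray
  }
}
```

Because each regularizer contributes its own additive term, the two weights can be tuned independently, e.g. a small reg1weight for sparsity plus a separate reg2weight for shrinkage.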