KAT used a variance-preserving initialization, formulated as a Kaiming-style initialization for learnable rational activations. This requires computing the second-order moment of a rational function, which has a complicated closed form. We show that this second-order moment can be computed easily by using orthogonal functions instead. As examples, we use orthogonal polynomials (Hermite) and trigonometric functions (Fourier), and show that they achieve better results in image classification on ImageNet with ConvNeXt and next-token prediction on OpenWebText with GPT-2.
📄 Paper: Learnable Polynomial, Trigonometric, and Tropical Activations
💻 Code: torchortho on GitHub
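For concreteness, here is a minimal sketch of why orthogonality makes the second-order moment easy to compute. It is not the torchortho API; the module name `HermiteActivation` and the identity-style initialization are illustrative assumptions. It uses the fact that probabilists' Hermite polynomials satisfy E[He_m(x) He_n(x)] = n!·δ_{mn} for x ~ N(0, 1), so the second moment of f(x) = Σ_n a_n He_n(x) is simply Σ_n a_n² n!. The Fourier case is analogous, using the orthogonality of sines and cosines.

```python
# Hypothetical sketch (not the torchortho API): a learnable Hermite activation
# whose second-order moment has a trivial closed form via orthogonality.
import math
import torch
import torch.nn as nn


class HermiteActivation(nn.Module):
    """Learnable activation f(x) = sum_n a_n He_n(x) up to a given degree."""

    def __init__(self, degree: int = 4):
        super().__init__()
        self.degree = degree
        # Illustrative init: start at the identity (a_1 = 1, rest 0); any choice
        # with sum_n a_n^2 * n! = 1 keeps E[f(x)^2] = 1 for x ~ N(0, 1).
        coeffs = torch.zeros(degree + 1)
        coeffs[1] = 1.0
        self.coeffs = nn.Parameter(coeffs)

    def second_moment(self) -> torch.Tensor:
        # Closed form from orthogonality: E[f(x)^2] = sum_n a_n^2 * n!
        factorials = torch.tensor(
            [math.factorial(n) for n in range(self.degree + 1)],
            dtype=self.coeffs.dtype, device=self.coeffs.device,
        )
        return (self.coeffs ** 2 * factorials).sum()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Three-term recurrence: He_{n+1}(x) = x * He_n(x) - n * He_{n-1}(x)
        h_prev, h_curr = torch.ones_like(x), x
        out = self.coeffs[0] * h_prev
        if self.degree >= 1:
            out = out + self.coeffs[1] * h_curr
        for n in range(1, self.degree):
            h_prev, h_curr = h_curr, x * h_curr - n * h_prev
            out = out + self.coeffs[n + 1] * h_curr
        return out


if __name__ == "__main__":
    act = HermiteActivation(degree=4)
    x = torch.randn(1_000_000)
    # Empirical second moment should match the closed form (≈ 1 at init).
    print(act.second_moment().item(), (act(x) ** 2).mean().item())
```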