The expert-selecting Gating strategy of MOE. #194
Leopold2333 started this conversation in General
Hi. I'm reading the Moirai-MOE code and have a question.

I see that each MOEFeedforward layer uses `centroid` to calculate the Euclidean distance to each "cluster":

uni2ts/src/uni2ts/module/ffn.py, line 138 in 10c1b4c

However, I'm confused about how these clusters (or `centroid`, I guess) are generated. In the module `TransformerEncoder`, `centroid` is created with PyTorch's `register_buffer` function:

uni2ts/src/uni2ts/module/transformer.py, lines 182 to 184 in 10c1b4c

So would `centroid` here be a zero tensor all the time? If so, why does it work as the "clusters"? The code here seems different from what is described in the paper.
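To make the concern concrete, here is a minimal pure-Python sketch of distance-based gating as I understand it. It is not the uni2ts implementation: the function names `euclidean` and `route_to_experts`, the `top_k` parameter, the "nearest centroids win" rule, and all of the toy values are my own assumptions for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def route_to_experts(token, centroids, top_k=2):
    """Hypothetical gating rule: pick the top_k experts whose centroids
    are closest to the token. Returns (expert indices, all distances)."""
    distances = [euclidean(token, c) for c in centroids]
    # sorted() is stable, so ties keep the original expert order.
    ranked = sorted(range(len(centroids)), key=lambda i: distances[i])
    return ranked[:top_k], distances


token = [1.0, 2.0]

# Case 1: centroids left as zeros (what a zero-initialised buffer would hold
# if nothing ever overwrote it).
zero_centroids = [[0.0, 0.0] for _ in range(4)]
zero_choice, zero_dist = route_to_experts(token, zero_centroids)

# Case 2: made-up, distinct centroids.
distinct_centroids = [[0.0, 0.0], [1.0, 2.5], [5.0, 5.0], [2.0, 2.0]]
real_choice, real_dist = route_to_experts(token, distinct_centroids)

print("zero centroids     ->", zero_choice, zero_dist)
print("distinct centroids ->", real_choice, real_dist)
```

With all-zero centroids every expert is equally far from the token, so under this assumed rule the selection is decided purely by tie-breaking and carries no information about the input. That is why I would expect the centroids to be filled with real values somewhere, but I cannot find where that happens.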