To begin with, we more or less need only a single optimization: kernel fusion. The optimization basically states that whenever we have a composition of nodes where the first one will never be used again, we can "fuse" the computation, avoiding extra looping. This would require a special FusionOp which would contain a graph in itself.
Example: f = tanh(a + b). Normally this would become n0 = a + b and n0 = tanh(n0) (assuming the memory optimizer works well). However, on a GPU these are still 2 kernels, and on a CPU two loops. Fusing this would mean we move from:
for (ni, ai, bi) in Zip::new((&mut n0, &a, &b)) {
    *ni = ai + bi;
}
for ni in &mut n0 {
    *ni = tanh(*ni);
}
to:
for (ni, ai, bi) in Zip::new((&mut n0, &a, &b)) {
    *ni = tanh(ai + bi);
}
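A minimal sketch of what such a FusionOp could look like, assuming hypothetical names (NodeId, Op, ops, inputs) rather than whatever graph types the crate actually uses:

// Hypothetical sketch only; the real graph types will differ.
type NodeId = usize;

enum Op {
    Add(NodeId, NodeId),
    Tanh(NodeId),
    // A fused sub-graph, evaluated in a single pass over the data.
    Fusion(FusionOp),
}

struct FusionOp {
    // Operations of the fused region, in topological order.
    ops: Vec<Op>,
    // External inputs consumed by the region (a and b in the example above).
    inputs: Vec<NodeId>,
}

The executor would then lower a Fusion node into a single kernel launch (or a single loop on the CPU) instead of one per contained op.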