We've been using `equinox.error_if` to throw informative errors in some functions. For example, a common thing we do is to check that all elements in an array are positive.
Unfortunately, `error_if` does not play very well with sharding, as it causes an all-gather communication of the error condition.
This is reasonable, since every process must know whether we are erroring. But the standard path is to not error, so I expect that in 99% of user code the error condition is never met and the collective communication only adds overhead.
And if we really must error, I do not need it to happen 'elegantly' on every process. I would be fine erroring on just one process and accepting that the OS/scheduler will eventually kill the other processes.
Would it be possible to add an option so that `error_if` does not produce the collective operation?
I'd be open to this! I'm not sure how to actually implement that though, I suspect you know better than I do. So usual rules I think, happy to take a PR. :)
Whilst we're here I'll also mention #342, although it's now very out of date.