comm=ofi completion #12264
Comments
I just sent a question to
We have an authoritative response on atomic coherence. Basically, see the paragraph on visibility in the [...]

An implication of this rule is that if the provider says it can do network AMOs for at least one kind of operation on a given location (non-fetching, fetching, or comparison) but it also cannot do network AMOs for at least one other kind of operation, then the comm layer needs to limit itself to non-network operations for all ops on that location. An easier version of this is that when we're deciding whether to do a network AMO or a processor AMO via AM, we have to choose the latter unless the provider says it can do all operations of concern to us using the network. Otherwise we can end up with one initiator doing one kind of AMO and another initiator doing a different kind of AMO, to the same location, using different techniques (network or processor-via-AM) but not synchronizing, thus breaking the coherence protocol.
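To make the "easier version" of the rule concrete, here is a minimal sketch in C. Everything named in it (struct amo_support, struct amo_needs, choose_amo_path) is hypothetical and not taken from the Chapel runtime; it assumes the comm layer has already collected the provider's answers for each kind of operation, for instance from the fi_atomicvalid(3) family of queries.

```c
#include <stdbool.h>

/* Hypothetical summary of what the provider says it can do as a
 * network AMO for one location's datatype. */
struct amo_support {
    bool non_fetching;
    bool fetching;
    bool comparison;
};

/* Hypothetical summary of which kinds of AMO the comm layer may
 * actually issue against that location. */
struct amo_needs {
    bool non_fetching;
    bool fetching;
    bool comparison;
};

enum amo_path {
    AMO_VIA_NETWORK,  /* provider-implemented network AMO */
    AMO_VIA_AM        /* processor AMO done by an active message */
};

/* Pick one technique for ALL ops on the location. The network path is
 * allowed only if every kind of op we care about is supported there;
 * a single unsupported kind forces everything onto the AM path, so
 * that no two initiators mix techniques on the same location. */
enum amo_path choose_amo_path(struct amo_support have, struct amo_needs need)
{
    bool all_supported =
        (!need.non_fetching || have.non_fetching) &&
        (!need.fetching     || have.fetching) &&
        (!need.comparison   || have.comparison);

    return all_supported ? AMO_VIA_NETWORK : AMO_VIA_AM;
}
```

The point of returning a single answer per location, rather than one per operation kind, is that the decision has to be the same on every initiator for the coherence argument above to hold.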
I think one could make the argument that comm=ofi is "complete" in the sense that it passes testing on several different network architectures with different providers. Not everything in the description block has been done, but much of what's not done yet can be viewed as functionally good enough, or else a performance improvement. So, I'm removing myself as the assignee but leaving this open for the group to make the decision.
The ofi comm layer "implements" the whole comm interface in the sense that all the functions are present and any Chapel program will link. But a number of the functions are implemented such that although they work functionally they don't behave exactly as implied or expected. For example, chpl_comm_get_nb() actually does a blocking GET, not a nonblocking one. Here we list what still needs doing.

- [...] fi_getinfo(3); what it should do instead is to specify in the hints the capabilities it needs/wants, and take the first (and by definition best) provider in the returned list, though probably with an env var override (https://github.com/Cray/chapel-private/issues/132)
- [...] verbs provider
- executeOn message size is hard-limited to 10kb; need to add 'large' support for bigger ones
- [...] fi_getinfo(3), we should be setting FI_LOCAL_COMM and FI_REMOTE_COMM in our hints
- [...] fi_getinfo(3), for best performance we should be setting mode flags we can support in our hints, and respecting mode flags that are passed back
- mpirun4ofi launcher needs some work:
  - [...] mpirun; it should be able to use MPICH mpirun as well
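The provider-selection item above ("take the first provider in the returned list, with an env var override") could look roughly like the following. This is only a sketch under stated assumptions: struct prov_entry is a hypothetical stand-in for the linked list that fi_getinfo(3) returns (which is chained through a next pointer and carries a provider name), and choose_provider is an illustrative helper, not an existing runtime function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the list fi_getinfo(3)
 * returns; only the two fields this sketch needs are modeled. */
struct prov_entry {
    const char *prov_name;
    struct prov_entry *next;
};

/* With no override, take the head of the list, which is the best
 * match for the hints by definition. With an override, take the
 * first entry whose provider name matches, or NULL if none does. */
const struct prov_entry *choose_provider(const struct prov_entry *list,
                                         const char *override_name)
{
    if (override_name == NULL || override_name[0] == '\0')
        return list;  /* may itself be NULL if the list is empty */

    for (const struct prov_entry *p = list; p != NULL; p = p->next) {
        if (p->prov_name != NULL &&
            strcmp(p->prov_name, override_name) == 0)
            return p;
    }
    return NULL;  /* caller decides whether this is a fatal error */
}
```

In use, override_name would presumably come from getenv() on whatever environment variable the runtime settles on; the sketch takes it as a parameter so the selection logic can be exercised without touching the environment.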