comm=ofi completion #12264

Open
8 of 18 tasks
gbtitus opened this issue Feb 6, 2019 · 3 comments

gbtitus commented Feb 6, 2019

The ofi comm layer "implements" the whole comm interface in the sense that all the functions are present and any Chapel program will link. But a number of the functions are implemented such that although they work functionally they don't behave exactly as implied or expected. For example, chpl_comm_get_nb() actually does a blocking GET, not a nonblocking one. Here we list what still needs doing.

  • comm=ofi currently specifies which provider it wants, but that's not the right way to use libfabric's fi_getinfo(3); what it should do instead is to specify in the hints the capabilities it needs/wants, and take the first (and by definition best) provider in the returned list, though probably with an env var override (https://github.com/Cray/chapel-private/issues/132)
  • nonblocking GET, PUT (really support nonblocking GET and PUT in comm=ofi #12391)
  • compute/communicate overlap, or task switching during comm completion waits
  • unordered atomics (https://github.com/Cray/chapel-private/issues/120)
  • we should auto-detect libfabric and, where needed, MPI (and which kind: OpenMPI or MPICH)
  • works with the verbs provider
  • AM executeOn message size is hard-limited to 10kb; need to add 'large' support for bigger ones
  • doesn't work on Mac OS X due to at least ASLR, possibly other things
  • with ASLR fixed, shows a possible CPU monopolization problem on Mac OS X (hello4 takes 105 secs, hpl times out)
  • figure out what the situation is with respect to network AMO coherence with processor AMOs and across network AMO classes (nonfetching, fetching, comparison)
  • adjust as needed to produce AMO coherence (https://github.com/Cray/chapel-private/issues/135)
  • based on fi_getinfo(3), we should be setting FI_LOCAL_COMM and FI_REMOTE_COMM in our hints
  • based on fi_getinfo(3), for best performance we should be setting mode flags we can support in our hints, and respecting mode flags that are passed back
  • it might be nice to print some info at higher verbosity levels (comm=ofi should be informative when verbosity>=2 #12474)
  • many-to-one test results indicate that at least for ofi-verbs on 16-node Cray CS, regular executeOns are faster than "fast" executeOns; why?
  • not directly a comm=ofi item, but the mpirun4ofi launcher needs some work:
    • it only works with OpenMPI mpirun; it should be able to use MPICH mpirun as well
    • it should have an option (env var) to force launched processes to be local and oversubscribed instead of using the enclosing slurm resources, if any
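The provider-selection item at the top of this list (take the first entry that fi_getinfo(3) returns for our hints, with an env var override) could be modeled roughly as follows. This is only a sketch of the policy, not libfabric code: `struct prov_entry` and `choose_provider()` are hypothetical names, the list is assumed to be already filtered by the hints and ordered best-first, and `override` stands for the value of a not-yet-named env var.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one entry of the list fi_getinfo(3) returns. */
struct prov_entry {
    const char *name;   /* provider name, e.g. "verbs" or "sockets" */
};

/*
 * Return the index of the provider to use, or -1 if none is acceptable.
 * `list` is assumed to be already filtered by the hints, best-first.
 * `override` is the value of a hypothetical env var (NULL or "" if unset).
 */
int choose_provider(const struct prov_entry *list, size_t n,
                    const char *override)
{
    if (n == 0)
        return -1;                     /* nothing matched the hints */
    if (override == NULL || override[0] == '\0')
        return 0;                      /* no override: take the first entry */
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i].name, override) == 0)
            return (int)i;             /* override names a listed provider */
    }
    return -1;                         /* override named an absent provider */
}
```

Returning -1 when the override names a provider that is not in the list, rather than silently falling back to the first entry, is a design choice in this sketch; the real comm layer might prefer a warning plus fallback.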

gbtitus commented Feb 26, 2019

I just sent a question to libfabric-users@lists.openfabrics.org asking about libfabric and processor/network atomic operation non-coherence.

gbtitus commented Feb 27, 2019

We have an authoritative response on atomic coherence. Basically, see the paragraph on visibility in the fi_atomic(3) man page (which is also going to be updated for clarity). For the Chapel comm layer, which does not request target-side completions, atomic results are guaranteed to be visible when the associated completion is seen on the initiating side. Achieving coherence for a given location, when multiple target-side actors (NIC and CPU, or multiple NICs) are performing AMOs on that location, requires that all ops done by one actor be visible (completions seen) before any ops are initiated by another actor.

An implication of this rule is that if the provider says it can do network AMOs for at least one kind of operation on a given location (non-fetching, fetching, or comparison) but it also cannot do network AMOs for at least one other kind of operation, then the comm layer needs to limit itself to non-network operations for all ops on that location. An easier version of this is that when we're deciding whether to do a network AMO or a processor AMO via AM, we have to choose the latter unless the provider says it can do all operations of concern to us using the network. Otherwise we can end up with one initiator doing one kind of AMO and another initiator doing a different kind of AMO, to the same location, using different techniques (network or processor-via-AM) but not synchronizing, thus breaking the coherence protocol.
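The "easier version" of the rule can be sketched as a single all-or-nothing check. The names here (`struct amo_support`, `enum amo_path`, `choose_amo_path()`) are hypothetical and not taken from the comm layer; the three flags are assumed to summarize what the provider reports for the operations of concern to us.

```c
#include <stdbool.h>

/* Hypothetical summary of the provider's network AMO support. */
struct amo_support {
    bool nonfetching;  /* e.g. atomic add with no result returned */
    bool fetching;     /* e.g. fetch-and-add */
    bool comparison;   /* e.g. compare-and-swap */
};

enum amo_path {
    AMO_VIA_NETWORK,   /* native network AMO */
    AMO_VIA_AM         /* processor AMO done on the target via an AM */
};

/*
 * Use network AMOs only if the provider supports every class of
 * operation; otherwise route all AMOs through AMs, so that two
 * initiators can never mix techniques on the same location.
 */
enum amo_path choose_amo_path(struct amo_support s)
{
    if (s.nonfetching && s.fetching && s.comparison)
        return AMO_VIA_NETWORK;
    return AMO_VIA_AM;
}
```

With this policy a provider that supports, say, fetching and non-fetching AMOs but not comparison AMOs would still get `AMO_VIA_AM` for every operation, which gives up some performance in exchange for coherence.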

gbtitus commented Apr 26, 2022

I think one could make the argument that comm=ofi is "complete" in the sense that it passes testing on several different network architectures with different providers. Not everything in the description block has been done, but much of what's not done yet can be viewed as functionally good enough, or else a performance improvement. So, I'm removing myself as the assignee but leaving this open for the group to make the decision.

gbtitus removed their assignment Apr 26, 2022