I'm just curious: if we run an OpenMPI job and it makes use of non-local memory (i.e., memory tied to another socket), what kind of effect does that have on performance?
How would you go about testing this? I can't think of any command-line parameter that would allow one to split an OpenMPI process across sockets.

I'd imagine the impact would be pretty bad, for several reasons: you can't cache non-local memory locally; both the request and the data have to flow through an IOH; the local CPU would have to compete with the non-local CPU for access to its own memory; and all of this would have to be implemented with some sort of software semaphore locks, which would add even more overhead.

Bill L.