On 28/09/2015 21:44, Dave Goodell (dgoodell) wrote:
> It may have to do with NUMA effects and the way you're allocating/touching
> your shared memory vs. your private (malloced) memory. If you have a
> multi-NUMA-domain system (i.e., any 2+ socket server, and even some
> single-socket servers) ...
Hi, Nathan
I have compiled 2.x with your patch. I must say it works _much_ better
with your changes. I have no idea how you figured that out! A short
table with my bandwidth calculations (MB/s):

            PROT_READ    PROT_READ | PROT_WRITE
1.10.0      2500         ...
There was a bug in that patch that affected IB systems. Updated patch:
https://github.com/hjelmn/ompi/commit/c53df23c0bcf8d1c531e04d22b96c8c19f9b3fd1.patch
-Nathan
On Tue, Sep 29, 2015 at 03:35:21PM -0600, Nathan Hjelm wrote:
I have a branch with the changes available at:
https://github.com/hjelmn/ompi.git
in the mpool_update branch. If you prefer you can apply this patch to
either a 2.x or a master tarball.
https://github.com/hjelmn/ompi/commit/8839dbfae85ba8f443b2857f9bbefdc36c4ebc1a.patch
Let me know if this resolves the issue.
I've now run a few more tests, and I can now say with reasonable confidence
that the read-only mmap is a problem. Let me know if you have a possible
fix - I will gladly test it.
Marcin
On 09/29/2015 04:59 PM, Nathan Hjelm wrote:
We register the memory with the NIC for both read and write access. This
may be the source of the slowdown. We recently added internal support to
allow the point-to-point layer to specify the access flags but the
openib btl does not yet make use of the new support. I plan to make the
necessary changes.
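As an illustration of what those access flags mean at the verbs level, the
stand-alone sketch below (not Open MPI's registration code; device selection,
buffer size and flags are arbitrary) registers a PROT_READ-only mapping twice
with libibverbs: once requesting local write access, the way a leave-pinned
protocol typically registers buffers, and once requesting remote read only.
The write-access registration of a read-only mapping is generally expected to
fail on Linux, which is one plausible way a transport ends up on a slower
copy path and would be consistent with the slowdown discussed in this thread.

/* Stand-alone sketch (not Open MPI code): register a read-only mapping
 * with the HCA via libibverbs, first with local write access, then with
 * remote read only.  Device choice, buffer size and error handling are
 * reduced to keep the example short.  Compile with: cc reg_ro.c -libverbs */
#include <infiniband/verbs.h>
#include <sys/mman.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no IB devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    size_t len = 1 << 20;
    void *buf = mmap(NULL, len, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* Registration that requires write permission on the pages; on a
     * PROT_READ mapping this is expected to fail (EFAULT on Linux). */
    struct ibv_mr *mr_rw = ibv_reg_mr(pd, buf, len,
                                      IBV_ACCESS_LOCAL_WRITE |
                                      IBV_ACCESS_REMOTE_READ);
    printf("LOCAL_WRITE | REMOTE_READ: %s\n", mr_rw ? "ok" : strerror(errno));

    /* Read-only registration: sufficient for the send side. */
    struct ibv_mr *mr_ro = ibv_reg_mr(pd, buf, len, IBV_ACCESS_REMOTE_READ);
    printf("REMOTE_READ only:          %s\n", mr_ro ? "ok" : strerror(errno));

    if (mr_ro) ibv_dereg_mr(mr_ro);
    if (mr_rw) ibv_dereg_mr(mr_rw);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    munmap(buf, len);
    return 0;
}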
Thanks, Dave.
I have verified the memory locality and IB card locality, all's fine.
Quite accidentally I have found that there is a huge penalty if I mmap
the shm with PROT_READ only. Using PROT_READ | PROT_WRITE yields good
results, although I must look at this further. I'll report when I am done.
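In other words, the only difference between the fast and the slow case is
the protection flag passed to mmap. A minimal sketch of the two mappings
(segment name and size are invented; error checks omitted):

/* Map the same POSIX shm segment read-write and read-only.
 * Segment name and size are invented; error checks omitted.
 * Link with -lrt on older glibc. */
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t len = 1 << 26;                       /* 64 MiB, arbitrary */
    int fd = shm_open("/bwtest", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, len);

    /* Fast case reported above: */
    void *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* Slow case reported above; per the discussion in this thread the
     * NIC likely cannot register such a mapping with write access: */
    void *ro = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

    /* ... use rw or ro as the MPI send buffer ... */

    munmap(ro, len);
    munmap(rw, len);
    close(fd);
    shm_unlink("/bwtest");
    return 0;
}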
On Sep 27, 2015, at 1:38 PM, marcin.krotkiewski wrote:
Hello, everyone
I am struggling a bit with IB performance when sending data from a POSIX
shared memory region (/dev/shm). The memory is shared among many MPI
processes within the same compute node. Essentially, the performance is a bit
erratic, but it seems that my code is roughly twice slower when sending from
the shared segment than from private (malloc'ed) memory.
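A reduced version of the kind of comparison being described could look like
the sketch below (the segment name, sizes and the single-message timing are
invented for illustration; this is not the actual benchmark): rank 0 sends
once from a POSIX shm mapping and once from ordinary heap memory, and rank 1
receives into the heap.

/* Reduced sketch (invented names/sizes, not the actual benchmark):
 * rank 0 sends the same amount of data once from a POSIX shm mapping
 * and once from ordinary heap memory; rank 1 receives into the heap.
 * A single message gives only a rough bandwidth number.
 * Build with mpicc; link -lrt on older glibc. */
#include <mpi.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SEG_NAME "/ib_shm_bw"              /* invented segment name */
#define SEG_SIZE (64 * 1024 * 1024)

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Buffer A: POSIX shared memory segment mapped into this rank. */
        int fd = shm_open(SEG_NAME, O_CREAT | O_RDWR, 0600);
        ftruncate(fd, SEG_SIZE);
        char *shm_buf = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);
        memset(shm_buf, 1, SEG_SIZE);       /* touch the pages once */

        /* Buffer B: ordinary private heap memory. */
        char *heap_buf = malloc(SEG_SIZE);
        memset(heap_buf, 2, SEG_SIZE);

        double t0 = MPI_Wtime();
        MPI_Send(shm_buf, SEG_SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        double t1 = MPI_Wtime();
        MPI_Send(heap_buf, SEG_SIZE, MPI_CHAR, 1, 1, MPI_COMM_WORLD);
        double t2 = MPI_Wtime();

        printf("shm:  %6.0f MB/s\n", SEG_SIZE / 1e6 / (t1 - t0));
        printf("heap: %6.0f MB/s\n", SEG_SIZE / 1e6 / (t2 - t1));

        free(heap_buf);
        munmap(shm_buf, SEG_SIZE);
        close(fd);
        shm_unlink(SEG_NAME);
    } else if (rank == 1) {
        char *dst = malloc(SEG_SIZE);
        MPI_Recv(dst, SEG_SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Recv(dst, SEG_SIZE, MPI_CHAR, 0, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        free(dst);
    }

    MPI_Finalize();
    return 0;
}

To exercise the IB path the two ranks need to land on different nodes
(e.g. with mpirun --map-by node).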