On Thu, May 3, 2018 at 11:23 PM, Nathan Hjelm <hje...@me.com> wrote:

> Not saying we won't change the behavior. Just saying the user can't expect
> a particular alignment, as there is no guarantee in the standard.  In Open
> MPI we just don't bother to align the pointer right now, so it naturally
> aligns to 64 bits. It isn't about wasting memory.
>

We should add an info key for alignment.  It's pretty silly we don't have
one already, given how windows are allocated.
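
Something like this is what I have in mind. The key name "alignment" is purely
illustrative (no such key is defined by the standard or honored by any
implementation I know of today), and size, shmcomm, base, and win are assumed
to be set up in the usual way:

  MPI_Info info;
  MPI_Info_create(&info);
  MPI_Info_set(info, "alignment", "16");  // hypothetical key: request 16-byte-aligned window memory
  MPI_Win_allocate_shared(size, 1, info, shmcomm, &base, &win);
  MPI_Info_free(&info);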

At the very least, MPI-3 implementations should align all window memory to 16
bytes (128 bits) on x86_64 in order to allow
MPI_Accumulate(MPI_C_DOUBLE_COMPLEX) to use CMPXCHG16B.
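
For reference, the 16-byte requirement is visible on the language side as
well. A standalone check (assuming GCC or Clang on x86_64; this is not part of
Martin's reproducer) might look like:

  #include <atomic>
  #include <cstdio>

  // Build with e.g.: g++ -std=c++11 check.cpp -latomic
  int main() {
      std::atomic<__int128> x{};
      // Prints alignof = 16 on x86_64, i.e. the alignment cmpxchg16b requires.
      std::printf("alignof = %zu, lock_free = %d\n",
                  alignof(std::atomic<__int128>), (int)x.is_lock_free());
      return 0;
  }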

It's pretty lame for MPI_Alloc_mem to be worse than malloc.  We should fix
this in the standard.  MPI should not break the platform's alignment
guarantees when it is substituted for the system allocator.


> Also remember that by default the shared memory regions are contiguous
> across local ranks, so each rank's buffer alignment is dictated by the sizes
> of the allocations made by the prior ranks, in addition to the alignment of
> rank zero's buffer.
>

If a user is allocating std::atomic<__int128>, every element will be 128-bit
aligned as long as the base is.  Noncontiguous is actually worse, in that the
implementation could allocate each process's segment with only 64-bit
alignment.
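
For Martin's immediate problem, a minimal sketch of the align-it-yourself
workaround Nathan suggests below could look like the following. The 16-byte
over-allocation, the variable names, and placing a single atomic at the front
of rank 0's segment are my assumptions, and error checking is omitted:

  #include <mpi.h>
  #include <atomic>
  #include <cstddef>
  #include <memory>   // std::align
  #include <new>      // placement new

  // Build with e.g.: mpic++ -std=c++11 workaround.cpp -latomic
  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);

      MPI_Comm shmcomm;
      MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                          MPI_INFO_NULL, &shmcomm);
      int rank;
      MPI_Comm_rank(shmcomm, &rank);

      const std::size_t need  = sizeof(std::atomic<__int128>);
      const std::size_t slack = need + 16;   // extra room so we can realign by hand

      void *base = nullptr;
      MPI_Win win;
      MPI_Win_allocate_shared(rank == 0 ? (MPI_Aint)slack : (MPI_Aint)0, 1,
                              MPI_INFO_NULL, shmcomm, &base, &win);

      // Every rank looks up rank 0's segment and rounds it up to 16 bytes itself,
      // since only 8-byte alignment can be assumed here.
      MPI_Aint qsize;
      int qdisp;
      void *r0base = nullptr;
      MPI_Win_shared_query(win, 0, &qsize, &qdisp, &r0base);

      std::size_t space = slack;
      void *aligned = std::align(16, need, r0base, space);

      if (rank == 0) {
          auto *a = new (aligned) std::atomic<__int128>();
          a->store(1);   // the store that segfaulted when the address was only 8-byte aligned
      }

      // No synchronization shown: as in the reproducer, only rank 0 touches the value.
      MPI_Win_free(&win);
      MPI_Comm_free(&shmcomm);
      MPI_Finalize();
      return 0;
  }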

Jeff


> -Nathan
>
> On May 3, 2018, at 9:43 PM, Jeff Hammond <jeff.scie...@gmail.com> wrote:
>
> Given that this seems to break user experience on a relatively frequent
> basis, I’d like to know the compelling reason why MPI implementers aren’t
> willing to do something utterly trivial to fix it.
>
> And don’t tell me that 16B alignment wastes memory versus 8B alignment.
> Open-MPI “wastes” 4B relative to MPICH for every handle on I32LP64 systems.
> The internal state associated with MPI allocations - particularly windows -
> is bigger than 8B.  I recall ptmalloc uses something like 32B per heap
> allocation.
>
> Jeff
>
> On Thu, May 3, 2018 at 5:20 PM Nathan Hjelm <hje...@me.com> wrote:
>
>> That is probably it. When there are 4 ranks there are 4 int64s just
>> before the user data (for PSCW). With 1 rank we don’t even bother; it’s just
>> malloc (16-byte aligned). With any other odd number of ranks the user data
>> is after an odd number of int64s and is only 8-byte aligned. There is no
>> requirement in MPI to provide 16-byte alignment (which is required for
>> _Atomic __int128 because of the alignment requirement of cmpxchg16b), so you
>> have to align it yourself.
>>
>> -Nathan
>>
>> > On May 3, 2018, at 2:16 PM, Joseph Schuchart <schuch...@hlrs.de> wrote:
>> >
>> > Martin,
>> >
>> > You say that you allocate shared memory, do you mean shared memory
>> windows? If so, this could be the reason:
>> https://github.com/open-mpi/ompi/issues/4952
>> >
>> > The alignment of memory allocated for MPI windows is not suitable for
>> 128-bit values in Open MPI (only 8-byte alignment is guaranteed at the
>> moment). I have seen the alignment change depending on the number of
>> processes. Could you check the alignment of the memory you are trying to
>> access?
>> >
>> > Cheers,
>> > Joseph
>> >
>> > On 05/03/2018 08:48 PM, Martin Böhm wrote:
>> >> Dear all,
>> >> I have a problem with a segfault on a user-built OpenMPI 3.0.1 running
>> on Ubuntu 16.04
>> >> in local mode (only processes on a single computer).
>> >> The problem manifests itself as a segfault when allocating shared
>> memory for
>> >> (at least) one 128-bit atomic variable (say std::atomic<__int128>) and
>> then
>> >> storing into this variable. This error only occurs when there are at
>> least two
>> >> processes, even though only the process of shared rank 0 is the one
>> doing both
>> >> the allocation and the writing. The compile flags -march=core2 and
>> -march=native, as well as linking with -latomic, were tried; none of them
>> helped.
>> >> An example of the code that triggers it on my computers is this:
>> >> https://github.com/bohm/binstretch/blob/parallel-classic/algorithm/bug.cpp
>> >> The code works fine with mpirun -np 1 and segfaults with mpirun -np 2,
>> 3 and 4;
>> >> if line 41 is commented out (the 128-bit atomic write), everything
>> works fine
>> >> with -np 2 or more.
>> >> As for Ubuntu's stock package containing OpenMPI 1.10.2, the code
>> segfaults with
>> >> "-np 2" and "-np 3" but not "-np 1" or "-np 4".
>> >> Thank you for any assistance concerning this problem. I would suspect
>> my own
>> >> code to be the most likely culprit, since it triggers on both the
>> stock package
>> >> and custom-built OpenMPI.
>> >> I attach the config.log.bz2 and ompi_info.log. Below I list some runs
>> of the program
>> >> and what errors are produced.
>> >> Thank you for any assistance. I have tried googling and searching the
>> mailing list
>> >> for this problem; if I missed something, I apologize.
>> >> Martin Böhm
>> >> ----- Ubuntu 16.04 stock mpirun and mpic++ -----
>> >> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 1
>> ../tests/bug
>> >> ex1 success.
>> >> ex2 success.
>> >> Inserted into ex1.
>> >> Inserted into ex2.
>> >> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 4
>> ../tests/bug
>> >> Thread 2: ex1 success.
>> >> Thread 3: ex1 success.
>> >> ex1 success.
>> >> Thread 1: ex1 success.
>> >> ex2 success.
>> >> Thread 2: ex2 success.
>> >> Thread 1: ex2 success.
>> >> Thread 3: ex2 success.
>> >> Inserted into ex1.
>> >> Inserted into ex2.
>> >> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/bin/mpirun -np 2
>> ../tests/bug
>> >> Thread 1: ex1 success.
>> >> ex1 success.
>> >> Thread 1: ex2 success.
>> >> ex2 success.
>> >> Inserted into ex1.
>> >> [kamenice:13662] *** Process received signal ***
>> >> [kamenice:13662] Signal: Segmentation fault (11)
>> >> [kamenice:13662] Signal code:  (128)
>> >> [kamenice:13662] Failing at address: (nil)
>> >> [kamenice:13662] [ 0] /lib/x86_64-linux-gnu/libc.so.
>> 6(+0x354b0)[0x7f31773844b0]
>> >> [kamenice:13662] [ 1] ../tests/bug[0x40d8ac]
>> >> [kamenice:13662] [ 2] ../tests/bug[0x408997]
>> >> [kamenice:13662] [ 3] ../tests/bug[0x408bf0]
>> >> [kamenice:13662] [ 4] /lib/x86_64-linux-gnu/libc.so.
>> 6(__libc_start_main+0xf0)[0x7f317736f830]
>> >> [kamenice:13662] [ 5] ../tests/bug[0x4086e9]
>> >> [kamenice:13662] *** End of error message ***
>> >> ----- Ubuntu 16.04 custom-compiled OpenMPI 3.0.1, installed to
>> /usr/local (the stock packages were uninstalled) -----
>> >> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/local/bin/mpirun -np 1
>> ../tests/bug
>> >> ex1 success.
>> >> ex2 success.
>> >> Inserted into ex1.
>> >> Inserted into ex2.
>> >> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/local/bin/mpirun -np 2
>> ../tests/bug
>> >> ex1 success.
>> >> Thread 1: ex1 success.
>> >> Inserted into ex1.
>> >> [kamenice:22794] *** Process received signal ***
>> >> ex2 success.
>> >> Thread 1: ex2 success.
>> >> [kamenice:22794] Signal: Segmentation fault (11)
>> >> [kamenice:22794] Signal code:  (128)
>> >> [kamenice:22794] Failing at address: (nil)
>> >> [kamenice:22794] [ 0] /lib/x86_64-linux-gnu/libc.so.
>> 6(+0x354b0)[0x7ff8bad084b0]
>> >> [kamenice:22794] [ 1] ../tests/bug[0x401010]
>> >> [kamenice:22794] [ 2] ../tests/bug[0x400d27]
>> >> [kamenice:22794] [ 3] ../tests/bug[0x400f80]
>> >> [kamenice:22794] [ 4] /lib/x86_64-linux-gnu/libc.so.
>> 6(__libc_start_main+0xf0)[0x7ff8bacf3830]
>> >> [kamenice:22794] [ 5] ../tests/bug[0x400a79]
>> >> [kamenice:22794] *** End of error message ***
>> >> -------------------------------------------------------
>> >> Primary job  terminated normally, but 1 process returned
>> >> a non-zero exit code. Per user-direction, the job has been aborted.
>> >> -------------------------------------------------------
>> >> ------------------------------------------------------------
>> --------------
>> >> mpirun noticed that process rank 0 with PID 0 on node kamenice exited
>> on signal 11 (Segmentation fault).
>> >> ------------------------------------------------------------
>> --------------
>> >> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/local/bin/mpirun -np 4
>> --oversubscribe ../tests/bug
>> >> ex1 success.
>> >> Thread 1: ex1 success.
>> >> Thread 2: ex1 success.
>> >> Thread 3: ex1 success.
>> >> ex2 success.
>> >> Thread 1: ex2 success.
>> >> Thread 2: ex2 success.
>> >> Thread 3: ex2 success.
>> >> Inserted into ex1.
>> >> [kamenice:22728] *** Process received signal ***
>> >> [kamenice:22728] Signal: Segmentation fault (11)
>> >> [kamenice:22728] Signal code:  (128)
>> >> [kamenice:22728] Failing at address: (nil)
>> >> [kamenice:22728] [ 0] /lib/x86_64-linux-gnu/libc.so.
>> 6(+0x354b0)[0x7f826a6294b0]
>> >> [kamenice:22728] [ 1] ../tests/bug[0x401010]
>> >> [kamenice:22728] [ 2] ../tests/bug[0x400d27]
>> >> [kamenice:22728] [ 3] ../tests/bug[0x400f80]
>> >> [kamenice:22728] [ 4] /lib/x86_64-linux-gnu/libc.so.
>> 6(__libc_start_main+0xf0)[0x7f826a614830]
>> >> [kamenice:22728] [ 5] ../tests/bug[0x400a79]
>> >> [kamenice:22728] *** End of error message ***
>> >> -------------------------------------------------------
>> >> Primary job  terminated normally, but 1 process returned
>> >> a non-zero exit code. Per user-direction, the job has been aborted.
>> >> -------------------------------------------------------
>> >> ------------------------------------------------------------
>> --------------
>> >> mpirun noticed that process rank 0 with PID 0 on node kamenice exited
>> on signal 11 (Segmentation fault).
>> >> ------------------------------------------------------------
>> --------------
>> >> bohm@kamenice:~/cl/w/b/classic/algorithm$ /usr/local/bin/mpirun -np 2
>> valgrind ../tests/bug
>> >> ==22814== Memcheck, a memory error detector
>> >> ==22814== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et
>> al.
>> >> ==22814== Using Valgrind-3.11.0 and LibVEX; rerun with -h for
>> copyright info
>> >> ==22814== Command: ../tests/bug
>> >> ==22814==
>> >> ==22815== Memcheck, a memory error detector
>> >> ==22815== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et
>> al.
>> >> ==22815== Using Valgrind-3.11.0 and LibVEX; rerun with -h for
>> copyright info
>> >> ==22815== Command: ../tests/bug
>> >> ==22815==
>> >> ex1 success.
>> >> Thread 1: ex1 success.
>> >> Thread 1: ex2 success.
>> >> ex2 success.
>> >> Inserted into ex1.
>> >> [kamenice:22814] *** Process received signal ***
>> >> [kamenice:22814] Signal: Segmentation fault (11)
>> >> [kamenice:22814] Signal code:  (128)
>> >> [kamenice:22814] Failing at address: (nil)
>> >> [kamenice:22814] [ 0] /lib/x86_64-linux-gnu/libc.so.
>> 6(+0x354b0)[0x51704b0]
>> >> [kamenice:22814] [ 1] ../tests/bug[0x401010]
>> >> [kamenice:22814] [ 2] ../tests/bug[0x400d27]
>> >> [kamenice:22814] [ 3] ../tests/bug[0x400f80]
>> >> [kamenice:22814] [ 4] /lib/x86_64-linux-gnu/libc.so.
>> 6(__libc_start_main+0xf0)[0x515b830]
>> >> [kamenice:22814] [ 5] ../tests/bug[0x400a79]
>> >> [kamenice:22814] *** End of error message ***
>> >> ==22814==
>> >> ==22814== Process terminating with default action of signal 11
>> (SIGSEGV)
>> >> ==22814==    at 0x5170428: raise (raise.c:54)
>> >> ==22814==    by 0x5827E4D: show_stackframe (in
>> /usr/local/lib/libopen-pal.so.40.1.0)
>> >> ==22814==    by 0x51704AF: ??? (in /lib/x86_64-linux-gnu/libc-2.23.so)
>> >> ==22814==    by 0x40100F: std::atomic<__int128>::store(__int128,
>> std::memory_order) (atomic:225)
>> >> ==22814==    by 0x400D26: shared_memory_init(int, int) (bug.cpp:41)
>> >> ==22814==    by 0x400F7F: main (bug.cpp:80)
>> >> ==22814==
>> >> ==22814== HEAP SUMMARY:
>> >> ==22814==     in use at exit: 2,759,989 bytes in 9,014 blocks
>> >> ==22814==   total heap usage: 20,168 allocs, 11,154 frees, 3,820,707
>> bytes allocated
>> >> ==22814==
>> >> ==22814== LEAK SUMMARY:
>> >> ==22814==    definitely lost: 12 bytes in 1 blocks
>> >> ==22814==    indirectly lost: 0 bytes in 0 blocks
>> >> ==22814==      possibly lost: 608 bytes in 2 blocks
>> >> ==22814==    still reachable: 2,759,369 bytes in 9,011 blocks
>> >> ==22814==         suppressed: 0 bytes in 0 blocks
>> >> ==22814== Rerun with --leak-check=full to see details of leaked memory
>> >> ==22814==
>> >> ==22814== For counts of detected and suppressed errors, rerun with: -v
>> >> ==22814== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from
>> 0)
>> >> -------------------------------------------------------
>> >> Primary job  terminated normally, but 1 process returned
>> >> a non-zero exit code. Per user-direction, the job has been aborted.
>> >> -------------------------------------------------------
>> >> ==22815==
>> >> ==22815== Process terminating with default action of signal 15
>> (SIGTERM)
>> >> ==22815==    at 0x523674D: ??? (syscall-template.S:84)
>> >> ==22815==    by 0x583B4A7: poll (poll2.h:46)
>> >> ==22815==    by 0x583B4A7: poll_dispatch (poll.c:165)
>> >> ==22815==    by 0x5831BDE: opal_libevent2022_event_base_loop
>> (event.c:1630)
>> >> ==22815==    by 0x57F210D: progress_engine (in
>> /usr/local/lib/libopen-pal.so.40.1.0)
>> >> ==22815==    by 0x5FD76B9: start_thread (pthread_create.c:333)
>> >> ==22815==    by 0x524241C: clone (clone.S:109)
>> >> ==22815==
>> >> ==22815== HEAP SUMMARY:
>> >> ==22815==     in use at exit: 2,766,405 bytes in 9,017 blocks
>> >> ==22815==   total heap usage: 20,167 allocs, 11,150 frees, 3,823,751
>> bytes allocated
>> >> ==22815==
>> >> ==22815== LEAK SUMMARY:
>> >> ==22815==    definitely lost: 12 bytes in 1 blocks
>> >> ==22815==    indirectly lost: 0 bytes in 0 blocks
>> >> ==22815==      possibly lost: 608 bytes in 2 blocks
>> >> ==22815==    still reachable: 2,765,785 bytes in 9,014 blocks
>> >> ==22815==         suppressed: 0 bytes in 0 blocks
>> >> ==22815== Rerun with --leak-check=full to see details of leaked memory
>> >> ==22815==
>> >> ==22815== For counts of detected and suppressed errors, rerun with: -v
>> >> ==22815== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from
>> 0)
>> >> ------------------------------------------------------------
>> --------------
>> >> mpirun noticed that process rank 0 with PID 0 on node kamenice exited
>> on signal 9 (Killed).
>> >> ------------------------------------------------------------
>> --------------
>> >
>> >
>> > --
>> > Dipl.-Inf. Joseph Schuchart
>> > High Performance Computing Center Stuttgart (HLRS)
>> > Nobelstr. 19
>> > D-70569 Stuttgart
>> >
>> > Tel.: +49(0)711-68565890
>> > Fax: +49(0)711-6856832
>> > E-Mail: schuch...@hlrs.de
>>
>
> --
> Jeff Hammond
> jeff.scie...@gmail.com
> http://jeffhammond.github.io/
>



-- 
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/