Yes, I would love to have a copy of that test program, if you could share it.  
I'll add it to our internal test suite.


On Nov 5, 2014, at 5:08 AM, <michael.rach...@dlr.de> wrote:

> Dear Gilles,
> 
> My small, downsized Fortran test program for testing the shared memory
> feature (MPI_WIN_ALLOCATE_SHARED, MPI_WIN_SHARED_QUERY, C_F_POINTER)
> assumes for simplicity that all processes are running on the same node
> (i.e. the communicator containing the procs on the same node is simply
> MPI_COMM_WORLD).
> So the hanging of MPI_WIN_ALLOCATE_SHARED when running on 2 nodes could
> only be observed with our large CFD code.
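> 
> (For comparison, a minimal sketch of how a node-local communicator would
> normally be obtained when several nodes are involved; the name comm_node
> is illustrative only. The downsized test program skips this step and uses
> MPI_COMM_WORLD directly:)
> 
>    program node_comm_sketch
>      use mpi
>      implicit none
>      integer :: ierr, comm_node   ! node-local communicator (illustrative name)
>      call MPI_INIT( ierr )
>      ! split MPI_COMM_WORLD into one communicator per shared-memory node
>      call MPI_COMM_SPLIT_TYPE( MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
>                                MPI_INFO_NULL, comm_node, ierr )
>      call MPI_COMM_FREE( comm_node, ierr )
>      call MPI_FINALIZE( ierr )
>    end program node_comm_sketch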
> 
> Are the Open MPI developers nevertheless interested in that test program?
> 
> Greetings
> Michael
> 
> 
> 
> 
> 
> 
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Gilles 
> Gouaillardet
> Sent: Wednesday, November 5, 2014 10:46
> To: Open MPI Users
> Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared 
> memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code
> 
> Michael,
> 
> could you please share your test program so we can investigate it?
> 
> Cheers,
> 
> Gilles
> 
> On 2014/10/31 18:53, michael.rach...@dlr.de wrote:
>> Dear developers of OPENMPI,
>> 
>> There remains a hang observed in MPI_WIN_ALLOCATE_SHARED.
>> 
>> But first: 
>> Thank you for your advice to employ   shmem_mmap_relocate_backing_file = 1
>> It turned out that the bad (but silent) allocations by
>> MPI_WIN_ALLOCATE_SHARED, which I observed in the past after ~140 MB of
>> allocated shared memory, were indeed caused by insufficient storage
>> available for the shared memory backing files. Applying that MCA parameter
>> resolved the problem.
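>> 
>> (Usage note, assuming a standard Open MPI installation: the MCA parameter
>> can be set on the mpiexec command line, e.g.
>>    mpiexec --mca shmem_mmap_relocate_backing_file 1 -np 2 ./a.out
>> or via the environment variable OMPI_MCA_shmem_mmap_relocate_backing_file=1,
>> or in an openmpi-mca-params.conf file.)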
>> 
>> Now the allocation of shared data windows by  MPI_WIN_ALLOCATE_SHARED in the 
>> OPENMPI-1.8.3 release version works on both clusters!
>> I tested it both with my small shared memory Fortran test program and with
>> our Fortran CFD code.
>> It worked even when allocating 1000 shared data windows containing a total
>> of 40 GB.  Very well.
>> 
>> But now I come to the remaining problem:
>> According to the attached email from Jeff (see below) of 2014-10-24, we
>> alternatively installed and tested the bugfixed OPENMPI nightly tarball
>> of 2014-10-24 (openmpi-dev-176-g9334abc.tar.gz) on Cluster5.
>> That version worked well when our CFD code was running on only 1 node.
>> But I now observe that, when running the CFD code on 2 nodes with 2
>> processes per node, after having allocated a total of 200 MB of data
>> in 20 shared windows, the allocation of the 21st window fails: all
>> 4 processes enter MPI_WIN_ALLOCATE_SHARED but never leave it. The code
>> hangs in that routine without any message.
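>> 
>> (For reproduction, such a 2-node run with 2 processes per node would be
>> launched with something like
>>    mpiexec -np 4 -npernode 2 -hostfile hosts.txt ./cfd_code
>> where hosts.txt and cfd_code are placeholders for the actual hostfile and
>> executable.)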
>> 
>> In contrast, that bug does NOT occur with the OPENMPI-1.8.3 release version
>> with the same program on the same machine.
>> 
>> That means for you:
>>   In openmpi-dev-176-g9334abc.tar.gz the newly introduced bugfix concerning
>> the shared memory allocation may not yet be correctly coded,
>>   or that version contains another new bug in shared memory allocation
>> compared to the working(!) 1.8.3 release version.
>> 
>> Greetings to you all
>>  Michael Rachner
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff 
>> Squyres (jsquyres)
>> Sent: Friday, October 24, 2014 22:45
>> To: Open MPI User's List
>> Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in 
>> shared memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code
>> 
>> Nathan tells me that this may well be related to a fix that was literally 
>> just pulled into the v1.8 branch today:
>> 
>>    https://github.com/open-mpi/ompi-release/pull/56
>> 
>> Would you mind testing any nightly tarball after tonight?  (i.e., the 
>> v1.8 tarballs generated tonight will be the first ones to contain this 
>> fix)
>> 
>>    http://www.open-mpi.org/nightly/master/
>> 
>> 
>> 
>> On Oct 24, 2014, at 11:46 AM, <michael.rach...@dlr.de> wrote:
>> 
>>> Dear developers of OPENMPI,
>>> 
>>> I am running a small, downsized Fortran test program for shared memory
>>> allocation (using MPI_WIN_ALLOCATE_SHARED and MPI_WIN_SHARED_QUERY)
>>> on only 1 node of 2 different Linux clusters with OPENMPI-1.8.3 and
>>> Intel-14.0.4 / Intel-13.0.1, respectively.
>>> 
>>> The program simply allocates a sequence of shared data windows, each
>>> consisting of 1 integer*4 array.
>>> None of the windows is freed, so the amount of data allocated in shared
>>> windows grows during the course of the execution.
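>>> 
>>> (For reference, a minimal sketch of that allocation pattern; it is not the
>>> actual test program: the variable names, the window count, the fact that
>>> only rank 0 provides the memory, and the MPI_WIN_FENCE synchronization are
>>> illustrative assumptions, and it relies on the TYPE(C_PTR) overloads of the
>>> mpi module required by MPI-3:)
>>> 
>>>    program sharedwin_sketch
>>>      use mpi
>>>      use, intrinsic :: iso_c_binding
>>>      implicit none
>>>      integer, parameter :: idim_1 = 50000, nwin = 1000
>>>      integer :: ierr, myrank, iwin, disp_unit
>>>      integer :: win(nwin)                   ! window handles, never freed
>>>      integer(kind=MPI_ADDRESS_KIND) :: winsize
>>>      type(c_ptr) :: baseptr
>>>      integer(4), pointer :: iarr(:)         ! the integer*4 array of a window
>>> 
>>>      call MPI_INIT( ierr )
>>>      call MPI_COMM_RANK( MPI_COMM_WORLD, myrank, ierr )
>>> 
>>>      do iwin = 1, nwin
>>>        ! only the nodemaster (rank 0) provides the memory of the window
>>>        winsize = 0
>>>        if (myrank == 0) winsize = int(idim_1, MPI_ADDRESS_KIND) * 4
>>>        call MPI_WIN_ALLOCATE_SHARED( winsize, 4, MPI_INFO_NULL, &
>>>                                      MPI_COMM_WORLD, baseptr, win(iwin), ierr )
>>>        ! the other procs query rank 0's segment to get their own address of it
>>>        if (myrank /= 0) then
>>>          call MPI_WIN_SHARED_QUERY( win(iwin), 0, winsize, disp_unit, &
>>>                                     baseptr, ierr )
>>>        end if
>>>        ! map the C address onto a Fortran integer*4 array
>>>        call C_F_POINTER( baseptr, iarr, [idim_1] )
>>>        call MPI_WIN_FENCE( 0, win(iwin), ierr )
>>>        if (myrank == 0) iarr(1:idim_1) = iwin   ! first write into the window
>>>        call MPI_WIN_FENCE( 0, win(iwin), ierr )
>>>      end do
>>> 
>>>      call MPI_FINALIZE( ierr )
>>>    end program sharedwin_sketch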
>>> 
>>> That worked well on the 1st cluster (Laki, having 8 procs per node)
>>> when allocating even 1000 shared windows, each having 50000 integer*4
>>> array elements, i.e. a total of 200 MBytes.
>>> On the 2nd cluster (Cluster5, having 24 procs per node) it also worked on
>>> the login node, but it did NOT work on a compute node.
>>> In that failing case there appears to be an internal storage limit of
>>> ~140 MB for the total storage allocated in all shared windows.
>>> When that limit is reached, all later shared memory allocations fail,
>>> but silently.
>>> So the first attempt to use such a badly allocated shared data window
>>> results in a bus error, due to the bad storage address encountered.
>>> 
>>> That strange behavior could be observed with the small test program as
>>> well as with my large Fortran CFD code.
>>> If the error occurs, then it occurs with both codes, and in both cases at
>>> a storage limit of ~140 MB.
>>> I found that this storage limit depends only weakly on the number of
>>> processes (for np=2,4,8,16,24 it is: 144.4, 144.0, 141.0, 137.0,
>>> 132.2 MB).
>>> 
>>> Note that the shared memory storage available on both clusters was very 
>>> large (many GB of free memory).
>>> 
>>> Here is the error message when running with np=2 and an array
>>> dimension of idim_1=50000 for the integer*4 array allocated per shared
>>> window, on the compute node of Cluster5.
>>> In that case the error occurred at the 723rd shared window, which is the
>>> 1st badly allocated window in that case:
>>> (722 successfully allocated shared windows * 50000 array elements * 4
>>> Bytes/el. = 144.4 MB)
>>> 
>>> 
>>> [1,0]<stdout>: ========on nodemaster: iwin=         722 :
>>> [1,0]<stdout>:  total storage [MByte] alloc. in shared windows so far:   144.400000000000
>>> [1,0]<stdout>: =========== allocation of shared window no. iwin=         723
>>> [1,0]<stdout>:  starting now with idim_1=       50000
>>> [1,0]<stdout>: ========on nodemaster for iwin=         723 : before writing on shared mem
>>> [1,0]<stderr>:[r5i5n13:12597] *** Process received signal ***
>>> [1,0]<stderr>:[r5i5n13:12597] Signal: Bus error (7)
>>> [1,0]<stderr>:[r5i5n13:12597] Signal code: Non-existant physical address (2)
>>> [1,0]<stderr>:[r5i5n13:12597] Failing at address: 0x7fffe08da000
>>> [1,0]<stderr>:[r5i5n13:12597] [ 0] /lib64/libpthread.so.0(+0xf800)[0x7ffff6d67800]
>>> [1,0]<stderr>:[r5i5n13:12597] [ 1] ./a.out[0x408a8b]
>>> [1,0]<stderr>:[r5i5n13:12597] [ 2] ./a.out[0x40800c]
>>> [1,0]<stderr>:[r5i5n13:12597] [ 3] /lib64/libc.so.6(__libc_start_main+0xe6)[0x7ffff69fec36]
>>> [1,0]<stderr>:[r5i5n13:12597] [ 4] ./a.out[0x407f09]
>>> [1,0]<stderr>:[r5i5n13:12597] *** End of error message ***
>>> [1,1]<stderr>:forrtl: error (78): process killed (SIGTERM)
>>> [1,1]<stderr>:Image              PC                Routine   Line     Source
>>> [1,1]<stderr>:libopen-pal.so.6   00007FFFF4B74580  Unknown   Unknown  Unknown
>>> [1,1]<stderr>:libmpi.so.1        00007FFFF7267F3E  Unknown   Unknown  Unknown
>>> [1,1]<stderr>:libmpi.so.1        00007FFFF733B555  Unknown   Unknown  Unknown
>>> [1,1]<stderr>:libmpi.so.1        00007FFFF727DFFD  Unknown   Unknown  Unknown
>>> [1,1]<stderr>:libmpi_mpifh.so.2  00007FFFF779BA03  Unknown   Unknown  Unknown
>>> [1,1]<stderr>:a.out              0000000000408D15  Unknown   Unknown  Unknown
>>> [1,1]<stderr>:a.out              000000000040800C  Unknown   Unknown  Unknown
>>> [1,1]<stderr>:libc.so.6          00007FFFF69FEC36  Unknown   Unknown  Unknown
>>> [1,1]<stderr>:a.out              0000000000407F09  Unknown   Unknown  Unknown
>>> --------------------------------------------------------------------------
>>> mpiexec noticed that process rank 0 with PID 12597 on node r5i5n13 exited
>>> on signal 7 (Bus error).
>>> --------------------------------------------------------------------------
>>> 
>>> 
>>> The small Fortran test program was built and run with:
>>>  mpif90 sharedmemtest.f90
>>>  mpiexec -np 2 -bind-to core -tag-output ./a.out
>>> 
>>> Why does it work on the Laki (both on the login node and on a compute
>>> node) as well as on the login node of Cluster5, but fail on a compute
>>> node of Cluster5?
>>> 
>>> Greetings
>>>   Michael Rachner
>>> 
>>> 
>>> 
>> 
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>> For corporate legal information go to: 
>> http://www.cisco.com/web/about/doing_business/legal/cri/
>> 
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
