Michael,

Could you please share your test program so we can investigate it?

Cheers,

Gilles

On 2014/10/31 18:53, michael.rach...@dlr.de wrote:
> Dear developers of OPENMPI,
>
> There remains a hang observed in MPI_WIN_ALLOCATE_SHARED.
>
> But first:
> Thank you for your advice to set     shmem_mmap_relocate_backing_file = 1
> It turned out that the bad (but silent) allocations by
> MPI_WIN_ALLOCATE_SHARED, which I had observed in the past after ~140 MB of
> allocated shared memory,
> were indeed caused by too little storage being available for the shared-memory
> backing files. Applying the MCA parameter resolved the problem.
>
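> For anyone seeing the same silent-failure symptom, here is just an illustrative
> sketch of how such an MCA parameter can be passed on the mpiexec command line
> (the -np value and executable name are placeholders; verify the parameter name
> on your installation with  ompi_info --all ):
>
>   mpiexec --mca shmem_mmap_relocate_backing_file 1 -np 4 ./a.out
>
> Alternatively it can be put into an MCA parameter file (e.g. the system-wide
> openmpi-mca-params.conf).
>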
> Now the allocation of shared data windows by MPI_WIN_ALLOCATE_SHARED in the
> OpenMPI-1.8.3 release version works on both clusters!
> I tested it both with my small shared-memory Fortran test program and with
> our Fortran CFD code.
> It worked even when allocating 1000 shared data windows containing a total
> of 40 GB.  Very well.
>
> But now to the remaining problem:
> Following the attached email from Jeff (see below) of 2014-10-24,
> we have alternatively installed and tested the bug-fixed OpenMPI nightly
> tarball of 2014-10-24 (openmpi-dev-176-g9334abc.tar.gz) on Cluster5.
> That version worked well when our CFD code was running on only 1 node.
> But I now observe that when running the CFD code on 2 nodes with 2 processes
> per node,
> after having allocated a total of 200 MB of data in 20 shared windows, the
> allocation of the 21st window fails,
> because all 4 processes enter MPI_WIN_ALLOCATE_SHARED but never leave it. The
> code hangs in that routine without any message.
>
> In contrast, that bug does NOT occur with the OpenMPI-1.8.3 release version
> with the same program on the same machine.
>
> That means for you:
>    In openmpi-dev-176-g9334abc.tar.gz the newly introduced bugfix concerning
> shared memory allocation may not yet be correctly coded,
>    or that version contains another new bug in shared memory allocation
> compared to the working(!) 1.8.3 release version.
>
> Greetings to you all
>   Michael Rachner
>                                         
>
>
>
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On behalf of Jeff Squyres
> (jsquyres)
> Sent: Friday, October 24, 2014 22:45
> To: Open MPI User's List
> Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limitation in shared
> memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code
>
> Nathan tells me that this may well be related to a fix that was literally 
> just pulled into the v1.8 branch today:
>
>     https://github.com/open-mpi/ompi-release/pull/56
>
> Would you mind testing any nightly tarball after tonight?  (i.e., the v1.8 
> tarballs generated tonight will be the first ones to contain this fix)
>
>     http://www.open-mpi.org/nightly/master/
>
>
>
> On Oct 24, 2014, at 11:46 AM, <michael.rach...@dlr.de> wrote:
>
>> Dear developers of OPENMPI,
>>  
>> I am running a small, downsized Fortran test program for shared memory
>> allocation (using MPI_WIN_ALLOCATE_SHARED and MPI_WIN_SHARED_QUERY)
>> on only 1 node of 2 different Linux clusters with OpenMPI-1.8.3 and
>> Intel-14.0.4 / Intel-13.0.1, respectively.
>>  
>> The program simply allocates a sequence of shared data windows, each
>> consisting of one integer*4 array.
>> None of the windows is freed, so the amount of data allocated in shared
>> windows grows during the course of the execution.
>>  
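>> For orientation, here is a minimal sketch (NOT the actual sharedmemtest.f90,
>> just an illustration of the allocation scheme described above; it assumes the
>> MPI-3 TYPE(C_PTR) variant of the Fortran bindings, and every name apart from
>> MPI_WIN_ALLOCATE_SHARED, MPI_WIN_SHARED_QUERY and idim_1 is made up here):
>>
>>   program sharedmemtest_sketch
>>     ! minimal sketch: allocate a sequence of shared windows, one integer*4
>>     ! array per window, and never free them
>>     use mpi
>>     use, intrinsic :: iso_c_binding, only: c_ptr, c_f_pointer
>>     implicit none
>>     integer, parameter :: idim_1 = 50000     ! array elements per window
>>     integer, parameter :: nwin   = 1000      ! number of windows to allocate
>>     integer :: ierr, noderank, nodecomm, win, disp_unit, iwin
>>     integer(kind=MPI_ADDRESS_KIND) :: winsize
>>     type(c_ptr)         :: baseptr
>>     integer(4), pointer :: iarr(:)
>>
>>     call MPI_INIT( ierr )
>>     ! shared windows must be created on a node-local communicator
>>     call MPI_COMM_SPLIT_TYPE( MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, &
>>                               MPI_INFO_NULL, nodecomm, ierr )
>>     call MPI_COMM_RANK( nodecomm, noderank, ierr )
>>
>>     disp_unit = 4                            ! bytes per integer*4 element
>>     do iwin = 1, nwin
>>       winsize = 0_MPI_ADDRESS_KIND
>>       if (noderank == 0) winsize = int(idim_1,MPI_ADDRESS_KIND) * disp_unit
>>       call MPI_WIN_ALLOCATE_SHARED( winsize, disp_unit, MPI_INFO_NULL, &
>>                                     nodecomm, baseptr, win, ierr )
>>       ! every process maps the nodemaster's segment into its address space
>>       call MPI_WIN_SHARED_QUERY( win, 0, winsize, disp_unit, baseptr, ierr )
>>       call c_f_pointer( baseptr, iarr, (/ idim_1 /) )
>>       if (noderank == 0) iarr = iwin         ! nodemaster writes to the array
>>       call MPI_BARRIER( nodecomm, ierr )
>>       ! windows are intentionally not freed, so shared storage accumulates
>>     end do
>>
>>     call MPI_FINALIZE( ierr )
>>   end program sharedmemtest_sketch
>>
>> In this sketch only the nodemaster contributes a non-zero window size; the
>> other processes attach to its segment via MPI_WIN_SHARED_QUERY, so each
>> window adds idim_1 * 4 Bytes of shared storage, as counted below.
>>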
>> That worked well on the 1st cluster (Laki, having 8 procs per node)
>> when allocating even 1000 shared windows, each with 50000 integer*4 array
>> elements, i.e. a total of 200 MBytes.
>> On the 2nd cluster (Cluster5, having 24 procs per node) it also worked on
>> the login node, but it did NOT work on a compute node.
>> In the failing case there appears to be something like an internal storage
>> limit of ~140 MB for the total storage allocated across all shared windows.
>> When that limit is reached, all later shared memory allocations fail (but
>> silently).
>> So the first attempt to use such a bad shared data window results in a bus
>> error due to the bad storage address encountered.
>>  
>> That strange behavior could be observed both with the small test program and
>> with my large Fortran CFD code.
>> If the error occurs, it occurs with both codes, and in both cases at a
>> storage limit of ~140 MB.
>> I found that this storage limit depends only weakly on the number of
>> processes (for np=2, 4, 8, 16, 24 it is: 144.4, 144.0, 141.0, 137.0,
>> 132.2 MB).
>>  
>> Note that the shared memory storage available on both clusters was very 
>> large (many GB of free memory).
>>  
>> Here is the error message when running with np=2 and an array dimension of
>> idim_1=50000 for the integer*4 array allocated per shared window on the
>> compute node of Cluster5.
>> In that case the error occurred at the 723rd shared window, which is the
>> first badly allocated window:
>> (722 successfully allocated shared windows * 50000 array elements * 4
>> Bytes/el. = 144.4 MB)
>>  
>>  
>> [1,0]<stdout>: ========on nodemaster: iwin=         722 :
>> [1,0]<stdout>:  total storage [MByte] alloc. in shared windows so far:   144.400000000000
>> [1,0]<stdout>: =========== allocation of shared window no. iwin=         723  starting now with idim_1=       50000
>> [1,0]<stdout>: ========on nodemaster for iwin=         723 : before writing on shared mem
>> [1,0]<stderr>:[r5i5n13:12597] *** Process received signal ***
>> [1,0]<stderr>:[r5i5n13:12597] Signal: Bus error (7)
>> [1,0]<stderr>:[r5i5n13:12597] Signal code: Non-existant physical address (2)
>> [1,0]<stderr>:[r5i5n13:12597] Failing at address: 0x7fffe08da000
>> [1,0]<stderr>:[r5i5n13:12597] [ 0] [1,0]<stderr>:/lib64/libpthread.so.0(+0xf800)[0x7ffff6d67800]
>> [1,0]<stderr>:[r5i5n13:12597] [ 1] ./a.out[0x408a8b]
>> [1,0]<stderr>:[r5i5n13:12597] [ 2] ./a.out[0x40800c]
>> [1,0]<stderr>:[r5i5n13:12597] [ 3] [1,0]<stderr>:/lib64/libc.so.6(__libc_start_main+0xe6)[0x7ffff69fec36]
>> [1,0]<stderr>:[r5i5n13:12597] [ 4] [1,0]<stderr>:./a.out[0x407f09]
>> [1,0]<stderr>:[r5i5n13:12597] *** End of error message ***
>> [1,1]<stderr>:forrtl: error (78): process killed (SIGTERM)
>> [1,1]<stderr>:Image              PC                Routine            Line     Source
>> [1,1]<stderr>:libopen-pal.so.6   00007FFFF4B74580  Unknown            Unknown  Unknown
>> [1,1]<stderr>:libmpi.so.1        00007FFFF7267F3E  Unknown            Unknown  Unknown
>> [1,1]<stderr>:libmpi.so.1        00007FFFF733B555  Unknown            Unknown  Unknown
>> [1,1]<stderr>:libmpi.so.1        00007FFFF727DFFD  Unknown            Unknown  Unknown
>> [1,1]<stderr>:libmpi_mpifh.so.2  00007FFFF779BA03  Unknown            Unknown  Unknown
>> [1,1]<stderr>:a.out              0000000000408D15  Unknown            Unknown  Unknown
>> [1,1]<stderr>:a.out              000000000040800C  Unknown            Unknown  Unknown
>> [1,1]<stderr>:libc.so.6          00007FFFF69FEC36  Unknown            Unknown  Unknown
>> [1,1]<stderr>:a.out              0000000000407F09  Unknown            Unknown  Unknown
>> --------------------------------------------------------------------------
>> mpiexec noticed that process rank 0 with PID 12597 on node r5i5n13 exited on signal 7 (Bus error).
>> --------------------------------------------------------------------------
>>  
>>  
>> The small Fortran test program was built and run by
>>   mpif90 sharedmemtest.f90
>>   mpiexec -np 2 -bind-to core -tag-output ./a.out
>>  
>> Why does it work on Laki (both on the login node and on a compute
>> node) as well as on the login node of Cluster5, but fail on a compute
>> node of Cluster5?
>>  
>> Greetings
>>    Michael Rachner
>>  
>>  
>>  
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
