Dear developers of OPENMPI,

A hang in MPI_WIN_ALLOCATE_SHARED remains.

But first:
Thank you for your advice to set     shmem_mmap_relocate_backing_file = 1
It turned out that the bad (but silent) allocations by
MPI_WIN_ALLOCATE_SHARED, which I had observed after ~140 MB of allocated
shared memory, were indeed caused by insufficient storage for the
shared-memory backing files. Applying the MCA parameter resolved the problem.
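
For reference, a minimal sketch of how such an MCA parameter can be passed to
Open MPI, either through the environment or on the mpiexec command line. The
executable ./a.out and -np 2 are taken from the small test program run quoted
below. That /tmp and /dev/shm are the relevant locations for the backing files
is an assumption and may differ on a given cluster.

```shell
#!/bin/sh
# Sketch only: two common ways to pass an MCA parameter to Open MPI.

# Variant 1: environment variable (prefix OMPI_MCA_ plus the parameter name).
export OMPI_MCA_shmem_mmap_relocate_backing_file=1

# Variant 2: the same setting given directly on the mpiexec command line.
launch_cmd="mpiexec --mca shmem_mmap_relocate_backing_file 1 -np 2 -bind-to core -tag-output ./a.out"

# Assumption: /tmp and /dev/shm are the candidate locations of the backing
# files on this system. Show their free space; ignore a missing path.
df -h /tmp /dev/shm 2>/dev/null || true

# Launch only if mpiexec and the test binary are present; otherwise just
# print the command that would be used.
if command -v mpiexec >/dev/null 2>&1 && [ -x ./a.out ]; then
    $launch_cmd
else
    echo "would run: $launch_cmd"
fi
```

Either variant alone is sufficient; the environment variable is convenient
when the launch line is buried in a batch script.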

Now the allocation of shared data windows by MPI_WIN_ALLOCATE_SHARED in the
OPENMPI-1.8.3 release version works on both clusters!
I tested it with both my small shared-memory Fortran test program and our
Fortran CFD code.
It worked even when allocating 1000 shared data windows containing a total of
40 GB.  Very good.

Now to the remaining problem:
Following Jeff's email of 2014-10-24 (attached below), we also installed and
tested the bug-fixed OPENMPI nightly tarball of 2014-10-24
(openmpi-dev-176-g9334abc.tar.gz) on Cluster5.
That version worked well when our CFD code was running on only 1 node.
But when running the CFD code on 2 nodes with 2 processes per node, I now
observe that, after a total of 200 MB of data has been allocated in 20 shared
windows, the allocation of the 21st window fails: all 4 processes enter
MPI_WIN_ALLOCATE_SHARED but never leave it. The code hangs in that routine
without any message.
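
A minimal sketch of a launch line for this 2 nodes x 2 processes layout,
wrapped in a time limit so that a hang is reported instead of blocking the job.
The executable name ./a.out is only a placeholder for the CFD binary, the
600-second limit is an arbitrary assumption, and the node list is assumed to
come from the batch system's allocation.

```shell
#!/bin/sh
# Sketch only: 2 nodes with 2 processes per node, guarded by a time limit.
nodes=2
ppn=2
np=$((nodes * ppn))      # 4 processes in total
exe=./a.out              # placeholder: name of the CFD executable is assumed
limit=600                # arbitrary time limit in seconds (assumption)

run_cmd="mpiexec -np $np -npernode $ppn -bind-to core -tag-output $exe"

if command -v mpiexec >/dev/null 2>&1 \
   && command -v timeout >/dev/null 2>&1 \
   && [ -x "$exe" ]; then
    status=0
    timeout "$limit" $run_cmd || status=$?
    # GNU timeout exits with status 124 when the time limit was reached.
    if [ "$status" -eq 124 ]; then
        echo "run exceeded ${limit}s: probable hang"
    fi
else
    echo "would run: timeout $limit $run_cmd"
fi
```

Running the same line against the 1.8.3 release and against the nightly build
gives a direct comparison of the two installations.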

In contrast, that bug does NOT occur with the OPENMPI-1.8.3 release version
with the same program on the same machine.

For you, this means:
   either the newly introduced bug fix for shared-memory allocation in
openmpi-dev-176-g9334abc.tar.gz is not yet correct,
   or that version contains another new bug in shared-memory allocation
compared to the working(!) 1.8.3 release version.

Greetings to you all
  Michael Rachner
                                        



-----Original Message-----
From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
(jsquyres)
Sent: Friday, October 24, 2014 22:45
To: Open MPI User's List
Subject: Re: [OMPI users] Bug in OpenMPI-1.8.3: storage limition in shared
memory allocation (MPI_WIN_ALLOCATE_SHARED) in Ftn-code

Nathan tells me that this may well be related to a fix that was literally just 
pulled into the v1.8 branch today:

    https://github.com/open-mpi/ompi-release/pull/56

Would you mind testing any nightly tarball after tonight?  (i.e., the v1.8 
tarballs generated tonight will be the first ones to contain this fix)

    http://www.open-mpi.org/nightly/master/



On Oct 24, 2014, at 11:46 AM, <michael.rach...@dlr.de> <michael.rach...@dlr.de> 
wrote:

> Dear developers of OPENMPI,
>  
> I am running a small, downsized Fortran test program for shared memory
> allocation (using MPI_WIN_ALLOCATE_SHARED and MPI_WIN_SHARED_QUERY)
> on only 1 node of 2 different Linux clusters with OPENMPI-1.8.3 and
> Intel-14.0.4 / Intel-13.0.1, respectively.
>  
> The program simply allocates a sequence of shared data windows, each
> consisting of 1 integer*4 array.
> None of the windows is freed, so the amount of data allocated in shared
> windows grows during the course of the execution.
>  
> That worked well on the 1st cluster (Laki, having 8 procs per node),
> even when allocating 1000 shared windows, each having 50000 integer*4 array
> elements, i.e. a total of 200 MBytes.
> On the 2nd cluster (Cluster5, having 24 procs per node) it also worked on the
> login node, but it did NOT work on a compute node.
> In that error case, there appears to be something like an internal storage
> limit of ~140 MB for the total storage allocated in all shared windows.
> When that limit is reached, all later shared memory allocations fail (but
> silently).
> So the first attempt to use such a bad shared data window results in a bus
> error due to the bad storage address encountered.
>  
> That strange behavior could be observed in the small test program but also
> with my large Fortran CFD code.
> If the error occurs, then it occurs with both codes, and in both at a storage
> limit of ~140 MB.
> I found that this storage limit depends only weakly on the number of
> processes (for np=2, 4, 8, 16, 24 it is: 144.4, 144.0, 141.0, 137.0,
> 132.2 MB).
>  
> Note that the shared memory storage available on both clusters was very large 
> (many GB of free memory).
>  
> Here is the error message when running with np=2 and an array
> dimension of idim_1=50000 for the integer*4 array allocated per shared
> window on the compute node of Cluster5.
> In that case, the error occurred at the 723rd shared window, which is the
> 1st badly allocated window:
> (722 successfully allocated shared windows * 50000 array elements * 4
> Bytes/el. = 144.4 MB)
>  
>  
> [1,0]<stdout>: ========on nodemaster: iwin=         722 :
> [1,0]<stdout>:  total storage [MByte] alloc. in shared windows so far:   144.400000000000
> [1,0]<stdout>: =========== allocation of shared window no. iwin=         723
> [1,0]<stdout>:  starting now with idim_1=       50000
> [1,0]<stdout>: ========on nodemaster for iwin=         723 : before writing on shared mem
> [1,0]<stderr>:[r5i5n13:12597] *** Process received signal ***
> [1,0]<stderr>:[r5i5n13:12597] Signal: Bus error (7)
> [1,0]<stderr>:[r5i5n13:12597] Signal code: Non-existant physical address (2)
> [1,0]<stderr>:[r5i5n13:12597] Failing at address: 0x7fffe08da000
> [1,0]<stderr>:[r5i5n13:12597] [ 0] [1,0]<stderr>:/lib64/libpthread.so.0(+0xf800)[0x7ffff6d67800]
> [1,0]<stderr>:[r5i5n13:12597] [ 1] ./a.out[0x408a8b]
> [1,0]<stderr>:[r5i5n13:12597] [ 2] ./a.out[0x40800c]
> [1,0]<stderr>:[r5i5n13:12597] [ 3] [1,0]<stderr>:/lib64/libc.so.6(__libc_start_main+0xe6)[0x7ffff69fec36]
> [1,0]<stderr>:[r5i5n13:12597] [ 4] [1,0]<stderr>:./a.out[0x407f09]
> [1,0]<stderr>:[r5i5n13:12597] *** End of error message ***
> [1,1]<stderr>:forrtl: error (78): process killed (SIGTERM)
> [1,1]<stderr>:Image              PC                Routine            Line        Source
> [1,1]<stderr>:libopen-pal.so.6   00007FFFF4B74580  Unknown               Unknown  Unknown
> [1,1]<stderr>:libmpi.so.1        00007FFFF7267F3E  Unknown               Unknown  Unknown
> [1,1]<stderr>:libmpi.so.1        00007FFFF733B555  Unknown               Unknown  Unknown
> [1,1]<stderr>:libmpi.so.1        00007FFFF727DFFD  Unknown               Unknown  Unknown
> [1,1]<stderr>:libmpi_mpifh.so.2  00007FFFF779BA03  Unknown               Unknown  Unknown
> [1,1]<stderr>:a.out              0000000000408D15  Unknown               Unknown  Unknown
> [1,1]<stderr>:a.out              000000000040800C  Unknown               Unknown  Unknown
> [1,1]<stderr>:libc.so.6          00007FFFF69FEC36  Unknown               Unknown  Unknown
> [1,1]<stderr>:a.out              0000000000407F09  Unknown               Unknown  Unknown
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 12597 on node r5i5n13 exited on signal 7 (Bus error).
> --------------------------------------------------------------------------
>  
>  
> The small Fortran test program was built and run with
>   mpif90 sharedmemtest.f90
>   mpiexec -np 2 -bind-to core -tag-output ./a.out
>  
> Why does it work on Laki (both on the login node and on a compute
> node) as well as on the login node of Cluster5, but fail on a compute node
> of Cluster5?
>  
> Greetings
>    Michael Rachner
>  
>  
>  
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/10/25572.php


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/10/25580.php
