Building Open MPI with the "--without-memory-manager" option fixed my problem.
What exactly does compiling with this option imply? I guess every malloc call
then uses the libc functions instead of the Open MPI ones, but does it affect
performance or anything else?

Nicolas

2010/8/8 Nysal Jan <jny...@gmail.com>

> What interconnect are you using? InfiniBand? Use the
> "--without-memory-manager" option while building ompi in order to disable
> ptmalloc.
>
> Regards
> --Nysal
>
>
> On Sun, Aug 8, 2010 at 7:49 PM, Nicolas Deladerriere <
> nicolas.deladerri...@gmail.com> wrote:
>
>> Yes, I am using a 24 GB machine with a 64-bit Linux OS.
>> If I compile without the wrapper, I do not get any problems.
>>
>> It seems that when I link with openmpi, my program uses a kind of
>> openmpi-implemented malloc. Is it possible to switch it off in order to
>> only use malloc from libc?
>>
>> Nicolas
>>
>> 2010/8/8 Terry Frankcombe <te...@chem.gu.se>
>>
>>> You're trying to do a 6GB allocate. Can your underlying system handle
>>> that? If you compile without the wrapper, does it work?
>>>
>>> I see your executable is using the OMPI memory stuff. IIRC there are
>>> switches to turn that off.
>>>
>>>
>>> On Fri, 2010-08-06 at 15:05 +0200, Nicolas Deladerriere wrote:
>>> > Hello,
>>> >
>>> > I am getting a SIGSEGV error when running a simple program compiled
>>> > and linked with openmpi.
>>> > I have reproduced the problem using really simple Fortran code. It
>>> > actually does not even use MPI, but just links with the MPI shared
>>> > libraries. (The problem does not appear when I do not link with the
>>> > MPI libraries.)
>>> > % cat allocate.F90
>>> > program test
>>> >   implicit none
>>> >   integer, dimension(:), allocatable :: z
>>> >   integer(kind=8) :: l
>>> >
>>> >   write(*,*) "l ?"
>>> >   read(*,*) l
>>> >
>>> >   ALLOCATE(z(l))
>>> >   z(1) = 111
>>> >   z(l) = 222
>>> >   DEALLOCATE(z)
>>> >
>>> > end program test
>>> >
>>> > I am using openmpi 1.4.2 and gfortran for my tests.
>>> > Here is the compilation:
>>> >
>>> > % ./openmpi-1.4.2/build/bin/mpif90 --showme -g -o testallocate
>>> > allocate.F90
>>> > gfortran -g -o testallocate allocate.F90
>>> > -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/include -pthread
>>> > -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib
>>> > -L/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib -lmpi_f90
>>> > -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl
>>> > -lutil -lm -ldl -pthread
>>> >
>>> > When I run that test with different lengths, I sometimes get a
>>> > "Segmentation fault" error. Here are two examples using two specific
>>> > values, but the error happens for many other values of length (I did
>>> > not manage to find which values of length give that error).
>>> >
>>> > % ./testallocate
>>> > l ?
>>> > 1600000000
>>> > Segmentation fault
>>> > % ./testallocate
>>> > l ?
>>> > 2000000000
>>> >
>>> > I used a debugger with a version of openmpi recompiled with the debug
>>> > flag. I got the following error in function sYSMALLOc:
>>> >
>>> > Program received signal SIGSEGV, Segmentation fault.
>>> > 0x00002aaaab70b3b3 in sYSMALLOc (nb=6400000016, av=0x2aaaab930200)
>>> > at malloc.c:3239
>>> > 3239 set_head(remainder, remainder_size | PREV_INUSE);
>>> > Current language: auto; currently c
>>> > (gdb) bt
>>> > #0 0x00002aaaab70b3b3 in sYSMALLOc (nb=6400000016,
>>> > av=0x2aaaab930200) at malloc.c:3239
>>> > #1 0x00002aaaab70d0db in opal_memory_ptmalloc2_int_malloc
>>> > (av=0x2aaaab930200, bytes=6400000000) at malloc.c:4322
>>> > #2 0x00002aaaab70b773 in opal_memory_ptmalloc2_malloc
>>> > (bytes=6400000000) at malloc.c:3435
>>> > #3 0x00002aaaab70a665 in opal_memory_ptmalloc2_malloc_hook
>>> > (sz=6400000000, caller=0x2aaaabf8534d) at hooks.c:667
>>> > #4 0x00002aaaabf8534d in _gfortran_internal_free ()
>>> > from /usr/lib64/libgfortran.so.1
>>> > #5 0x0000000000400bcc in MAIN__ () at allocate.F90:11
>>> > #6 0x0000000000400c4e in main ()
>>> > (gdb) display
>>> > (gdb) list
>>> > 3234 if ((unsigned long)(size) >= (unsigned long)(nb +
>>> > MINSIZE)) {
>>> > 3235 remainder_size = size - nb;
>>> > 3236 remainder = chunk_at_offset(p, nb);
>>> > 3237 av->top = remainder;
>>> > 3238 set_head(p, nb | PREV_INUSE | (av != &main_arena ?
>>> > NON_MAIN_ARENA : 0));
>>> > 3239 set_head(remainder, remainder_size | PREV_INUSE);
>>> > 3240 check_malloced_chunk(av, p, nb);
>>> > 3241 return chunk2mem(p);
>>> > 3242 }
>>> > 3243
>>> >
>>> >
>>> > I also did the same test in C and I got the same problem.
>>> >
>>> > Does anyone have any idea that could help me understand what's going
>>> > on?
>>> >
>>> > Regards
>>> > Nicolas
>>> >
>>> > _______________________________________________
>>> > users mailing list
>>> > us...@open-mpi.org
>>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
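
The original report mentions that the same test in C shows the same problem, but the C source was not posted. As a rough sketch only, assuming the C test mirrors allocate.F90 (allocate n 4-byte integers, write the first and last element, free), it might look like the following. The function name allocate_and_touch is hypothetical and does not come from the thread.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from the thread): allocate n ints, write the
 * first and last element as allocate.F90 does, then free the array.
 * Returns 0 on success, -1 if n is 0, the byte count would overflow,
 * or malloc reports failure. */
int allocate_and_touch(size_t n)
{
    if (n == 0 || n > SIZE_MAX / sizeof(int)) {
        return -1;  /* nothing to touch, or n * sizeof(int) overflows */
    }

    int *z = malloc(n * sizeof *z);
    if (z == NULL) {
        return -1;  /* the allocator refused the request cleanly */
    }

    z[0] = 111;      /* corresponds to z(1) in the Fortran version */
    z[n - 1] = 222;  /* corresponds to z(l) in the Fortran version */

    int ok = (z[n - 1] == 222);
    free(z);
    return ok ? 0 : -1;
}
```

Calling this from a main with n = 1600000000 requests 6400000000 bytes, the same size that appears in the backtrace above. Building that program once with plain gcc and once through the Open MPI wrapper (mpicc) would be one way to check whether the crash follows the linked allocator, as the Fortran results suggest.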