Hi All,

just to expand on this guess ...

On Thu, Dec 02, 2010 at 05:40:53PM -0500, Gus Correa wrote:
> Hi All
> 
> I wonder if configuring OpenMPI while
> forcing the default types to non-default values
> (-fdefault-integer-8 -fdefault-real-8) might have
> something to do with the segmentation fault.
> Would this be effective, i.e., actually make
> the sizes of MPI_INTEGER/MPI_INT and MPI_REAL/MPI_FLOAT bigger,
> or would the effect be merely illusory?

I believe what happens is that this mostly affects the Fortran
wrapper routines and the way Fortran variables are mapped to C:

MPI_INTEGER          -> MPI_LONG
MPI_REAL             -> MPI_DOUBLE
MPI_DOUBLE_PRECISION -> MPI_DOUBLE

In that respect I believe that the -fdefault-real-8 option is harmless,
i.e., it does the expected thing.
But the -fdefault-integer-8 option ought to be considered highly dangerous:
it works for integer variables that are used as "buffer" arguments
in MPI calls, but I would assume that it does not work for
"count" and similar arguments.
Example:

integer, allocatable :: buf(:,:)
integer i, i2, count, dest, tag, mpierr

i = 32768
i2 = 2*i
allocate(buf(i,i2))
count = i*i2          ! = 2^31, still fine in a 64-bit default INTEGER
buf = 1
call MPI_Send(buf, count, MPI_INTEGER, dest, tag, MPI_COMM_WORLD, mpierr)

Now count is 2^31, which overflows a 32-bit integer.
The count argument in the MPI C interface is declared as an int,
i.e., a 32-bit integer on this platform, correct?
Thus while buf gets the type MPI_LONG, count remains an int.
Is this interpretation correct? If it is, then you are calling
MPI_Send with a count argument of -2147483648,
which could result in a segmentation fault.
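
Just to make the narrowing concrete, here is a minimal sketch (my own toy
program, not taken from Benjamin's code; it assumes gfortran's kind=4 and
kind=8 correspond to 32- and 64-bit integers):

program count_wraparound
  implicit none
  ! Sketch only: a 64-bit integer holding 2^31 is narrowed to 32 bits,
  ! as I suspect happens when a -fdefault-integer-8 count reaches the
  ! C-level "int count" parameter inside the library.
  integer(kind=8) :: count8
  integer(kind=4) :: count4

  count8 = 32768_8 * 65536_8     ! 2^31, representable in 64 bits
  count4 = int(count8, kind=4)   ! not representable in 32 bits
  print *, '64-bit count:', count8
  print *, '32-bit count:', count4
end program count_wraparound

On a two's-complement machine the second print typically shows -2147483648,
i.e., exactly the value I suspect ends up in the C-level count.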

Cheers,
Martin

-- 
Martin Siegert
Head, Research Computing
WestGrid/ComputeCanada Site Lead
IT Services                                phone: 778 782-4691
Simon Fraser University                    fax:   778 782-4242
Burnaby, British Columbia                  email: sieg...@sfu.ca
Canada  V5A 1S6

> There were some recent discussions here about MPI
> limiting counts to MPI_INTEGER.
> Since Benjamin said he "had to raise the number of data structures",
> which eventually led to the error,
> I wonder if he is inadvertently flipping to the negative-integer
> side of the 32-bit universe (i.e. >= 2**31), as was reported here by
> other list subscribers a few times.
> 
> Anyway, a segmentation fault can come from many different places;
> this is just a guess.
> 
> Gus Correa
> 
> Jeff Squyres wrote:
> >Do you get a corefile?
> >
> >It looks like you're calling MPI_RECV in Fortran and then it segv's.  This 
> >is *likely* because you're either passing a bad parameter or your buffer 
> >isn't big enough.  Can you double check all your parameters?
> >
> >Unfortunately, there's no line numbers printed in the stack trace, so it's 
> >not possible to tell exactly where in the ob1 PML it's dying (i.e., so we 
> >can't see exactly what it's doing to cause the segv).
> >
> >
> >
> >On Dec 2, 2010, at 9:36 AM, Benjamin Toueg wrote:
> >
> >>Hi,
> >>
> >>I am using DRAGON, a neutronics simulation code written in FORTRAN 77 that
> >>has its own data structures. I added a module to send these data structures
> >>via MPI_SEND / MPI_RECV, and everything worked perfectly for a while.
> >>
> >>Then I had to raise the number of data structures to be sent, up to the
> >>point where my cluster hits this bug:
> >>*** Process received signal ***
> >>Signal: Segmentation fault (11)
> >>Signal code: Address not mapped (1)
> >>Failing at address: 0x2c2579fc0
> >>[ 0] /lib/libpthread.so.0 [0x7f52d2930410]
> >>[ 1] /home/toueg/openmpi/lib/openmpi/mca_pml_ob1.so [0x7f52d153fe03]
> >>[ 2] /home/toueg/openmpi/lib/libmpi.so.0(PMPI_Recv+0x2d2) [0x7f52d3504a1e]
> >>[ 3] /home/toueg/openmpi/lib/libmpi_f77.so.0(pmpi_recv_+0x10e) 
> >>[0x7f52d36cf9c6]
> >>
> >>How can I make this error more explicit?
> >>
> >>I use the following configuration of openmpi-1.4.3:
> >>./configure --enable-debug --prefix=/home/toueg/openmpi CXX=g++ CC=gcc 
> >>F77=gfortran FC=gfortran FLAGS="-m64 -fdefault-integer-8 -fdefault-real-8 
> >>-fdefault-double-8" FCFLAGS="-m64 -fdefault-integer-8 -fdefault-real-8 
> >>-fdefault-double-8" --disable-mpi-f90
> >>
> >>Here is the output of mpif77 -v:
> >>mpif77 for 1.2.7 (release) of : 2005/11/04 11:54:51
> >>Driving: f77 -L/usr/lib/mpich-mpd/lib -v -lmpich-p4mpd -lpthread -lrt 
> >>-lfrtbegin -lg2c -lm -shared-libgcc
> >>Reading specs from /usr/lib/gcc/x86_64-linux-gnu/3.4.6/specs
> >>Configured with: ../src/configure -v --enable-languages=c,c++,f77,pascal 
> >>--prefix=/usr --libexecdir=/usr/lib 
> >>--with-gxx-include-dir=/usr/include/c++/3.4 --enable-shared 
> >>--with-system-zlib --enable-nls --without-included-gettext 
> >>--program-suffix=-3.4 --enable-__cxa_atexit --enable-clocale=gnu 
> >>--enable-libstdcxx-debug x86_64-linux-gnu
> >>Thread model: posix
> >>gcc version 3.4.6 (Debian 3.4.6-5)
> >> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/collect2 --eh-frame-hdr -m elf_x86_64 
> >> -dynamic-linker /lib64/ld-linux-x86-64.so.2 
> >> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/crt1.o 
> >> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/crti.o 
> >> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/crtbegin.o -L/usr/lib/mpich-mpd/lib 
> >> -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6 
> >> -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6 
> >> -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib 
> >> -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../.. -L/lib/../lib 
> >> -L/usr/lib/../lib -lmpich-p4mpd -lpthread -lrt -lfrtbegin -lg2c -lm 
> >> -lgcc_s -lgcc -lc -lgcc_s -lgcc 
> >> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/crtend.o 
> >> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/crtn.o
> >>/usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/libfrtbegin.a(frtbegin.o):
> >> In function `main':
> >>(.text+0x1e): undefined reference to `MAIN__'
> >>collect2: ld returned 1 exit status
> >>
> >>Thanks,
> >>Benjamin
> >>
> >
> >
> 
