Unfortunately DRAGON is old FORTRAN77: integers are used where pointers should be. If I compile it in 64-bit mode without -fdefault-integer-8, these so-called pointers stay 32 bits wide and can no longer hold a 64-bit address. Problems could also arise from its data structure handlers.
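For illustration, here is a minimal sketch of the integer-as-pointer idiom (this is not actual DRAGON code; the program name is invented, and LOC is a compiler extension that gfortran provides):

      PROGRAM PTRDEM
C     Sketch of the "integer used as a pointer" idiom described above:
C     LOC (a gfortran extension) returns the address of its argument
C     as an integer value, which is then stored in a default INTEGER.
      INTEGER IPTR
      REAL    BUF(10)
      IPTR = LOC(BUF)
C     Without -fdefault-integer-8 on a 64-bit platform, IPTR is only
C     4 bytes wide, so the 64-bit address may not fit into it.
      WRITE(*,*) 'ADDRESS HELD IN A DEFAULT INTEGER:', IPTR
      END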
Therefore -fdefault-integer-8 is absolutely necessary. Furthermore, MPI_SEND and MPI_RECV are called only about a dozen times, all in a single source file (used for passing a data structure from one node to another), and that code has proved to work in every situation so far. Not knowing which line is causing my segfault is annoying.

Regards,
Benjamin

2010/12/6 Gustavo Correa <g...@ldeo.columbia.edu>
> Hi Benjamin
>
> I would just rebuild OpenMPI withOUT the compiler flags that change the
> standard sizes of "int" and "float" (do a "make distclean" first!), then
> recompile your program, and see how it goes.
> I don't think you are gaining anything by trying to change the standard
> "int/integer" and "real/float" sizes, and most likely they are inviting
> trouble, making things more confusing.
> Worst case, you will at least be sure that the bug is somewhere else,
> not in a mismatch of basic type sizes.
>
> If you need to pass 8-byte real buffers, use MPI_DOUBLE_PRECISION or
> MPI_REAL8 in your (Fortran) MPI calls, and declare them in the Fortran
> code accordingly (double precision or real(kind=8)).
>
> If I remember right, there is no 8-byte integer support in the Fortran
> MPI bindings, only in the C bindings, but some OpenMPI expert could
> clarify this. Hence, if you are passing 8-byte integers in your MPI
> calls this may also be problematic.
>
> My two cents,
> Gus Correa
>
> On Dec 5, 2010, at 3:04 PM, Benjamin Toueg wrote:
>
> > Hi,
> >
> > First of all, thanks for your insight!
> >
> > "Do you get a corefile?"
> > I don't get a core file, but I get a file called _FIL001. It doesn't
> > contain any debugging symbols. It's most likely a digested version of
> > the input file given to the executable: ./myexec < inputfile.
> >
> > "there's no line numbers printed in the stack trace"
> > I would love to see those, but even if I configure openmpi with
> > --enable-debug --enable-mem-debug --enable-mem-profile, they don't show
> > up. I recompiled my sources to be sure to properly link them to the
> > newly debugged version of openmpi. I assumed I didn't need to compile
> > my own sources with the -g option since it crashes in openmpi itself?
> > I didn't try to run mpiexec via gdb either; I guess it won't help since
> > I already get the trace.
> >
> > "the -fdefault-integer-8 options ought to be highly dangerous"
> > Thanks for pointing that out. Indeed I had some issues with this
> > option. For instance I have to declare some arguments as INTEGER*4,
> > like RANK, SIZE and IERR in:
> > CALL MPI_COMM_RANK(MPI_COMM_WORLD,RANK,IERR)
> > CALL MPI_COMM_SIZE(MPI_COMM_WORLD,SIZE,IERR)
> > In your example "call MPI_Send(buf, count, MPI_INTEGER, dest, tag,
> > MPI_COMM_WORLD, mpierr)" I checked that count is never bigger than
> > 2000 (as you mentioned it could flip to negative). However I haven't
> > declared it as INTEGER*4 and I think I should.
> > When I said "I had to raise the number of data structures to be sent",
> > I meant that I had to call MPI_SEND many more times, not that the
> > buffers were bigger than before.
> >
> > I'll get back to you with more info when I'm able to fix my connection
> > problem to the cluster...
> >
> > Thanks,
> > Benjamin
> >
> > 2010/12/3 Martin Siegert <sieg...@sfu.ca>
> > Hi All,
> >
> > just to expand on this guess ...
> >
> > On Thu, Dec 02, 2010 at 05:40:53PM -0500, Gus Correa wrote:
> > > Hi All
> > >
> > > I wonder if configuring OpenMPI while
> > > forcing the default types to non-default values
> > > (-fdefault-integer-8 -fdefault-real-8) might have
> > > something to do with the segmentation fault.
> > > Would this be effective, i.e., actually make the
> > > sizes of MPI_INTEGER/MPI_INT and MPI_REAL/MPI_FLOAT bigger,
> > > or just elusive?
> >
> > I believe what happens is that this mostly affects the Fortran
> > wrapper routines and the way Fortran variables are mapped to C:
> >
> > MPI_INTEGER          -> MPI_LONG
> > MPI_REAL             -> MPI_DOUBLE
> > MPI_DOUBLE_PRECISION -> MPI_DOUBLE
> >
> > In that respect I believe that the -fdefault-real-8 option is
> > harmless, i.e., it does the expected thing.
> > But the -fdefault-integer-8 option ought to be highly dangerous:
> > it works for integer variables that are used as "buffer" arguments
> > in MPI statements, but I would assume that this does not work for
> > "count" and similar arguments. Example:
> >
> > integer, allocatable :: buf(:,:)
> > integer i, i2, count, dest, tag, mpierr
> >
> > i = 32768
> > i2 = 2*i
> > allocate(buf(i,i2))
> > count = i*i2
> > buf = 1
> > call MPI_Send(buf, count, MPI_INTEGER, dest, tag, MPI_COMM_WORLD, mpierr)
> >
> > Now count is 2^31, which overflows a 32-bit integer.
> > The MPI standard requires that count is a 32-bit integer, correct?
> > Thus while buf gets the type MPI_LONG, count remains an int.
> > Is this interpretation correct? If it is, then you are calling
> > MPI_Send with a count argument of -2147483648,
> > which could result in a segmentation fault.
> >
> > Cheers,
> > Martin
> >
> > --
> > Martin Siegert
> > Head, Research Computing
> > WestGrid/ComputeCanada Site Lead
> > IT Services                  phone: 778 782-4691
> > Simon Fraser University      fax:   778 782-4242
> > Burnaby, British Columbia    email: sieg...@sfu.ca
> > Canada V5A 1S6
> >
> > > There were some recent discussions here about MPI
> > > limiting counts to the range of MPI_INTEGER.
> > > Since Benjamin said he "had to raise the number of data structures",
> > > which eventually led to the error,
> > > I wonder if he is inadvertently flipping to the negative integer
> > > side of the 32-bit universe (i.e. >= 2**31), as was reported here by
> > > other list subscribers a few times.
> > >
> > > Anyway, a segmentation fault can come from many different places;
> > > this is just a guess.
> > >
> > > Gus Correa
> > >
> > > Jeff Squyres wrote:
> > > > Do you get a corefile?
> > > >
> > > > It looks like you're calling MPI_RECV in Fortran and then it
> > > > segv's. This is *likely* because you're either passing a bad
> > > > parameter or your buffer isn't big enough. Can you double check
> > > > all your parameters?
> > > >
> > > > Unfortunately, there are no line numbers printed in the stack
> > > > trace, so it's not possible to tell exactly where in the ob1 PML
> > > > it's dying (i.e., we can't see exactly what it's doing to cause
> > > > the segv).
> > > >
> > > > On Dec 2, 2010, at 9:36 AM, Benjamin Toueg wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I am using DRAGON, a neutronics simulation code in FORTRAN77 that
> > > >> has its own data structures. I added a module to send these data
> > > >> structures via MPI_SEND / MPI_RECV, and everything worked
> > > >> perfectly for a while.
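For reference, the basic send/receive pattern being described might look roughly like the sketch below. It is an illustration only, not DRAGON's actual module: the program name, buffer size and tag are invented, and it needs at least two MPI processes. The kinds of COUNT, RANK, TAG and IERR must match whatever the MPI library's Fortran bindings were built to expect, which is exactly the point debated in this thread.

      PROGRAM SNDRCV
C     Illustration only (not DRAGON's actual module): rank 0 sends one
C     integer buffer to rank 1, and rank 1 receives it.
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER IERR, RANK, TAG, COUNT
      INTEGER STATUS(MPI_STATUS_SIZE)
      INTEGER BUF(2000)
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
      TAG = 1
      COUNT = 2000
      IF (RANK.EQ.0) THEN
         BUF = 1
         CALL MPI_SEND(BUF, COUNT, MPI_INTEGER, 1, TAG,
     &                 MPI_COMM_WORLD, IERR)
      ELSE IF (RANK.EQ.1) THEN
         CALL MPI_RECV(BUF, COUNT, MPI_INTEGER, 0, TAG,
     &                 MPI_COMM_WORLD, STATUS, IERR)
      END IF
      CALL MPI_FINALIZE(IERR)
      END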
> > > >> Then I had to raise the number of data structures to be sent,
> > > >> up to a point where my cluster hits this bug:
> > > >>
> > > >> *** Process received signal ***
> > > >> Signal: Segmentation fault (11)
> > > >> Signal code: Address not mapped (1)
> > > >> Failing at address: 0x2c2579fc0
> > > >> [ 0] /lib/libpthread.so.0 [0x7f52d2930410]
> > > >> [ 1] /home/toueg/openmpi/lib/openmpi/mca_pml_ob1.so [0x7f52d153fe03]
> > > >> [ 2] /home/toueg/openmpi/lib/libmpi.so.0(PMPI_Recv+0x2d2) [0x7f52d3504a1e]
> > > >> [ 3] /home/toueg/openmpi/lib/libmpi_f77.so.0(pmpi_recv_+0x10e) [0x7f52d36cf9c6]
> > > >>
> > > >> How can I make this error more explicit?
> > > >>
> > > >> I use the following configuration of openmpi-1.4.3:
> > > >> ./configure --enable-debug --prefix=/home/toueg/openmpi CXX=g++ CC=gcc
> > > >> F77=gfortran FC=gfortran FLAGS="-m64 -fdefault-integer-8 -fdefault-real-8
> > > >> -fdefault-double-8" FCFLAGS="-m64 -fdefault-integer-8 -fdefault-real-8
> > > >> -fdefault-double-8" --disable-mpi-f90
> > > >>
> > > >> Here is the output of mpif77 -v:
> > > >> mpif77 for 1.2.7 (release) of : 2005/11/04 11:54:51
> > > >> Driving: f77 -L/usr/lib/mpich-mpd/lib -v -lmpich-p4mpd -lpthread -lrt
> > > >> -lfrtbegin -lg2c -lm -shared-libgcc
> > > >> Reading specs from /usr/lib/gcc/x86_64-linux-gnu/3.4.6/specs
> > > >> Configured with: ../src/configure -v --enable-languages=c,c++,f77,pascal
> > > >> --prefix=/usr --libexecdir=/usr/lib --with-gxx-include-dir=/usr/include/c++/3.4
> > > >> --enable-shared --with-system-zlib --enable-nls --without-included-gettext
> > > >> --program-suffix=-3.4 --enable-__cxa_atexit --enable-clocale=gnu
> > > >> --enable-libstdcxx-debug x86_64-linux-gnu
> > > >> Thread model: posix
> > > >> gcc version 3.4.6 (Debian 3.4.6-5)
> > > >>  /usr/lib/gcc/x86_64-linux-gnu/3.4.6/collect2 --eh-frame-hdr -m elf_x86_64
> > > >> -dynamic-linker /lib64/ld-linux-x86-64.so.2
> > > >> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/crt1.o
> > > >> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/crti.o
> > > >> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/crtbegin.o -L/usr/lib/mpich-mpd/lib
> > > >> -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6 -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6
> > > >> -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib
> > > >> -L/usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../..
> > > >> -L/lib/../lib -L/usr/lib/../lib -lmpich-p4mpd -lpthread -lrt -lfrtbegin
> > > >> -lg2c -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc
> > > >> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/crtend.o
> > > >> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/crtn.o
> > > >> /usr/lib/gcc/x86_64-linux-gnu/3.4.6/../../../../lib/libfrtbegin.a(frtbegin.o):
> > > >> in function `main':
> > > >> (.text+0x1e): undefined reference to `MAIN__'
> > > >> collect2: ld returned 1 exit status
> > > >>
> > > >> Thanks,
> > > >> Benjamin
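As a postscript to the question raised above, whether -fdefault-integer-8 actually changes what the library's MPI_INTEGER means or is just elusive, a small diagnostic along these lines can settle it empirically. It is only a sketch (the program name is invented); build it with the same mpif77 and compiler flags as the application and compare the two numbers it prints.

      PROGRAM CHKSIZ
C     Diagnostic sketch: compare the size of a default Fortran INTEGER,
C     as compiled here, with the size the MPI library reports for
C     MPI_INTEGER.
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER IERR, ISIZE, IDUMMY
      CALL MPI_INIT(IERR)
      CALL MPI_TYPE_SIZE(MPI_INTEGER, ISIZE, IERR)
      WRITE(*,*) 'MPI_INTEGER size reported by the library (bytes):',
     &           ISIZE
C     BIT_SIZE is standard Fortran 90; divide by 8 to get bytes.
      WRITE(*,*) 'Default INTEGER size in this compilation (bytes):',
     &           BIT_SIZE(IDUMMY)/8
      CALL MPI_FINALIZE(IERR)
      END

If the two sizes differ, every INTEGER argument passed to the Fortran bindings (counts, ranks, tags, ierr) is suspect, which is consistent with the advice earlier in the thread to rebuild OpenMPI without the size-changing flags and retest.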