I'm *very loosely* checking email. :-)

Agree with what Ralph said: it looks like your program called memalign, and that call ended up segv'ing. That could be an OMPI problem, or it could be an application problem.

Try also configuring OMPI --with-valgrind and running your app through a memory-checking debugger (although OMPI is not very valgrind-clean in the 1.6 series :-\ -- you'll get a bunch of false positives about reads from unallocated memory, and about memory left still-allocated after MPI_FINALIZE).
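In case it helps, the valgrind route I mean is roughly the sketch below. The install prefix, process count, and the exact location of the suppressions file are guesses for your setup, so treat them as assumptions and adjust; the last line is Ralph's gdb suggestion in batch form (core file name depends on your core_pattern):

```shell
# Rebuild OMPI 1.6.5 with valgrind support and debugging symbols
# (prefix is just an example -- use your own install path)
./configure --prefix=/sw/openmpi/1.6.5-dbg --with-valgrind --enable-debug
make -j8 install

# Run the app under valgrind; OMPI ships a suppressions file
# (location may vary) that silences many known OMPI false positives.
# %p in --log-file expands to each process's PID.
mpiexec -np 32 valgrind \
    --suppressions=/sw/openmpi/1.6.5-dbg/share/openmpi/openmpi-valgrind.supp \
    --log-file=vg.%p.out ./ccsm.exe

# Or, per Ralph's suggestion: get a backtrace from the core file
# (core file name depends on your kernel.core_pattern setting)
gdb -batch -ex bt ./ccsm.exe core.17008
```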
On Dec 23, 2013, at 7:17 PM, Ralph Castain <r...@open-mpi.org> wrote:

> I fear that Jeff and Brian are both out for the holiday, Gus, so we are
> unlikely to have much info on this until they return.
> 
> I'm unaware of any such problems in 1.6.5. It looks like something isn't
> properly aligned in memory -- could be an error on our part, but might be in
> the program. You might want to build a debug version and see if that
> segfaults, and then look at the core with gdb to see where it happened.
> 
> 
> On Dec 23, 2013, at 3:27 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> 
>> Dear OMPI experts,
>> 
>> I have been using OMPI 1.6.5, built with gcc 4.4.7 and
>> PGI pgfortran 11.10, to successfully compile and run
>> a large climate modeling program (CESM) in several
>> different configurations.
>> 
>> However, today I hit a segmentation fault when running a new model
>> configuration.
>> [In climate modeling jargon, a program is called a "model".]
>> 
>> This is somewhat unpleasant, because that OMPI build
>> is a central piece of the production CESM setup available
>> to all users in our two clusters at this point.
>> I have other OMPI 1.6.5 builds, with other compilers, but that one
>> was working very well with CESM until today.
>> 
>> Unless I am misinterpreting it, the error message,
>> reproduced below, seems to indicate that the problem
>> happened inside the OMPI library.
>> Or not?
>> 
>> Other details:
>> 
>> Nodes are AMD Opteron 6376 x86_64, the interconnect is Infiniband QDR,
>> the OS is stock CentOS 6.4, kernel 2.6.32-358.2.1.el6.x86_64.
>> The program is compiled with the OMPI wrappers (mpicc and mpif90),
>> and somewhat conservative optimization flags:
>> 
>> FFLAGS := $(CPPDEFS) -i4 -gopt -Mlist -Mextend -byteswapio
>> -Minform=inform -traceback -O2 -Mvect=nosse -Kieee
>> 
>> Is this a known issue?
>> Any clues on how to address it?
>> 
>> Thank you for your help,
>> Gus Correa
>> 
>> **************** error message *******************
>> 
>> [1,31]<stderr>:[node30:17008] *** Process received signal ***
>> [1,31]<stderr>:[node30:17008] Signal: Segmentation fault (11)
>> [1,31]<stderr>:[node30:17008] Signal code: Address not mapped (1)
>> [1,31]<stderr>:[node30:17008] Failing at address: 0x17
>> [1,31]<stderr>:[node30:17008] [ 0] /lib64/libpthread.so.0(+0xf500)
>> [0x2b788ef9f500]
>> [1,31]<stderr>:[node30:17008] [ 1]
>> /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(+0x100ee3)
>> [0x2b788e200ee3]
>> [1,31]<stderr>:[node30:17008] [ 2]
>> /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x111)
>> [0x2b788e203771]
>> [1,31]<stderr>:[node30:17008] [ 3]
>> /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_int_memalign+0x97)
>> [0x2b788e2046d7]
>> [1,31]<stderr>:[node30:17008] [ 4]
>> /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_memalign+0x8b)
>> [0x2b788e2052ab]
>> [1,31]<stderr>:[node30:17008] [ 5] ./ccsm.exe(pgf90_auto_alloc+0x73)
>> [0xe2c4c3]
>> [1,31]<stderr>:[node30:17008] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpiexec noticed that process rank 31 with PID 17008 on node node30 exited on
>> signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>> 
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/