I'm *very loosely* checking email.  :-)

Agree with what Ralph said: it looks like your program called memalign, and 
that ended up segv'ing.  That could be an OMPI problem, or it could be an 
application problem.  Try also configuring OMPI --with-valgrind and running 
your app through a memory-checking debugger (although OMPI is not very 
valgrind-clean in the 1.6 series :-\ -- you'll get a bunch of false positives 
about reads from unallocated memory and about memory being left 
still-allocated after MPI_FINALIZE).



On Dec 23, 2013, at 7:17 PM, Ralph Castain <r...@open-mpi.org> wrote:

> I fear that Jeff and Brian are both out for the holiday, Gus, so we are 
> unlikely to have much info on this until they return
> 
> I'm unaware of any such problems in 1.6.5. It looks like something isn't 
> properly aligned in memory - could be an error on our part, but might be in 
> the program. You might want to build a debug version and see if that 
> segfaults, and then look at the core with gdb to see where it happened.
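> A rough sketch of that, assuming core dumps are enabled on the compute nodes (the core file name below is a placeholder; the actual name depends on the kernel's core_pattern setting):
> 
> ```shell
> # Allow core dumps before launching the job
> ulimit -c unlimited
> 
> # After the segfault, open the core with the matching executable
> gdb ./ccsm.exe core.17008
> # (gdb) bt    -- backtrace to see where the memalign call came from
> ```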
> 
> 
> On Dec 23, 2013, at 3:27 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> 
>> Dear OMPI experts
>> 
>> I have been using OMPI 1.6.5 built with gcc 4.4.7 and
>> PGI pgfortran 11.10 to successfully compile and run
>> a large climate modeling program (CESM) in several
>> different configurations.
>> 
>> However, today I hit a segmentation fault when running a new model 
>> configuration.
>> [In the climate modeling jargon, a program is called a "model".]
>> 
>> This is somewhat unpleasant because that OMPI build
>> is a central piece of the production CESM model setup available
>> to all users in our two clusters at this point.
>> I have other OMPI 1.6.5 builds, with other compilers, but that one
>> was working very well with CESM, until today.
>> 
>> Unless I am misinterpreting it, the error message,
>> reproduced below, seems to indicate the problem
>> happened inside the OMPI library.
>> Or not?
>> 
>> Other details:
>> 
>> Nodes are AMD Opteron 6376 x86_64, interconnect is Infiniband QDR,
>> OS is stock CentOS 6.4, kernel 2.6.32-358.2.1.el6.x86_64.
>> The program is compiled with the OMPI wrappers (mpicc and mpif90),
>> and somewhat conservative optimization flags:
>> 
>> FFLAGS       := $(CPPDEFS) -i4 -gopt -Mlist -Mextend -byteswapio 
>> -Minform=inform -traceback -O2 -Mvect=nosse -Kieee
>> 
>> Is this a known issue?
>> Any clues on how to address it?
>> 
>> Thank you for your help,
>> Gus Correa
>> 
>> **************** error message *******************
>> 
>> [1,31]<stderr>:[node30:17008] *** Process received signal ***
>> [1,31]<stderr>:[node30:17008] Signal: Segmentation fault (11)
>> [1,31]<stderr>:[node30:17008] Signal code: Address not mapped (1)
>> [1,31]<stderr>:[node30:17008] Failing at address: 0x17
>> [1,31]<stderr>:[node30:17008] [ 0] /lib64/libpthread.so.0(+0xf500) 
>> [0x2b788ef9f500]
>> [1,31]<stderr>:[node30:17008] [ 1] 
>> /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(+0x100ee3) 
>> [0x2b788e200ee3]
>> [1,31]<stderr>:[node30:17008] [ 2] 
>> /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_int_malloc+0x111)
>>  [0x2b788e203771]
>> [1,31]<stderr>:[node30:17008] [ 3] 
>> /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_int_memalign+0x97)
>>  [0x2b788e2046d7]
>> [1,31]<stderr>:[node30:17008] [ 4] 
>> /sw/openmpi/1.6.5/gnu-4.4.7-pgi-11.10/lib/libmpi.so.1(opal_memory_ptmalloc2_memalign+0x8b)
>>  [0x2b788e2052ab]
>> [1,31]<stderr>:[node30:17008] [ 5] ./ccsm.exe(pgf90_auto_alloc+0x73) 
>> [0xe2c4c3]
>> [1,31]<stderr>:[node30:17008] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpiexec noticed that process rank 31 with PID 17008 on node node30 exited on 
>> signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>> 
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
