Thanks to all who replied.  

First, I'm running openmpi 1.4.2.  

Second, coredumpsize is unlimited, and indeed I DO get core dumps when
I'm running a single-processor version.  Third, the problem isn't
stopping the program (MPI_Abort does that just fine); rather, it's getting
a core dump.  According to the man page, MPI_Abort sends a SIGTERM, not a
SIGABRT, so perhaps that's what should happen.

Finally, my guess as to what's happening if I use the libc abort is that
the other nodes get stuck in an MPI call (I do lots of MPI_Reduces or
MPI_Bcasts in this code), but this doesn't explain why the node calling
abort doesn't exit with a coredump.

David

On Thu, 2010-08-12 at 20:44 -0600, Ralph Castain wrote:
> Sounds very strange - what OMPI version, on what type of machine, and how was 
> it configured?
> 
> 
> On Aug 12, 2010, at 7:49 PM, David Ronis wrote:
> 
> > I've got an MPI program that is supposed to generate a core file if
> > problems arise on any of the nodes.   I tried to do this by adding a
> > call to abort() to my exit routines but this doesn't work; I get no core
> > file, and worse, mpirun doesn't detect that one of my nodes has
> > aborted(?) and doesn't kill off the entire job, except in the trivial
> > case where the number of processors I'm running on is 1.   I've replaced
> > abort with MPI_Abort, which kills everything off, but leaves no core
> > file.  Any suggestions how I can get one and still have mpi exit?
> > 
> > Thanks in advance.
> > 
> > David
> > 
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 

