Thanks to all who replied. First, I'm running openmpi 1.4.2.
Second, coredumpsize is unlimited, and indeed I DO get core dumps when I'm running a single-processor version.

Third, the problem isn't stopping the program; MPI_Abort does that just fine. Rather, it's getting a core dump. According to the man page, MPI_Abort sends a SIGTERM, not a SIGABRT, so perhaps that's what should happen.

Finally, my guess as to what's happening if I use the libc abort() is that the other nodes get stuck in an MPI call (I do lots of MPI_Reduces or MPI_Bcasts in this code), but this doesn't explain why the node calling abort() doesn't exit with a core dump.

David

On Thu, 2010-08-12 at 20:44 -0600, Ralph Castain wrote:
> Sounds very strange - what OMPI version, on what type of machine, and how was
> it configured?
>
> On Aug 12, 2010, at 7:49 PM, David Ronis wrote:
>
> > I've got an MPI program that is supposed to generate a core file if
> > problems arise on any of the nodes. I tried to do this by adding a
> > call to abort() to my exit routines, but this doesn't work: I get no core
> > file, and worse, mpirun doesn't detect that one of my nodes has
> > aborted(?) and doesn't kill off the entire job, except in the trivial
> > case where the number of processors I'm running on is 1. I've replaced
> > abort() with MPI_Abort, which kills everything off but leaves no core
> > file. Any suggestions on how I can get one and still have MPI exit?
> >
> > Thanks in advance.
> >
> > David
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
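One workaround that might get both a core file and a clean MPI shutdown is to fork() the failing rank and call abort() only in the child: the child is a copy of the parent's memory (and of the calling thread's stack), so its core dump shows where things went wrong, while the parent stays alive to call MPI_Abort and take down the whole job. This is a minimal, untested sketch assuming a POSIX/Linux system; the helper names raise_core_limit() and dump_core_in_child() are hypothetical, not part of Open MPI. It also raises the soft RLIMIT_CORE to the hard limit from inside the process, on the assumption that ranks started remotely by mpirun may not inherit the "unlimited" coredumpsize of the interactive shell. Forking inside an MPI process is not guaranteed to be safe (Open MPI warns about fork() with some interconnects, such as OpenFabrics), and only the calling thread is duplicated in the child, so other threads' stacks will not appear in the core.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: raise the core-file soft limit to the hard limit,
 * in case this rank was launched without the shell's coredumpsize setting.
 * Returns 0 on success, -1 on failure. */
int raise_core_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* soft limit may be raised up to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Hypothetical helper: fork a copy of this process and abort() the copy,
 * so the kernel can write a core file of the child while the parent keeps
 * running. Returns the child's wait status, or -1 if fork/waitpid failed. */
int dump_core_in_child(void)
{
    fflush(NULL);                /* don't duplicate buffered stdio output in the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        abort();                 /* child: SIGABRT, dumps core if limits allow */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;               /* parent: caller can now shut MPI down */
}
```

In the error path, the intended sequence would be raise_core_limit(); dump_core_in_child(); MPI_Abort(MPI_COMM_WORLD, 1);. Where the core file ends up, and what it is called, depends on the system (on Linux, /proc/sys/kernel/core_pattern and the process's working directory).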