> On Nov 17, 2016, at 3:22 PM, Noam Bernstein <noam.bernst...@nrl.navy.mil> 
> wrote:
> 
> Hi - we’ve started seeing over the last few days crashes and hangs in 
> openmpi, in a code that hasn’t been touched in months, and an openmpi 
> installation (v. 1.8.5) that also hasn’t been touched in months.  The 
> symptoms are either a hang, with a stack trace (from attaching to the one 
> running process that’s got 0% CPU usage) that looks like this:
> .
> .
> .
> .
> I’m in the process of recompiling openmpi 1.8.8 and the mpi-using code (vasp 
> 5.4.1), just to make sure everything’s clean, but I was just wondering if 
> anyone had any ideas as to what might even be causing this kind of behavior, 
> or what other information might be useful for me to gather to figure out 
> what’s going on.  As I implied at the top, this setup’s been working well for 
> years, and I believe entirely untouched (the openmpi library and executable, 
> I mean, since we did just have a kernel update) for far longer than these 
> crashes.
>       


No one has any suggestions about this problem?  I tried openmpi 1.8.8 and a 
newer version of Mellanox’s OFED, and the behavior is the same.

Does anyone who knows the guts of MPI have any idea whether this even looks 
like an openmpi problem (as opposed to a lower-level one, e.g. the InfiniBand 
drivers, or a higher-level one, e.g. the calling code), judging from the stack 
traces I posted earlier?

                                                                                
Noam

Noam Bernstein, Ph.D.
Center for Materials Physics and Technology
U.S. Naval Research Laboratory
T +1 202 404 8628  F +1 202 404 7546
https://www.nrl.navy.mil <https://www.nrl.navy.mil/>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
