On May 5, 2010, at 7:54 PM, Douglas Guptill wrote:

> P.S.  Yes, I know OpenMPI 1.2.8 is old.  We have been using it for 2
> years with no apparent problems.  

It ain't broke; don't fix it -- nothing wrong with that.

> When I saw comments like "machine hung" for 1.4.1,

FWIW, I find it hard to believe that Open MPI is the cause of machine hangs.  
Open MPI is user-level process stuff, which should generally not be able to 
crash Linux.  If user-level processes can hang Linux, then something else is 
probably broken.  

But also FWIW, we have found that various MPI benchmarks and test applications can 
be *excellent* at finding underlying server / network problems.  I can't think 
of a case offhand where Open MPI "caused" a machine to hang/crash/die/whatever 
that wasn't ultimately tracked down to some other root cause.  

> and "data loss" for 1.3.x, I put aside thoughts of upgrading.

We definitely did have a big problem with OpenFabrics registered memory in Open 
MPI 1.3.0 and 1.3.1 (corrected in 1.3.2).  Shame on us.  :-(  

But to continue the "FWIW" from above: we actually do *millions* of regression 
tests before Open MPI is released -- literally.  All of us were convinced that 
1.3.0 and 1.3.1 were OK to release; the data corruption issues caught us by 
surprise.  Yuck.  Those kinds of bugs are the worst.  :-(

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/