OK, it looks like a bigger problem. The segfault is not related to OMPI: when I rebuild 1.2 or another version that we use with IB all the time, the new build now fails with a segfault when forcing IB. The old libs of the same version still work. They of course do not have the flag to turn off early completion.

Was there an older version of Open MPI that did not suffer from the early completion problem? We have many installed, and for a quick test, having the latest and greatest is not much of a concern while we track down the problem on our end.

We are on RHEL4 using the OFED provided by Red Hat. The error is "address not mapped to object".

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



On Jul 3, 2008, at 8:38 AM, Jeff Squyres wrote:
On Jul 2, 2008, at 11:51 PM, Pavel Shamis (Pasha) wrote:

In trying to build 1.2.6 with the PGI compilers, it makes an MPI library that works with tcp and sm, but it segfaults on openib.

Both our intel compiler version and pgi version of 1.2.6 blow up like this when we force IB. So this is a new issue.
I have ompi 1.2.6 installed on my machines with the Intel compiler (version 10.1) and the PGI compiler (version 7.1-5), and both of them work with IB without any problem. BTW, Mellanox provides a Mellanox OFED binary distribution that includes Intel and PGI builds of Open MPI 1.2.6. You can download it from here: http://www.mellanox.com/products/ofed.php


Is there a way to shut off early completion in 1.2.3?
Sure, just add "--mca pml_ob1_use_early_completion 0" to your command line.

Note that this flag was not added until v1.2.6; it has no effect in v1.2.3.
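For reference, a minimal sketch of what such a command line might look like when forcing IB. The application name and process count below are hypothetical placeholders, and the ompi_info check is only a suggested way to see whether a given build knows the parameter (per the note above, builds older than v1.2.6 should not list it).

```shell
#!/bin/sh
# Hypothetical values: replace with your own binary and process count.
APP=./my_mpi_app
NP=4

# Force the openib BTL (plus "self" for loopback) and disable early
# completion in the ob1 PML. The second parameter only exists in the
# v1.2 series starting with v1.2.6.
MCA_ARGS="--mca btl openib,self --mca pml_ob1_use_early_completion 0"

if command -v ompi_info >/dev/null 2>&1; then
    # List the ob1 PML parameters; no match means this build lacks the flag.
    ompi_info --param pml ob1 | grep early_completion
fi

if command -v mpirun >/dev/null 2>&1; then
    # MCA_ARGS is intentionally unquoted so it splits into separate arguments.
    mpirun $MCA_ARGS -np "$NP" "$APP"
else
    echo "mpirun not found; would run: mpirun $MCA_ARGS -np $NP $APP"
fi
```

If the grep prints nothing, the installed build most likely predates the flag, and passing it will have no effect.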

Or is the above a known issue, and should I use 1.2.7-pre or grab a 1.3 snapshot?
1.2.6 should be ok.


The upcoming v1.3 series works a little differently; there's no need to use this flag in the v1.3 series (i.e., this flag only exists in the v1.2 series starting with v1.2.6).

--
Jeff Squyres
Cisco Systems



