OK, it looks like a bigger problem. The segfault is not related to
OMPI: when I rebuild 1.2, or another version we use with IB all the
time, the new build now segfaults when forcing IB. The old libs of
the same version still work. They, of course, do not have the flag
to turn off early completion.
Was there an older version of Open MPI that did not suffer from the
early completion problem? We have many installed, and for a quick
test, having the latest and greatest is not much of a concern while
we track down the problem on our end.
We are on RHEL4 using the OFED provided by Red Hat. The error is
"address not mapped to object".
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Jul 3, 2008, at 8:38 AM, Jeff Squyres wrote:
On Jul 2, 2008, at 11:51 PM, Pavel Shamis (Pasha) wrote:
Building 1.2.6 with the PGI compilers produces an MPI library that
works with tcp and sm, but it segfaults on openib.
Both our Intel compiler version and PGI version of 1.2.6 blow up
like this when we force IB. So this is a new issue.
I have ompi 1.2.6 installed on my machines with the Intel compiler
(version 10.1) and the PGI compiler (version 7.1-5); both of them
work with IB without any problem. BTW, Mellanox provides a Mellanox
OFED binary distribution that includes Intel and PGI builds of
Open MPI 1.2.6. You can download it from here:
http://www.mellanox.com/products/ofed.php
Is there a way to shut off early completion in 1.2.3?
Sure, just add "--mca pml_ob1_use_early_completion 0" to your
command line.
Note that this flag was not added until v1.2.6; it has no effect in
v1.2.3.
Or is the above a known issue, and should I use 1.2.7-pre or
grab a 1.3 snapshot?
1.2.6 should be ok.
The upcoming v1.3 series works a little differently; there's no
need to use this flag in the v1.3 series (i.e., this flag only
exists in the v1.2 series starting with v1.2.6).
--
Jeff Squyres
Cisco Systems
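
For reference, here is a minimal sketch of how the flag discussed above
might be combined with forcing IB on an mpirun command line. The
application name and rank count are hypothetical placeholders, and
"--mca btl openib,self" is assumed to be how IB is being forced; the
thread itself does not show the exact command used. The script only
assembles and prints the command so it can be inspected first.

```shell
#!/bin/sh
# Hypothetical values: replace with your own binary and rank count.
APP=./my_mpi_app
NP=4

# "--mca btl openib,self" restricts Open MPI to the openib BTL (plus
# "self" for loopback). This is an assumption about how IB is forced.
# The second --mca pair is the flag quoted in the thread; per the
# thread it only exists in the v1.2 series starting with v1.2.6.
CMD="mpirun --mca btl openib,self --mca pml_ob1_use_early_completion 0 -np $NP $APP"

# Print the command for inspection rather than running it directly.
echo "$CMD"
```

To check whether a given build knows about the parameter at all,
"ompi_info --param pml ob1" should list it on v1.2.6 and later in the
1.2 series; on older builds such as 1.2.3 the flag has no effect.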