On 1/27/2012 5:24 AM, Jeff Squyres wrote:
> On Jan 27, 2012, at 12:45 AM, Paul H. Hargrove wrote:
>> On this cluster, statfs() is returning ENOENT, which is breaking
>> opal_path_nfs().
>> So, these results are with test/opal/util/opal_path_nfs.c "disabled".
> Paul -- can you explain this a little more? There should be logic in there to
> effectively handle ENOENTs, meaning that if we get a non-ESTALE error, we try again with
> the directory name. This is repeated until we get to "/" -- so there should
> definitely be at least one case where statfs() is *not* returning ENOENT.
> Is that not happening?
I looked a bit deeper and found that the bug is in OMPI, but a simple
one to fix.
I added 2 lines to opal/util/path.c:
--- openmpi-1.4.5rc2-orig/opal/util/path.c 2011-02-04 07:38:16.000000000 -0600
+++ openmpi-1.4.5rc2/opal/util/path.c 2012-01-27 12:46:30.000000000 -0600
@@ -476,6 +476,8 @@
 rc = statvfs (file, &buf);
 #elif defined(linux) || defined (__BSD) || (defined(__APPLE__) && defined(__MACH__))
 rc = statfs (file, &buf);
+#else
+ #error "No statvfs or statfs call"
 #endif
 } while (-1 == rc && ESTALE == errno && (0 < --trials));
Can you guess what happens when I "make" now?
There IS no call to statfs, and the ENOENT I saw must have been "left
over" from some earlier libc call.
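
To make the failure mode concrete, here is a small sketch of how the call can vanish at compile time. It is not the OMPI source: fake_statfs, probe, and the SKETCH_PLATFORM_* macros are hypothetical names, and it assumes rc starts out as -1, which is what the loop condition in the diff suggests.

```c
#include <errno.h>

/* Stand-in for statfs()/statvfs() so the sketch builds anywhere.
 * It always succeeds, but in this sketch it is never reached. */
int fake_statfs(const char *file)
{
    (void)file;
    return 0;
}

/* Same shape as the loop in the diff. Nothing defines the two
 * SKETCH_PLATFORM_* macros, just as these compilers do not define
 * "linux", so the preprocessor removes both calls. */
int probe(const char *file)
{
    int rc = -1;      /* assumed initial value */
    int trials = 5;

    (void)file;       /* unused once both branches are compiled out */
    do {
#if defined(SKETCH_PLATFORM_A)
        rc = fake_statfs(file);
#elif defined(SKETCH_PLATFORM_B)
        rc = fake_statfs(file);
#endif
        /* An "#else" with "#error" here would stop the build instead. */
    } while (-1 == rc && ESTALE == errno && (0 < --trials));

    return rc;        /* still -1; errno is whatever it was before */
}
```

If errno happens to hold ENOENT from an earlier libc call, probe() returns -1 and leaves that ENOENT in place, so the caller sees what looks like a failed statfs().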
The problem is that these compilers have not pre-defined "linux".
It does appear that they are defining "__linux" and "__linux__"
(double-underscores).
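
A quick way to see which of the three spellings a given compiler predefines is a probe along these lines. The function names are made up for this sketch. As far as I know, GCC also drops the plain "linux" macro in strict ISO modes such as -std=c99, because that name belongs to the user's namespace, while the underscore forms remain.

```c
/* Each function reports whether one spelling of the macro is
 * predefined by the compiler building this file. */

int sees_plain_linux(void)
{
#ifdef linux
    return 1;
#else
    return 0;
#endif
}

int sees_single_underscore_linux(void)
{
#ifdef __linux
    return 1;
#else
    return 0;
#endif
}

int sees_double_underscore_linux(void)
{
#ifdef __linux__
    return 1;
#else
    return 0;
#endif
}
```

On the compilers discussed here one would expect the first function to return 0 and the other two to return 1.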
So, a little change of the preprocessor logic should fix this problem:
$ perl -pi -e 's/defined\(linux\)/defined\(__linux__\)/;' -- opal/util/path.c
[more compact than the corresponding diffs]
With that change (and without "disabling" opal_path_nfs.c) all 4
compilers are PASSing "make all install check".
Source inspection suggests that the 1.5 branch has the same issue.
I've not inspected the HEAD, but somebody should.
FYI:
I've done a bit of grepping for linux, __linux, and __linux__.
My search shows only 2 files checking for definition of "linux":
    opal/util/path.c
    opal/mca/memory/ptmalloc2/malloc.c
And exactly one looking for "__linux":
    test/event/event-test.c
Checks for "__linux__" appear in the following files:
    ompi/mca/io/romio/romio/adio/ad_lustre/ad_lustre.h
    ompi/mca/btl/openib/btl_openib_component.c
    opal/util/if.c
    opal/mca/memory/ptmalloc2/arena.c
    test/util/opal_path_nfs.c (IRONY!)
I suggest standardization to "__linux__" in the 3 files that currently
use "linux" or "__linux".
-Paul
--
Paul H. Hargrove phhargr...@lbl.gov
Future Technologies Group
HPC Research Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900