On 1/27/2012 5:24 AM, Jeff Squyres wrote:
> On Jan 27, 2012, at 12:45 AM, Paul H. Hargrove wrote:
>
>> On this cluster, statfs() is returning ENOENT, which is breaking
>> opal_path_nfs().
>> So, these results are with test/opal/util/opal_path_nfs.c "disabled".
>
> Paul -- can you explain this a little more?  There should be logic in there to
> effectively handle ENOENT's, meaning that if we get a non-ESTALE error, we try again with
> the directory name.  This is repeated until we get to "/" -- so there should
> definitely be at least one case where statfs() is *not* returning ENOENT.
>
> Is that not happening?
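
For readers unfamiliar with the fallback described above, here is a minimal sketch of that walk-up logic. It is not the code in opal/util/path.c: the helper name statvfs_walk_up() is hypothetical, it uses POSIX statvfs() rather than statfs() so that it builds on more systems, and both the retry count of 5 and the choice to give up on a persistent ESTALE are assumptions.

```c
#include <errno.h>
#include <string.h>
#include <sys/statvfs.h>

/* Hypothetical helper (not OMPI's implementation): query the filesystem
 * that holds `path`.  If the query fails with anything other than ESTALE,
 * strip the last path component and try the parent, up to and including "/".
 * On success, `used` holds the path that was actually queried. */
static int statvfs_walk_up(const char *path, struct statvfs *buf,
                           char *used, size_t used_len)
{
    if ('/' != path[0] || strlen(path) >= used_len) {
        errno = EINVAL;               /* absolute paths that fit only */
        return -1;
    }
    strcpy(used, path);

    for (;;) {
        int rc;
        int trials = 5;               /* assumed retry count */

        /* Retry only while the error is a stale NFS handle. */
        do {
            rc = statvfs(used, buf);
        } while (-1 == rc && ESTALE == errno && 0 < --trials);

        if (0 == rc) {
            return 0;                 /* found a component we can query */
        }
        if (ESTALE == errno || 0 == strcmp(used, "/")) {
            return -1;                /* still stale, or nothing left to strip */
        }

        /* Drop the last component: "/a/b" becomes "/a", "/a" becomes "/". */
        char *slash = strrchr(used, '/');
        if (slash == used) {
            used[1] = '\0';
        } else {
            *slash = '\0';
        }
    }
}
```

With logic of this shape, a path whose leading components exist should always end in a successful query, at worst on "/", which is why a persistent ENOENT looked suspicious.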


I looked a bit deeper and found that the bug is in OMPI, but a simple one to fix.
I added 2 lines to opal/util/path.c:

--- openmpi-1.4.5rc2-orig/opal/util/path.c 2011-02-04 07:38:16.000000000 -0600
+++ openmpi-1.4.5rc2/opal/util/path.c 2012-01-27 12:46:30.000000000 -0600
@@ -476,6 +476,8 @@
         rc = statvfs (file, &buf);
 #elif defined(linux) || defined (__BSD) || (defined(__APPLE__) && defined(__MACH__))
         rc = statfs (file, &buf);
+#else
+  #error "No statvfs or statfs call"
 #endif
     } while (-1 == rc && ESTALE == errno && (0 < --trials));


Can you guess what happens when I "make" now?
There IS no call to statfs, and the ENOENT I saw must have been "left over" from some earlier libc call.
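
A small sketch of how that can happen, assuming rc starts out as -1 (I have not checked how path.c initializes it). The function name and the macro HYPOTHETICAL_UNDEFINED_PLATFORM_MACRO are made up for illustration; the macro stands in for a platform test that no compiler satisfies, so the body of the loop is compiled out entirely.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical reproduction: the "filesystem query" below is never
 * compiled in, so rc keeps its initial value and errno keeps whatever
 * an earlier, unrelated libc call left behind. */
static int query_with_missing_branch(int *err_out)
{
    int rc = -1;                      /* assumed initial value */
    int trials = 5;

    /* An unrelated earlier call that fails and sets errno to ENOENT. */
    FILE *fp = fopen("/nonexistent-dir-for-errno-demo/file", "r");
    if (NULL != fp) {
        fclose(fp);
    }

    do {
#if defined(HYPOTHETICAL_UNDEFINED_PLATFORM_MACRO)
        rc = 0;                       /* stands in for the statfs() call */
#endif
    } while (-1 == rc && ESTALE == errno && 0 < --trials);

    *err_out = errno;                 /* stale ENOENT, not from any query */
    return rc;
}
```

The caller sees rc == -1 with errno == ENOENT and reasonably, but wrongly, concludes that the query itself failed that way. The #error in the patch above turns this silent fall-through into a build failure.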

The problem is that these compilers have not pre-defined "linux".
It does appear that they are defining "__linux" and "__linux__" (double-underscores).
So, a little change of the preprocessor logic should fix this problem:
$ perl -pi -e 's/defined\(linux\)/defined\(__linux__\)/;' -- opal/util/path.c
[more compact than the corresponding diffs]
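
To check what a given compiler pre-defines, something like the following sketch can be used. The struct and function names are hypothetical, and the field names deliberately avoid the word "linux", since a compiler that does define that macro would rewrite it.

```c
/* Hypothetical probe: records which of the three spellings the
 * compiler pre-defines.  Each field is 1 if the macro is defined. */
struct linux_macros {
    int plain;     /* linux     */
    int leading;   /* __linux   */
    int both;      /* __linux__ */
};

static struct linux_macros detect_linux_macros(void)
{
    struct linux_macros m = {0, 0, 0};
#if defined(linux)
    m.plain = 1;
#endif
#if defined(__linux)
    m.leading = 1;
#endif
#if defined(__linux__)
    m.both = 1;
#endif
    return m;
}
```

As far as I know, GCC on Linux defines the unprefixed "linux" only in its GNU dialects (e.g. -std=gnu99) and drops it under strict modes such as -std=c99 or -ansi, because the name is in the user's namespace. The double-underscore form is in the reserved namespace and is therefore the safer one to test.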

With that change (and without "disabling" opal_path_nfs.c) all 4 compilers are PASSing "make all install check".

Source inspection suggests that the 1.5 branch has the same issue.
I've not inspected the HEAD, but somebody should.


FYI:
I've done a bit of grepping for linux,__linux,__linux__.
My search shows only 2 files checking for definition of "linux"
   opal/util/path.c
   opal/mca/memory/ptmalloc2/malloc.c
And exactly one looking for "__linux":
   test/event/event-test.c
Checks for "__linux__" appear in the following files:
   ompi/mca/io/romio/romio/adio/ad_lustre/ad_lustre.h
   ompi/mca/btl/openib/btl_openib_component.c
   opal/util/if.c
   opal/mca/memory/ptmalloc2/arena.c
   test/util/opal_path_nfs.c (IRONY!)
I suggest standardization to "__linux__" in the 3 files that currently use "linux" or "__linux".


-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
