Dear all - thanks to everyone for all the hints. The problem has been solved.
To summarise, for the benefit of anyone else who encounters MPI hanging when
trying to connect across the network:


   1. Followed Rolf's advice and the hints from
   http://www.open-mpi.org/faq/?category=tcp and ran MPI with the MCA
   parameter btl_base_verbose (example command below, after this list).
   2. This revealed that MPI was trying to connect through the wrong IP
   addresses for some nodes:
   http://www.aifdr.org/projects/system_administration/ticket/17#comment:4
   3. This, in turn, was because our cluster had inconsistent eth*-to-MAC
   address mappings, leading DHCP to issue the wrong addresses:
   http://www.aifdr.org/projects/system_administration/ticket/9#comment:10 -
   Interestingly, we had no problem with ping and ssh, so perhaps MPI
   defaults to using eth0? (A quick way to check the mappings is sketched
   below, after this list.)
   4. After much fiddling with udev/rules.d we gave up on that approach and
   simply assigned static addresses to eth0 (I am now convinced udev/rules.d
   doesn't work :-)) - see the example configuration below.
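
For reference, the diagnostic run (point 1) was along these lines; the
hostfile path and host names are from our setup, so adjust to yours, and the
verbosity level follows Rolf's suggestion in the digest below:

   mpirun --mca btl_base_verbose 20 --hostfile /etc/mpihosts \
          --host node5,node6 --npernode 2 a.out

The lines to watch for look like "btl: tcp: attempting to connect() to ...
address ..." (see Rolf's message below); they show exactly which IP address
each rank is trying to reach, which is how we spotted the wrong addresses.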
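
And here, roughly, is what the checking and the final fix (points 3 and 4)
looked like. The first two commands are just standard Linux tools for seeing
which MAC address eth0 actually got and which IPs the hostnames resolve to;
the interface stanza is the ifupdown style used on Ubuntu. The addresses
below are made up - substitute your own:

   # On each node: which MAC is bound to eth0, and which IPs do the
   # hostnames resolve to?
   cat /sys/class/net/eth0/address
   getent hosts node5 node6

   # /etc/network/interfaces: pin eth0 to a static address instead of DHCP
   auto eth0
   iface eth0 inet static
       address 192.168.1.105
       netmask 255.255.255.0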

After that, everything works beautifully.
The moral is KISS - keep everything as simple as you can.

Onto some more earthquake and tsunami modelling.

Cheers
Ole



On Tue, Sep 20, 2011 at 9:44 PM, <users-requ...@open-mpi.org> wrote:

> Send users mailing list submissions to
>        us...@open-mpi.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        http://www.open-mpi.org/mailman/listinfo.cgi/users
> or, via email, send a message with subject or body 'help' to
>        users-requ...@open-mpi.org
>
> You can reach the person managing the list at
>        users-ow...@open-mpi.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of users digest..."
>
>
> Today's Topics:
>
>   1. Re: RE :  MPI hangs on multiple nodes (Gus Correa)
>   2. Typo in MPI_Cart_coords man page (Jeremiah Willcock)
>   3. Re: RE :  MPI hangs on multiple nodes (Gus Correa)
>   4. How could OpenMPI (or MVAPICH) affect floating-point      results?
>      (Blosch, Edwin L)
>   5. MPI hangs on multiple nodes (Ole Nielsen)
>   6. MPI hangs on multiple nodes (Ole Nielsen)
>   7. Re: How could OpenMPI (or MVAPICH) affect floating-point
>      results? (Reuti)
>   8. Re: How could OpenMPI (or MVAPICH) affect floating-point
>      results? (Tim Prince)
>   9. Re: How could OpenMPI (or MVAPICH) affect floating-point
>      results? (Jeff Squyres)
>  10. Re: MPI hangs on multiple nodes (Jeff Squyres)
>  11. Re: Latency of 250 microseconds with Open-MPI 1.4.3, Mellanox
>      Infiniband and 256 MPI ranks (Yevgeny Kliteynik)
>  12. Re: How could OpenMPI (or MVAPICH) affect floating-point
>      results? (Reuti)
>  13. Re: MPI hangs on multiple nodes (Rolf vandeVaart)
>  14. Re: Open MPI and Objective C (Barrett, Brian W)
>  15. Re: How could OpenMPI (or MVAPICH) affect floating-point
>      results? (Samuel K. Gutierrez)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 19 Sep 2011 13:13:08 -0400
> From: Gus Correa <g...@ldeo.columbia.edu>
> Subject: Re: [OMPI users] RE :  MPI hangs on multiple nodes
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <4e777824.60...@ldeo.columbia.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi Eugene
>
> You're right, it is blocking send, buffers can be reused after MPI_Send
> returns.
> My bad, I only read your answer to Sebastien and Ole
> after I posted mine.
>
> Could MPI run out of [internal] buffers to hold the messages, perhaps?
> The messages aren't that big anyway [5000 doubles].
> Could MPI behave differently regarding internal
> buffering when communication is intra-node vs. across the network?
> [It works intra-node, according to Ole's posting.]
>
> I suppose Ole rebuilt OpenMPI on his newly installed Ubuntu.
>
> Gus Correa
>
>
> Eugene Loh wrote:
> > I'm missing the point on the buffer re-use.  It seems to me that the
> > sample program passes some buffer around in a ring.  Each process
> > receives the buffer with a blocking receive and then forwards it with a
> > blocking send.  The blocking send does not return until the send buffer
> > is safe to reuse.
> >
> > On 9/19/2011 7:37 AM, Gus Correa wrote:
> >> You could try the examples/connectivity.c program in the
> >> OpenMPI source tree, to test if everything is alright.
> >> It also hints how to solve the buffer re-use issue
> >> that Sebastien [rightfully] pointed out [i.e., declare separate
> >> buffers for MPI_Send and MPI_Recv].
> >>
> >> Sébastien Boisvert wrote:
> >>> Is it safe to re-use the same buffer (variable A) for MPI_Send and
> >>> MPI_Recv given that MPI_Send may be eager depending on
> >>> the MCA parameters ?
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 19 Sep 2011 15:14:42 -0400 (EDT)
> From: Jeremiah Willcock <jewil...@osl.iu.edu>
> Subject: [OMPI users] Typo in MPI_Cart_coords man page
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <alpine.lrh.2.00.1109191513310.14...@flowerpot.osl.iu.edu>
> Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII
>
> The bottom of the MPI_Cart_coords man page (in SVN trunk as well as some
> releases) states:
>
> The inverse mapping, rank-to-coordinates translation is provided by
> MPI_Cart_coords.
>
> Although that is true, we are already in the man page for MPI_Cart_coords
> and so the reverse is the mapping from coordinates to rank.
>
> -- Jeremiah Willcock
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 19 Sep 2011 16:19:40 -0400
> From: Gus Correa <g...@ldeo.columbia.edu>
> Subject: Re: [OMPI users] RE :  MPI hangs on multiple nodes
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <4e77a3dc.80...@ldeo.columbia.edu>
> Content-Type: text/plain; charset=iso-8859-1; format=flowed
>
> Hi Ole, Eugene
>
> For what it is worth, I tried Ole's program here,
> as Devendra Rai had done before.
> I ran it across two nodes, with a total of 16 processes.
> I tried mca parameters for openib Infiniband,
> then for tcp on Gigabit Ethernet.
> Both work.
> I am using OpenMPI 1.4.3 compiled with GCC 4.1.2 on CentOS 5.2.
> Thanks.
>
> Gus Correa
>
> Gus Correa wrote:
> > Hi Eugene
> >
> > You're right, it is blocking send, buffers can be reused after MPI_Send
> > returns.
> > My bad, I only read your answer to Sebastien and Ole
> > after I posted mine.
> >
> > Could MPI run out of [internal] buffers to hold the messages, perhaps?
> > The messages aren't that big anyway [5000 doubles].
> > Could MPI behave differently regarding internal
> > buffering when communication is intra-node vs. across the network?
> > [It works intra-node, according to Ole's posting.]
> >
> > I suppose Ole rebuilt OpenMPI on his newly installed Ubuntu.
> >
> > Gus Correa
> >
> >
> > Eugene Loh wrote:
> >> I'm missing the point on the buffer re-use.  It seems to me that the
> >> sample program passes some buffer around in a ring.  Each process
> >> receives the buffer with a blocking receive and then forwards it with
> >> a blocking send.  The blocking send does not return until the send
> >> buffer is safe to reuse.
> >>
> >> On 9/19/2011 7:37 AM, Gus Correa wrote:
> >>> You could try the examples/connectivity.c program in the
> >>> OpenMPI source tree, to test if everything is alright.
> >>> It also hints how to solve the buffer re-use issue
> >>> that Sebastien [rightfully] pointed out [i.e., declare separate
> >>> buffers for MPI_Send and MPI_Recv].
> >>>
> >>> Sébastien Boisvert wrote:
> >>>> Is it safe to re-use the same buffer (variable A) for MPI_Send and
> >>>> MPI_Recv given that MPI_Send may be eager depending on
> >>>> the MCA parameters ?
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> ------------------------------
>
> Message: 4
> Date: Mon, 19 Sep 2011 16:41:08 -0600
> From: "Blosch, Edwin L" <edwin.l.blo...@lmco.com>
> Subject: [OMPI users] How could OpenMPI (or MVAPICH) affect
>        floating-point  results?
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID:
>        <e9f276a0010af44abd2c03ed2edae7db275faad...@hdxmspb.us.lmco.com>
> Content-Type: text/plain; charset="us-ascii"
>
> I am observing differences in floating-point results from an application
> program that appear to be related to whether I link with OpenMPI 1.4.3 or
> MVAPICH 1.2.0.  Both packages were built with the same installation of Intel
> 11.1, as well as the application program; identical flags passed to the
> compiler in each case.
>
> I've tracked down some differences in a compute-only routine where I've
> printed out the inputs to the routine (to 18 digits) ; the inputs are
> identical.  The output numbers are different in the 16th place (perhaps a
> few in the 15th place).  These differences only show up for optimized code,
> not for -O0.
>
> My assumption is that some optimized math intrinsic is being replaced
> dynamically, but I do not know how to confirm this.  Anyone have guidance to
> offer? Or similar experience?
>
> Thanks very much
>
> Ed
>
> Just for what it's worth, here's the output of ldd:
>
> % ldd application_mvapich
>        linux-vdso.so.1 =>  (0x00007fffe3746000)
>        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00002b5b45fc1000)
>        libmpich.so.1.0 =>
> /usr/mpi/intel/mvapich-1.2.0/lib/shared/libmpich.so.1.0 (0x00002b5b462cd000)
>        libibverbs.so.1 => /usr/lib64/libibverbs.so.1 (0x00002b5b465ed000)
>        libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x00002b5b467fc000)
>        libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b5b46a04000)
>        librt.so.1 => /lib64/librt.so.1 (0x00002b5b46c21000)
>        libm.so.6 => /lib64/libm.so.6 (0x00002b5b46e2a000)
>        libdl.so.2 => /lib64/libdl.so.2 (0x00002b5b47081000)
>        libc.so.6 => /lib64/libc.so.6 (0x00002b5b47285000)
>        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b5b475e3000)
>        /lib64/ld-linux-x86-64.so.2 (0x00002b5b45da0000)
>        libimf.so => /opt/intel/Compiler/11.1/072/lib/intel64/libimf.so
> (0x00002b5b477fb000)
>        libsvml.so => /opt/intel/Compiler/11.1/072/lib/intel64/libsvml.so
> (0x00002b5b47b8f000)
>        libintlc.so.5 =>
> /opt/intel/Compiler/11.1/072/lib/intel64/libintlc.so.5 (0x00002b5b47da5000)
>
> % ldd application_openmpi
>       linux-vdso.so.1 =>  (0x00007fff6ebff000)
>        libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00002b6e7c17d000)
>        libmpi_f90.so.0 =>
> /usr/mpi/intel/openmpi-1.4.3/lib64/libmpi_f90.so.0 (0x00002b6e7c489000)
>        libmpi_f77.so.0 =>
> /usr/mpi/intel/openmpi-1.4.3/lib64/libmpi_f77.so.0 (0x00002b6e7c68d000)
>        libmpi.so.0 => /usr/mpi/intel/openmpi-1.4.3/lib64/libmpi.so.0
> (0x00002b6e7c8ca000)
>        libopen-rte.so.0 =>
> /usr/mpi/intel/openmpi-1.4.3/lib64/libopen-rte.so.0 (0x00002b6e7cb9c000)
>        libopen-pal.so.0 =>
> /usr/mpi/intel/openmpi-1.4.3/lib64/libopen-pal.so.0 (0x00002b6e7ce01000)
>        libdl.so.2 => /lib64/libdl.so.2 (0x00002b6e7d077000)
>        libnsl.so.1 => /lib64/libnsl.so.1 (0x00002b6e7d27c000)
>        libutil.so.1 => /lib64/libutil.so.1 (0x00002b6e7d494000)
>        libm.so.6 => /lib64/libm.so.6 (0x00002b6e7d697000)
>        libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b6e7d8ee000)
>        libc.so.6 => /lib64/libc.so.6 (0x00002b6e7db0b000)
>        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b6e7de69000)
>        /lib64/ld-linux-x86-64.so.2 (0x00002b6e7bf5c000)
>        libifport.so.5 =>
> /opt/intel/Compiler/11.1/072/lib/intel64/libifport.so.5 (0x00002b6e7e081000)
>        libifcoremt.so.5 =>
> /opt/intel/Compiler/11.1/072/lib/intel64/libifcoremt.so.5
> (0x00002b6e7e1ba000)
>        libimf.so => /opt/intel/Compiler/11.1/072/lib/intel64/libimf.so
> (0x00002b6e7e45f000)
>        libsvml.so => /opt/intel/Compiler/11.1/072/lib/intel64/libsvml.so
> (0x00002b6e7e7f4000)
>        libintlc.so.5 =>
> /opt/intel/Compiler/11.1/072/lib/intel64/libintlc.so.5 (0x00002b6e7ea0a000)
>
> -------------- next part --------------
> HTML attachment scrubbed and removed
>
> ------------------------------
>
> Message: 5
> Date: Tue, 20 Sep 2011 07:48:04 +0700
> From: Ole Nielsen <ole.moller.niel...@gmail.com>
> Subject: [OMPI users] MPI hangs on multiple nodes
> To: us...@open-mpi.org
> Message-ID:
>        <calclsfqsywzr_ygmtpugwcx7abjkntcxfqrp2qux--s5tdq...@mail.gmail.com
> >
> Content-Type: text/plain; charset="iso-8859-1"
>
> Thanks for your suggestion Gus, we need a way of debugging what is going on.
> I am pretty sure the problem lies with our cluster configuration. I know MPI
> simply relies on the underlying network. However, we can ping and ssh to all
> nodes (and between any pair of nodes as well), so it is currently a mystery
> why MPI doesn't communicate across nodes on our cluster.
> Two further questions for the group
>
>   1. I would love to run the test program connectivity.c, but cannot find
>   it anywhere. Can anyone help please?
>   2. After having left the job hanging over night we got the message
>
> [node5][[9454,1],1][../../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
>   mca_btl_tcp_frag_recv: readv failed: Connection timed out (110).
>   Does anyone know what this means?
>
>
> Cheers and thanks
> Ole
> PS - I don't see how separate buffers would help. Recall that the test
> program I use works fine on other installations and indeed when run on
> the cores of a single node.
>
>
>
>
> Message: 11
> Date: Mon, 19 Sep 2011 10:37:02 -0400
> From: Gus Correa <g...@ldeo.columbia.edu>
> Subject: Re: [OMPI users] RE :  MPI hangs on multiple nodes
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <4e77538e.3070...@ldeo.columbia.edu>
> Content-Type: text/plain; charset=iso-8859-1; format=flowed
>
> Hi Ole
>
> You could try the examples/connectivity.c program in the
> OpenMPI source tree, to test if everything is alright.
> It also hints how to solve the buffer re-use issue
> that Sebastien [rightfully] pointed out [i.e., declare separate
> buffers for MPI_Send and MPI_Recv].
>
> Gus Correa
> -------------- next part --------------
> HTML attachment scrubbed and removed
>
> ------------------------------
>
> Message: 6
> Date: Tue, 20 Sep 2011 09:23:44 +0700
> From: Ole Nielsen <ole.moller.niel...@gmail.com>
> Subject: [OMPI users] MPI hangs on multiple nodes
> To: us...@open-mpi.org
> Message-ID:
>        <CALcLSfonKTtkp9L8XMTFg_4LRFYP2o1qXVNXykiCMC5gq=o...@mail.gmail.com
> >
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi all - and sorry for the multiple postings, but I have more information.
>
> 1: After a reboot of two nodes I ran again, and the inter-node freeze
> didn't
> happen until the third iteration. I take that to mean that the basic
> communication works, but that something is saturating. Is there some notion
> of buffer size somewhere in the MPI system that could explain this?
> 2: The nodes have 4 ethernet cards each. Could the mapping be a problem?
> 3: The cpus are running at a 100% for all processes involved in the freeze
> 4: The same test program (
> http://code.google.com/p/pypar/source/browse/source/mpi_test.c) works fine
> when run within one node so the problem must be with MPI and/or our
> network.
>
> 5: The network and ssh works otherwise fine.
>
>
> Again many thanks for any hint that can get us going again. The main thing
> we need is some diagnostics that may point to what causes this problem for
> MPI.
> Cheers
> Ole Nielsen
>
> ------
>
> Here's the output which shows the freeze in the third iteration:
>
> nielso@alamba:~/sandpit/pypar/source$ mpirun --hostfile /etc/mpihosts
> --host
> node5,node6 --npernode 2 a.out
> Number of processes = 4
> Test repeated 3 times for reliability
> I am process 2 on node node6
> P2: Waiting to receive from to P1
> P2: Sending to to P3
> I am process 3 on node node6
> P3: Waiting to receive from to P2
> I am process 1 on node node5
> P1: Waiting to receive from to P0
> P1: Sending to to P2
> P1: Waiting to receive from to P0
> I am process 0 on node node5
> Run 1 of 3
> P0: Sending to P1
> P0: Waiting to receive from P3
> P2: Waiting to receive from to P1
> P3: Sending to to P0
> P3: Waiting to receive from to P2
> P1: Sending to to P2
> P0: Received from to P3
> Run 2 of 3
> P0: Sending to P1
> P0: Waiting to receive from P3
> P1: Waiting to receive from to P0
> -------------- next part --------------
> HTML attachment scrubbed and removed
>
> ------------------------------
>
> Message: 7
> Date: Tue, 20 Sep 2011 13:25:28 +0200
> From: Reuti <re...@staff.uni-marburg.de>
> Subject: Re: [OMPI users] How could OpenMPI (or MVAPICH) affect
>        floating-point results?
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID:
>        <4e155b3e-104f-465c-bf2b-8d145c010...@staff.uni-marburg.de>
> Content-Type: text/plain; charset=windows-1252
>
> Hi,
>
> Am 20.09.2011 um 00:41 schrieb Blosch, Edwin L:
>
> > I am observing differences in floating-point results from an application
> program that appear to be related to whether I link with OpenMPI 1.4.3 or
> MVAPICH 1.2.0.  Both packages were built with the same installation of Intel
> 11.1, as well as the application program; identical flags passed to the
> compiler in each case.
> >
> > I've tracked down some differences in a compute-only routine where I've
> printed out the inputs to the routine (to 18 digits) ; the inputs are
> identical.  The output numbers are different in the 16th place (perhaps a
> few in the 15th place).  These differences only show up for optimized code,
> not for -O0.
> >
> > My assumption is that some optimized math intrinsic is being replaced
> dynamically, but I do not know how to confirm this.  Anyone have guidance to
> offer? Or similar experience?
>
> yes, I face it often but always at a magnitude where it's not of any
> concern (and not related to any MPI). Due to the limited precision in
> computers, a simple reordering of operation (although being equivalent in a
> mathematical sense) can lead to different results. Removing the anomalies
> with -O0 could prove that.
>
> The other point I heard especially for the x86 instruction set is, that the
> internal FPU has still 80 bits, while the presentation in memory is only 64
> bit. Hence when all can be done in the registers, the result can be
> different compared to the case when some interim results need to be stored
> to RAM. For the Portland compiler there is a switch -Kieee -pc64 to force it
> to stay always in 64 bit, and a similar one for Intel is -mp (now
> -fltconsistency) and -mp1.
>
> http://www.pgroup.com/doc/pgiref.pdf (page 42)
>
> http://software.intel.com/file/6335 (page 260)
>
> You could try with the mentioned switches whether you get more consistent
> output.
>
>
> If there would be a MPI ABI, and you could just drop in any MPI library, it
> would be quite easy to spot the real point where the discrepancy occurred.
>
> -- Reuti
>
>
> > Thanks very much
> >
> > Ed
> >
> > Just for what it's worth, here's the output of ldd:
> >
> > % ldd application_mvapich
> >         linux-vdso.so.1 =>  (0x00007fffe3746000)
> >         libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00002b5b45fc1000)
> >         libmpich.so.1.0 =>
> /usr/mpi/intel/mvapich-1.2.0/lib/shared/libmpich.so.1.0 (0x00002b5b462cd000)
> >         libibverbs.so.1 => /usr/lib64/libibverbs.so.1
> (0x00002b5b465ed000)
> >         libibumad.so.3 => /usr/lib64/libibumad.so.3 (0x00002b5b467fc000)
> >         libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b5b46a04000)
> >         librt.so.1 => /lib64/librt.so.1 (0x00002b5b46c21000)
> >         libm.so.6 => /lib64/libm.so.6 (0x00002b5b46e2a000)
> >         libdl.so.2 => /lib64/libdl.so.2 (0x00002b5b47081000)
> >         libc.so.6 => /lib64/libc.so.6 (0x00002b5b47285000)
> >         libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b5b475e3000)
> >         /lib64/ld-linux-x86-64.so.2 (0x00002b5b45da0000)
> >         libimf.so => /opt/intel/Compiler/11.1/072/lib/intel64/libimf.so
> (0x00002b5b477fb000)
> >         libsvml.so => /opt/intel/Compiler/11.1/072/lib/intel64/libsvml.so
> (0x00002b5b47b8f000)
> >         libintlc.so.5 =>
> /opt/intel/Compiler/11.1/072/lib/intel64/libintlc.so.5 (0x00002b5b47da5000)
> >
> > % ldd application_openmpi
> >        linux-vdso.so.1 =>  (0x00007fff6ebff000)
> >         libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00002b6e7c17d000)
> >         libmpi_f90.so.0 =>
> /usr/mpi/intel/openmpi-1.4.3/lib64/libmpi_f90.so.0 (0x00002b6e7c489000)
> >         libmpi_f77.so.0 =>
> /usr/mpi/intel/openmpi-1.4.3/lib64/libmpi_f77.so.0 (0x00002b6e7c68d000)
> >         libmpi.so.0 => /usr/mpi/intel/openmpi-1.4.3/lib64/libmpi.so.0
> (0x00002b6e7c8ca000)
> >         libopen-rte.so.0 =>
> /usr/mpi/intel/openmpi-1.4.3/lib64/libopen-rte.so.0 (0x00002b6e7cb9c000)
> >         libopen-pal.so.0 =>
> /usr/mpi/intel/openmpi-1.4.3/lib64/libopen-pal.so.0 (0x00002b6e7ce01000)
> >         libdl.so.2 => /lib64/libdl.so.2 (0x00002b6e7d077000)
> >         libnsl.so.1 => /lib64/libnsl.so.1 (0x00002b6e7d27c000)
> >         libutil.so.1 => /lib64/libutil.so.1 (0x00002b6e7d494000)
> >         libm.so.6 => /lib64/libm.so.6 (0x00002b6e7d697000)
> >         libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b6e7d8ee000)
> >         libc.so.6 => /lib64/libc.so.6 (0x00002b6e7db0b000)
> >         libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b6e7de69000)
> >         /lib64/ld-linux-x86-64.so.2 (0x00002b6e7bf5c000)
> >         libifport.so.5 =>
> /opt/intel/Compiler/11.1/072/lib/intel64/libifport.so.5 (0x00002b6e7e081000)
> >         libifcoremt.so.5 =>
> /opt/intel/Compiler/11.1/072/lib/intel64/libifcoremt.so.5
> (0x00002b6e7e1ba000)
> >         libimf.so => /opt/intel/Compiler/11.1/072/lib/intel64/libimf.so
> (0x00002b6e7e45f000)
> >         libsvml.so => /opt/intel/Compiler/11.1/072/lib/intel64/libsvml.so
> (0x00002b6e7e7f4000)
> >         libintlc.so.5 =>
> /opt/intel/Compiler/11.1/072/lib/intel64/libintlc.so.5 (0x00002b6e7ea0a000)
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
>
> ------------------------------
>
> Message: 8
> Date: Tue, 20 Sep 2011 07:52:41 -0400
> From: Tim Prince <n...@aol.com>
> Subject: Re: [OMPI users] How could OpenMPI (or MVAPICH) affect
>        floating-point results?
> To: us...@open-mpi.org
> Message-ID: <4e787e89.5090...@aol.com>
> Content-Type: text/plain; charset=windows-1252; format=flowed
>
> On 9/20/2011 7:25 AM, Reuti wrote:
> > Hi,
> >
> > Am 20.09.2011 um 00:41 schrieb Blosch, Edwin L:
> >
> >> I am observing differences in floating-point results from an application
> program that appear to be related to whether I link with OpenMPI 1.4.3 or
> MVAPICH 1.2.0.  Both packages were built with the same installation of Intel
> 11.1, as well as the application program; identical flags passed to the
> compiler in each case.
> >>
> >> I've tracked down some differences in a compute-only routine where I've
> printed out the inputs to the routine (to 18 digits) ; the inputs are
> identical.  The output numbers are different in the 16th place (perhaps a
> few in the 15th place).  These differences only show up for optimized code,
> not for -O0.
> >>
> >> My assumption is that some optimized math intrinsic is being replaced
> dynamically, but I do not know how to confirm this.  Anyone have guidance to
> offer? Or similar experience?
> >
> > yes, I face it often but always at a magnitude where it's not of any
> concern (and not related to any MPI). Due to the limited precision in
> computers, a simple reordering of operation (although being equivalent in a
> mathematical sense) can lead to different results. Removing the anomalies
> with -O0 could prove that.
> >
> > The other point I heard especially for the x86 instruction set is, that
> the internal FPU has still 80 bits, while the presentation in memory is only
> 64 bit. Hence when all can be done in the registers, the result can be
> different compared to the case when some interim results need to be stored
> to RAM. For the Portland compiler there is a switch -Kieee -pc64 to force it
> to stay always in 64 bit, and a similar one for Intel is -mp (now
> -fltconsistency) and -mp1.
> >
> Diagnostics below indicate that ifort 11.1 64-bit is in use.  The
> options aren't the same as Reuti's "now" version (a 32-bit compiler
> which hasn't been supported for 3 years or more?).
> With ifort 10.1 and more recent, you would set at least
> -assume protect_parens -prec-div -prec-sqrt
> if you are interested in numerical consistency.  If you don't want
> auto-vectorization of sum reductions, you would use instead
> -fp-model source -ftz
> (ftz sets underflow mode back to abrupt, while "source" sets gradual).
> It may be possible to expose 80-bit x87 by setting the ancient -mp
> option, but such a course can't be recommended without additional cautions.
>
> Quoted comment from OP seem to show a somewhat different question: Does
> OpenMPI implement any operations in a different way from MVAPICH?  I
> would think it probable that the answer could be affirmative for
> operations such as allreduce, but this leads well outside my expertise
> with respect to specific MPI implementations.  It isn't out of the
> question to suspect that such differences might be aggravated when using
> excessively aggressive ifort options such as -fast.
>
>
> >>          libifport.so.5 =>
>  /opt/intel/Compiler/11.1/072/lib/intel64/libifport.so.5
> (0x00002b6e7e081000)
> >>          libifcoremt.so.5 =>
>  /opt/intel/Compiler/11.1/072/lib/intel64/libifcoremt.so.5
> (0x00002b6e7e1ba000)
> >>          libimf.so =>
>  /opt/intel/Compiler/11.1/072/lib/intel64/libimf.so (0x00002b6e7e45f000)
> >>          libsvml.so =>
>  /opt/intel/Compiler/11.1/072/lib/intel64/libsvml.so (0x00002b6e7e7f4000)
> >>          libintlc.so.5 =>
>  /opt/intel/Compiler/11.1/072/lib/intel64/libintlc.so.5 (0x00002b6e7ea0a000)
> >>
>
> --
> Tim Prince
>
>
> ------------------------------
>
> Message: 9
> Date: Tue, 20 Sep 2011 07:55:26 -0400
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] How could OpenMPI (or MVAPICH) affect
>        floating-point results?
> To: tpri...@computer.org, Open MPI Users <us...@open-mpi.org>
> Message-ID: <911cadec-4f9b-4197-8ade-6f731b44b...@cisco.com>
> Content-Type: text/plain; charset=us-ascii
>
> On Sep 20, 2011, at 7:52 AM, Tim Prince wrote:
>
> > Quoted comment from OP seem to show a somewhat different question: Does
> OpenMPI implement any operations in a different way from MVAPICH?  I would
> think it probable that the answer could be affirmative for operations such
> as allreduce, but this leads well outside my expertise with respect to
> specific MPI implementations.  It isn't out of the question to suspect that
> such differences might be aggravated when using excessively aggressive ifort
> options such as -fast.
>
> This is 'zactly what I was going to say -- reductions between Open MPI and
> MVAPICH may well perform global arithmetic operations in different orders.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 10
> Date: Tue, 20 Sep 2011 08:11:34 -0400
> From: Jeff Squyres <jsquy...@cisco.com>
> Subject: Re: [OMPI users] MPI hangs on multiple nodes
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <dddbc7a5-a13b-459f-b4cc-984195a40...@cisco.com>
> Content-Type: text/plain; charset=us-ascii
>
> On Sep 19, 2011, at 10:23 PM, Ole Nielsen wrote:
>
> > Hi all - and sorry for the multiple postings, but I have more
> information.
>
> +1 on Eugene's comments.  The test program looks fine to me.
>
> FWIW, you don't need -lmpi to compile your program; OMPI's wrapper compiler
> allows you to just:
>
>    mpicc mpi_test.c -o mpi_test -Wall
>
> > 1: After a reboot of two nodes I ran again, and the inter-node freeze
> didn't happen until the third iteration. I take that to mean that the basic
> communication works, but that something is saturating. Is there some notion
> of buffer size somewhere in the MPI system that could explain this?
>
> Hmm.  This is not a good sign; it somewhat indicates a problem with your
> OS.  Based on this email and your prior emails, I'm guessing you're using
> TCP for communication, and that the problem is based on inter-node
> communication (e.g., the problem would occur even if you only run 1 process
> per machine, but does not occur if you run all N processes on a single
> machine, per your #4, below).
>
> > 2: The nodes have 4 ethernet cards each. Could the mapping be a problem?
>
> Shouldn't be.  If it runs at all, then it should run fine.
>
> Do you have all your ethernet cards on a single subnet, or multiple
> subnets?  I have heard of problems when you have multiple ethernet cards on
> the same subnet -- I believe there's some non-determinism in that case in
> what wire/NIC a packet will actually go out, which may be problematic for
> OMPI.
>
> > 3: The cpus are running at a 100% for all processes involved in the
> freeze
>
> That's probably right.  OMPI aggressively polls for progress as a way to
> decrease latency.  So all processes are trying to make progress, and
> therefore are aggressively polling, eating up 100% of the CPU.
>
> > 4: The same test program (
> http://code.google.com/p/pypar/source/browse/source/mpi_test.c) works fine
> when run within one node so the problem must be with MPI and/or our network.
>
> This helps identify the issue as the TCP communication, not the shared
> memory communication.
>
> > 5: The network and ssh works otherwise fine.
>
> Good.
>
> > Again many thanks for any hint that can get us going again. The main
> thing we need is some diagnostics that may point to what causes this problem
> for MPI.
>
> If you are running with multiple NICs on the same subnet, change them to
> multiple subnets and see if it starts working fine.
>
> If they're on different subnets, try using the btl_tcp_if_include /
> btl_tcp_if_exclude MCA parameters to exclude certain networks and see if
> they're the problematic ones.  Keep in mind that ..._include and ..._exclude
> are mutually exclusive; you should only specify one.  And if you specify
> exclude, be sure to exclude loopback.  E.g:
>
>  mpirun --mca btl_tcp_if_include eth0,eth1 -np 16 --hostfile hostfile mpi_test
> or
>  mpirun --mca btl_tcp_if_exclude lo,eth1 -np 16 --hostfile hostfile mpi_test
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>
>
> ------------------------------
>
> Message: 11
> Date: Tue, 20 Sep 2011 15:14:44 +0300
> From: Yevgeny Kliteynik <klit...@dev.mellanox.co.il>
> Subject: Re: [OMPI users] Latency of 250 microseconds with Open-MPI
>        1.4.3, Mellanox Infiniband and 256 MPI ranks
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <4e7883b4.7080...@dev.mellanox.co.il>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi Sébastien,
>
> If I understand you correctly, you are running your application on two
> different MPIs on two different clusters with two different IB vendors.
>
> Could you make a comparison more "apples to apples"-ish?
> For instance:
>  - run the same version of Open MPI on both clusters
>  - run the same version of MVAPICH on both clusters
>
>
> -- YK
>
> On 18-Sep-11 1:59 AM, Sébastien Boisvert wrote:
> > Hello,
> >
> > Open-MPI 1.4.3 on Mellanox Infiniband hardware gives a latency of 250
> microseconds with 256 MPI ranks on super-computer A (name is colosse).
> >
> > The same software gives a latency of 10 microseconds with MVAPICH2 and
> QLogic Infiniband hardware with 512 MPI ranks on super-computer B (name is
> guillimin).
> >
> >
> > Here are the relevant information listed in
> http://www.open-mpi.org/community/help/
> >
> >
> > 1. Check the FAQ first.
> >
> > done !
> >
> >
> > 2. The version of Open MPI that you're using.
> >
> > Open-MPI 1.4.3
> >
> >
> > 3. The config.log file from the top-level Open MPI directory, if
> available (please compress!).
> >
> > See below.
> >
> > Command file: http://pastebin.com/mW32ntSJ
> >
> >
> > 4. The output of the "ompi_info --all" command from the node where you're
> invoking mpirun.
> >
> > ompi_info -a on colosse: http://pastebin.com/RPyY9s24
> >
> >
> > 5. If running on more than one node -- especially if you're having
> problems launching Open MPI processes -- also include the output of the
> "ompi_info -v ompi full --parsable" command from each node on which you're
> trying to run.
> >
> > I am not having problems launching Open-MPI processes.
> >
> >
> > 6. A detailed description of what is failing.
> >
> > Open-MPI 1.4.3 on Mellanox Infiniband hardware gives a latency of 250
> microseconds with 256 MPI ranks on super-computer A (name is colosse).
> >
> > The same software gives a latency of 10 microseconds with MVAPICH2 and
> QLogic Infiniband hardware on 512 MPI ranks on super-computer B (name is
> guillimin).
> >
> > Details follow.
> >
> >
> > I am developing a distributed genome assembler that runs with the
> message-passing interface (I am a PhD student).
> > It is called Ray. Link: http://github.com/sebhtml/ray
> >
> > I recently added the option -test-network-only so that Ray can be used to
> test the latency. Each MPI rank has to send 100000 messages (4000 bytes
> each), one by one.
> > The destination of any message is picked up at random.
> >
> >
> > On colosse, a super-computer located at Laval University, I get an
> average latency of 250 microseconds with the test done in Ray.
> >
> > See http://pastebin.com/9nyjSy5z
> >
> > On colosse, the hardware is Mellanox Infiniband QDR ConnectX and the MPI
> middleware is Open-MPI 1.4.3 compiled with gcc 4.4.2.
> >
> > colosse has 8 compute cores per node (Intel Nehalem).
> >
> >
> > Testing the latency with ibv_rc_pingpong on colosse gives 11
> microseconds.
> >
> >    local address:  LID 0x048e, QPN 0x1c005c, PSN 0xf7c66b
> >    remote address: LID 0x018c, QPN 0x2c005c, PSN 0x5428e6
> > 8192000 bytes in 0.01 seconds = 5776.64 Mbit/sec
> > 1000 iters in 0.01 seconds = 11.35 usec/iter
> >
> > So I know that the Infiniband has a correct latency between two HCAs
> because of the output of ibv_rc_pingpong.
> >
> >
> >
> > Adding the parameter --mca btl_openib_verbose 1 to mpirun shows that
> Open-MPI detects the hardware correctly:
> >
> > [r107-n57][[59764,1],0][btl_openib_ini.c:166:ompi_btl_openib_ini_query]
> Querying INI files for vendor 0x02c9, part ID 26428
> > [r107-n57][[59764,1],0][btl_openib_ini.c:185:ompi_btl_openib_ini_query]
> Found corresponding INI values: Mellanox Hermon
> >
> > see http://pastebin.com/pz03f0B3
> >
> >
> > So I don't think this is the problem described in the FAQ (
> http://www.open-mpi.org/faq/?category=openfabrics#mellanox-connectx-poor-latency)
> > and on the mailing list (
> http://www.open-mpi.org/community/lists/users/2007/10/4238.php ) because
> the INI values are found.
> >
> >
> >
> >
> > Running the network test implemented in Ray on 32 MPI ranks, I get an
> average latency of 65 microseconds.
> >
> > See http://pastebin.com/nWDmGhvM
> >
> >
> > Thus, with 256 MPI ranks I get an average latency of 250 microseconds and
> with 32 MPI ranks I get 65 microseconds.
> >
> >
> > Running the network test on 32 MPI ranks again but only allowing the MPI
> rank 0 to send messages gives a latency of 10 microseconds for this rank.
> > See http://pastebin.com/dWMXsHpa
> >
> >
> >
> > Because I get 10 microseconds in the network test in Ray when only
> MPI rank 0 sends messages, I would say that there may be some I/O contention.
> >
> > To test this hypothesis, I re-ran the test, but allowed only 1 MPI rank
> per node to send messages (there are 8 MPI ranks per node and a total of 32
> MPI ranks).
> > Ranks 0, 8, 16 and 24 all reported 13 microseconds. See
> http://pastebin.com/h84Fif3g
> >
> > The next test was to allow 2 MPI ranks on each node to send messages.
> Ranks 0, 1, 8, 9, 16, 17, 24, and 25 reported 15 microseconds.
> > See http://pastebin.com/REdhJXkS
> >
> > With 3 MPI ranks per node that can send messages, ranks 0, 1, 2, 8, 9,
> 10, 16, 17, 18, 24, 25, 26 reported 20 microseconds. See
> http://pastebin.com/TCd6xpuC
> >
> > Finally, with 4 MPI ranks per node that can send messages, I got 23
> microseconds. See http://pastebin.com/V8zjae7s
> >
> >
> > So the MPI ranks on a given node seem to fight for access to the HCA
> port.
> >
> > Each colosse node has 1 port (ibv_devinfo) and the max_mtu is 2048 bytes.
> See http://pastebin.com/VXMAZdeZ
> >
> >
> >
> >
> >
> >
> > At this point, some may think that there may be a bug in the network test
> itself. So I tested the same code on another super-computer.
> >
> > On guillimin, a super-computer located at McGill University, I get an
> average latency (with Ray -test-network-only) of 10 microseconds when
> running Ray on 512 MPI ranks.
> >
> > See http://pastebin.com/nCKF8Xg6
> >
> > On guillimin, the hardware is Qlogic Infiniband QDR and the MPI
> middleware is MVAPICH2 1.6.
> >
> > Thus, I know that the network test in Ray works as expected because
> results on guillimin show a latency of 10 microseconds for 512 MPI ranks.
> >
> > guillimin also has 8 compute cores per node (Intel Nehalem).
> >
> > On guillimin, each node has one port (ibv_devinfo) and the max_mtu of
> HCAs is 4096 bytes. See http://pastebin.com/35T8N5t8
> >
> >
> >
> >
> >
> >
> >
> >
> > In Ray, only the following MPI functions are utilised:
> >
> > - MPI_Init
> > - MPI_Comm_rank
> > - MPI_Comm_size
> > - MPI_Finalize
> >
> > - MPI_Isend
> >
> > - MPI_Request_free
> > - MPI_Test
> > - MPI_Get_count
> > - MPI_Start
> > - MPI_Recv_init
> > - MPI_Cancel
> >
> > - MPI_Get_processor_name
> >
> >
> >
> >
> > 7. Please include information about your network:
> > http://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot
> >
> > Type: Infiniband
> >
> >    7.1. Which OpenFabrics version are you running?
> >
> >
> > ofed-scripts-1.4.2-0_sunhpc1
> >
> > libibverbs-1.1.3-2.el5
> > libibverbs-utils-1.1.3-2.el5
> > libibverbs-devel-1.1.3-2.el5
> >
> >
> >    7.2. What distro and version of Linux are you running? What is your
> kernel version?
> >
> >
> > CentOS release 5.6 (Final)
> >
> > Linux colosse1 2.6.18-238.19.1.el5 #1 SMP Fri Jul 15 07:31:24 EDT 2011
> x86_64 x86_64 x86_64 GNU/Linux
> >
> >
> >    7.3. Which subnet manager are you running? (e.g., OpenSM, a
> vendor-specific subnet manager, etc.)
> >
> > opensm-libs-3.3.3-1.el5_6.1
> >
> >    7.4. What is the output of the ibv_devinfo command
> >
> >      hca_id: mlx4_0
> >              fw_ver:                         2.7.000
> >              node_guid:                      5080:0200:008d:8f88
> >              sys_image_guid:                 5080:0200:008d:8f8b
> >              vendor_id:                      0x02c9
> >              vendor_part_id:                 26428
> >              hw_ver:                         0xA0
> >              board_id:                       X6275_QDR_IB_2.5
> >              phys_port_cnt:                  1
> >                      port:   1
> >                              state:                  active (4)
> >                              max_mtu:                2048 (4)
> >                              active_mtu:             2048 (4)
> >                              sm_lid:                 1222
> >                              port_lid:               659
> >                              port_lmc:               0x00
> >
> >
> >
> >    7.5. What is the output of the ifconfig command
> >
> >    Not using IPoIB.
> >
> >    7.6. If running under Bourne shells, what is the output of the "ulimit
> -l" command?
> >
> > [sboisver12@colosse1 ~]$ ulimit -l
> > 6000000
> >
> >
> >
> >
> >
> >
> >
> > The two differences I see between guillimin and colosse are
> >
> > - Open-MPI 1.4.3 (colosse) v. MVAPICH2 1.6 (guillimin)
> > - Mellanox (colosse) v. QLogic (guillimin)
> >
> >
> > Does anyone experienced such a high latency with Open-MPI 1.4.3 on
> Mellanox HCAs ?
> >
> >
> >
> >
> >
> >
> > Thank you for your time.
> >
> >
> >                  Sébastien Boisvert
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
>
>
>
> ------------------------------
>
> Message: 12
> Date: Tue, 20 Sep 2011 14:25:09 +0200
> From: Reuti <re...@staff.uni-marburg.de>
> Subject: Re: [OMPI users] How could OpenMPI (or MVAPICH) affect
>        floating-point results?
> To: tpri...@computer.org, Open MPI Users <us...@open-mpi.org>
> Message-ID:
>        <01fae7a2-f1a2-410a-8cee-d84e91449...@staff.uni-marburg.de>
> Content-Type: text/plain; charset=windows-1252
>
> Am 20.09.2011 um 13:52 schrieb Tim Prince:
>
> > On 9/20/2011 7:25 AM, Reuti wrote:
> >> Hi,
> >>
> >> Am 20.09.2011 um 00:41 schrieb Blosch, Edwin L:
> >>
> >>> I am observing differences in floating-point results from an
> application program that appear to be related to whether I link with OpenMPI
> 1.4.3 or MVAPICH 1.2.0.  Both packages were built with the same installation
> of Intel 11.1, as well as the application program; identical flags passed to
> the compiler in each case.
> >>>
> >>> I've tracked down some differences in a compute-only routine where I've
> printed out the inputs to the routine (to 18 digits) ; the inputs are
> identical.  The output numbers are different in the 16th place (perhaps a
> few in the 15th place).  These differences only show up for optimized code,
> not for -O0.
> >>>
> >>> My assumption is that some optimized math intrinsic is being replaced
> dynamically, but I do not know how to confirm this.  Anyone have guidance to
> offer? Or similar experience?
> >>
> >> yes, I face it often but always at a magnitude where it's not of any
> concern (and not related to any MPI). Due to the limited precision in
> computers, a simple reordering of operation (although being equivalent in a
> mathematical sense) can lead to different results. Removing the anomalies
> with -O0 could prove that.
> >>
> >> The other point I heard especially for the x86 instruction set is, that
> the internal FPU has still 80 bits, while the presentation in memory is only
> 64 bit. Hence when all can be done in the registers, the result can be
> different compared to the case when some interim results need to be stored
> to RAM. For the Portland compiler there is a switch -Kieee -pc64 to force it
> to stay always in 64 bit, and a similar one for Intel is -mp (now
> -fltconsistency) and -mp1.
> >>
> > Diagnostics below indicate that ifort 11.1 64-bit is in use.  The options
> aren't the same as Reuti's "now" version (a 32-bit compiler which hasn't
> been supported for 3 years or more?).
>
> In the 11.1 documentation they are also still listed:
>
>
> http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/index.htm
>
> I read it in the way, that -mp is deprecated syntax (therefore listed under
> "Alternate Options"), but -fltconsistency is still a valid and supported
> option.
>
> -- Reuti
>
>
> > With ifort 10.1 and more recent, you would set at least
> > -assume protect_parens -prec-div -prec-sqrt
> > if you are interested in numerical consistency.  If you don't want
> auto-vectorization of sum reductions, you would use instead
> > -fp-model source -ftz
> > (ftz sets underflow mode back to abrupt, while "source" sets gradual).
> > It may be possible to expose 80-bit x87 by setting the ancient -mp
> option, but such a course can't be recommended without additional cautions.
> >
> > Quoted comment from OP seem to show a somewhat different question: Does
> OpenMPI implement any operations in a different way from MVAPICH?  I would
> think it probable that the answer could be affirmative for operations such
> as allreduce, but this leads well outside my expertise with respect to
> specific MPI implementations.  It isn't out of the question to suspect that
> such differences might be aggravated when using excessively aggressive ifort
> options such as -fast.
> >
> >
> >>>         libifport.so.5 =>
>  /opt/intel/Compiler/11.1/072/lib/intel64/libifport.so.5
> (0x00002b6e7e081000)
> >>>         libifcoremt.so.5 =>
>  /opt/intel/Compiler/11.1/072/lib/intel64/libifcoremt.so.5
> (0x00002b6e7e1ba000)
> >>>         libimf.so =>
>  /opt/intel/Compiler/11.1/072/lib/intel64/libimf.so (0x00002b6e7e45f000)
> >>>         libsvml.so =>
>  /opt/intel/Compiler/11.1/072/lib/intel64/libsvml.so (0x00002b6e7e7f4000)
> >>>         libintlc.so.5 =>
>  /opt/intel/Compiler/11.1/072/lib/intel64/libintlc.so.5 (0x00002b6e7ea0a000)
> >>>
> >
> > --
> > Tim Prince
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
>
>
>
>
> ------------------------------
>
> Message: 13
> Date: Tue, 20 Sep 2011 05:34:51 -0700
> From: Rolf vandeVaart <rvandeva...@nvidia.com>
> Subject: Re: [OMPI users] MPI hangs on multiple nodes
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID:
>        <3af945ebf4d3ec41afe44eed9b0585f32689f8c...@hqmail02.nvidia.com>
> Content-Type: text/plain; charset="us-ascii"
>
>
> >> 1: After a reboot of two nodes I ran again, and the inter-node freeze
> didn't
> >happen until the third iteration. I take that to mean that the basic
> >communication works, but that something is saturating. Is there some
> notion
> >of buffer size somewhere in the MPI system that could explain this?
> >
> >Hmm.  This is not a good sign; it somewhat indicates a problem with your
> OS.
> >Based on this email and your prior emails, I'm guessing you're using TCP
> for
> >communication, and that the problem is based on inter-node communication
> >(e.g., the problem would occur even if you only run 1 process per machine,
> >but does not occur if you run all N processes on a single machine, per
> your #4,
> >below).
> >
>
> I agree with Jeff here.  Open MPI uses lazy connections to establish
> connections and round robins through the interfaces.
> So, the first few communications could work as they are using interfaces
> that could communicate between the nodes, but the third iteration uses an
> interface that for some reason cannot establish the connection.
>
> One flag you can use that may help is --mca btl_base_verbose 20, like this;
>
> mpirun --mca btl_base_verbose 20 connectivity_c
>
> It will dump out a bunch of stuff, but there will be a few lines that look
> like this:
>
> [...snip...]
> [dt:09880] btl: tcp: attempting to connect() to [[58627,1],1] address
> 10.20.14.101 on port 1025
> [...snip...]
>
> Rolf
>
>
>
>
>
>
> ------------------------------
>
> Message: 14
> Date: Tue, 20 Sep 2011 13:12:48 +0000
> From: "Barrett, Brian W" <bwba...@sandia.gov>
> Subject: Re: [OMPI users] Open MPI and Objective C
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID:
>        <69a29ab53d57f54d81061a9e4e45b8fd0f991...@exmb01.srn.sandia.gov>
> Content-Type: text/plain; charset="us-ascii"
>
> The problem you're running into is not due to Open MPI.  The Objective C
> and C compilers on OS X (and most platforms) are the same binary, so you
> should be able to use mpicc without any problems.  It will see the .m
> extension and switch to Objective C mode.  However, NSLog is in the
> Foundation framework, so you must add the compiler option
>
>  -framework Foundation
>
> to the compiler flags (both when compiling and linking).  If you ripped out
> all the MPI and used gcc directly to compile your example code, you'd run
> into the same linker error without the -framework option.
>
> Hope this helps,
>
> Brian
>
> --
>  Brian W. Barrett
>  Scalable System Software Group
>  Sandia National Laboratories
> ________________________________________
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of
> Jeff Squyres [jsquy...@cisco.com]
> Sent: Monday, September 19, 2011 6:46 AM
> To: Open MPI Users
> Subject: Re: [OMPI users] Open MPI and Objective C
>
> +1
>
> You'll probably have to run "mpicc --showme" to see all the flags that OMPI
> is passing to the underlying compiler, and use those (or equivalents) to the
> ObjC compiler.
>
>
> On Sep 19, 2011, at 8:34 AM, Ralph Castain wrote:
>
> > Nothing to do with us - you call a function "NSLog" that Objective C
> doesn't recognize. That isn't an MPI function.
> >
> > On Sep 18, 2011, at 8:20 PM, Scott Wilcox wrote:
> >
> >> I have been asked to convert some C++ code using Open MPI to Objective C
> and I am having problems getting a simple Obj C program to compile.  I have
> searched through the FAQs and have not found anything specific.  Is it an
> incorrect assumption that the C interfaces work with Obj C, or am I missing
> something?
> >>
> >> Thanks in advance for your help!
> >> Scott
> >>
> >>
> >> open MPI version: 1.4.3
> >> OSX 10.5.1
> >>
> >> file: main.m
> >>
> >> #import <Foundation/Foundation.h>
> >> #import "mpi.h"
> >>
> >> int main (int argc, char** argv)
> >>
> >> {
> >>    //***
> >>    // Variable Declaration
> >>    //***
> >>    int theRank;
> >>    int theSize;
> >>
> >>    //***
> >>    // Initializing Message Passing Interface
> >>    //***
> >>    MPI_Init(&argc,&argv);
> >>    MPI_Comm_size(MPI_COMM_WORLD,&theSize);
> >>    MPI_Comm_rank(MPI_COMM_WORLD,&theRank);
> >>    //*** end
> >>
> >>    NSLog(@"Executing open MPI Objective C");
> >>
> >> }
> >>
> >> Compile:
> >>
> >> [87]UNC ONLY: SAW>mpicc main.m -o test
> >> Undefined symbols:
> >>   "___CFConstantStringClassReference", referenced from:
> >>       cfstring=Executing open MPI Objective C in ccj1AlL9.o
> >>   "_NSLog", referenced from:
> >>       _main in ccj1AlL9.o
> >> ld: symbol(s) not found
> >> collect2: ld returned 1 exit status
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
>
> ------------------------------
>
> Message: 15
> Date: Tue, 20 Sep 2011 08:44:07 -0600
> From: "Samuel K. Gutierrez" <sam...@lanl.gov>
> Subject: Re: [OMPI users] How could OpenMPI (or MVAPICH) affect
>        floating-point results?
> To: Open MPI Users <us...@open-mpi.org>
> Message-ID: <91688ae1-df54-4957-a0a0-902fc2e9b...@lanl.gov>
> Content-Type: text/plain; charset=windows-1252
>
> Hi,
>
> Maybe you can leverage some of the techniques outlined in:
>
> Robert W. Robey, Jonathan M. Robey, and Rob Aulwes. 2011. In search of
> numerical consistency in parallel programming. Parallel Comput. 37, 4-5
> (April 2011), 217-229. DOI=10.1016/j.parco.2011.02.009
> http://dx.doi.org/10.1016/j.parco.2011.02.009
>
> Hope that helps,
>
> Samuel K. Gutierrez
> Los Alamos National Laboratory
>
> On Sep 20, 2011, at 6:25 AM, Reuti wrote:
>
> > Am 20.09.2011 um 13:52 schrieb Tim Prince:
> >
> >> On 9/20/2011 7:25 AM, Reuti wrote:
> >>> Hi,
> >>>
> >>> Am 20.09.2011 um 00:41 schrieb Blosch, Edwin L:
> >>>
> >>>> I am observing differences in floating-point results from an
> application program that appear to be related to whether I link with OpenMPI
> 1.4.3 or MVAPICH 1.2.0.  Both packages were built with the same installation
> of Intel 11.1, as well as the application program; identical flags passed to
> the compiler in each case.
> >>>>
> >>>> I've tracked down some differences in a compute-only routine where
> I've printed out the inputs to the routine (to 18 digits) ; the inputs are
> identical.  The output numbers are different in the 16th place (perhaps a
> few in the 15th place).  These differences only show up for optimized code,
> not for -O0.
> >>>>
> >>>> My assumption is that some optimized math intrinsic is being replaced
> dynamically, but I do not know how to confirm this.  Anyone have guidance to
> offer? Or similar experience?
> >>>
> >>> yes, I face it often but always at a magnitude where it's not of any
> concern (and not related to any MPI). Due to the limited precision in
> computers, a simple reordering of operation (although being equivalent in a
> mathematical sense) can lead to different results. Removing the anomalies
> with -O0 could prove that.
> >>>
> >>> The other point I heard especially for the x86 instruction set is, that
> the internal FPU has still 80 bits, while the presentation in memory is only
> 64 bit. Hence when all can be done in the registers, the result can be
> different compared to the case when some interim results need to be stored
> to RAM. For the Portland compiler there is a switch -Kieee -pc64 to force it
> to stay always in 64 bit, and a similar one for Intel is -mp (now
> -fltconsistency) and -mp1.
> >>>
> >> Diagnostics below indicate that ifort 11.1 64-bit is in use.  The
> options aren't the same as Reuti's "now" version (a 32-bit compiler which
> hasn't been supported for 3 years or more?).
> >
> > In the 11.1 documentation they are also still listed:
> >
> >
> http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/index.htm
> >
> > I read it in the way, that -mp is deprecated syntax (therefore listed
> under "Alternate Options"), but -fltconsistency is still a valid and
> supported option.
> >
> > -- Reuti
> >
> >
> >> With ifort 10.1 and more recent, you would set at least
> >> -assume protect_parens -prec-div -prec-sqrt
> >> if you are interested in numerical consistency.  If you don't want
> auto-vectorization of sum reductions, you would use instead
> >> -fp-model source -ftz
> >> (ftz sets underflow mode back to abrupt, while "source" sets gradual).
> >> It may be possible to expose 80-bit x87 by setting the ancient -mp
> option, but such a course can't be recommended without additional cautions.
> >>
> >> Quoted comment from OP seem to show a somewhat different question: Does
> OpenMPI implement any operations in a different way from MVAPICH?  I would
> think it probable that the answer could be affirmative for operations such
> as allreduce, but this leads well outside my expertise with respect to
> specific MPI implementations.  It isn't out of the question to suspect that
> such differences might be aggravated when using excessively aggressive ifort
> options such as -fast.
> >>
> >>
> >>>>        libifport.so.5 =>
>  /opt/intel/Compiler/11.1/072/lib/intel64/libifport.so.5
> (0x00002b6e7e081000)
> >>>>        libifcoremt.so.5 =>
>  /opt/intel/Compiler/11.1/072/lib/intel64/libifcoremt.so.5
> (0x00002b6e7e1ba000)
> >>>>        libimf.so =>
>  /opt/intel/Compiler/11.1/072/lib/intel64/libimf.so (0x00002b6e7e45f000)
> >>>>        libsvml.so =>
>  /opt/intel/Compiler/11.1/072/lib/intel64/libsvml.so (0x00002b6e7e7f4000)
> >>>>        libintlc.so.5 =>
>  /opt/intel/Compiler/11.1/072/lib/intel64/libintlc.so.5 (0x00002b6e7ea0a000)
> >>>>
> >>
> >> --
> >> Tim Prince
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
>
>
>
> ------------------------------
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> End of users Digest, Vol 2018, Issue 1
> **************************************
>
