Hi Allan

I'm glad that it fixed your problem! I didn't see any IPv4 addresses in your ifconfig output, so I suspect that is why it wasn't working with --disable-ipv6.

So it sounds like the 1.2 series isn't actually disabling IPv6 support when we pass --disable-ipv6, at least when nothing but IPv6 interfaces are available. I doubt we'll go back to fix that, but it is good to know.
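
For anyone who wants to double-check what a given install was built with, one rough check is to grep ompi_info's full dump for IPv6 mentions. This is only a sketch, using the 1.2.8 prefix from this thread; ompi_info and its --all flag are standard, but whether any IPv6-related lines appear depends on the version:

  # dump everything ompi_info knows about this build and look for IPv6 mentions
  /opt/openmpi128/bin/ompi_info --all | grep -i ipv6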

Thanks for your patience
Ralph


On Oct 31, 2008, at 6:47 PM, Allan Menezes wrote:

users-requ...@open-mpi.org wrote:


Today's Topics:

   1. Re: Problem with openmpi version 1.3b1 beta1 (Ralph Castain)
   2. Re: problem running Open MPI on Cells (Mi Yan)


Hi Ralph,
I solved the problem. With version 1.2.8 on the same system, this configure line works: ./configure --prefix=/opt/openmpi128 --enable-mpi-threads --with-threads=posix --disable-ipv6, but it does not work with any 1.3 version because of IPv6. So to fix it, I rebuilt after a make clean with IPv6 enabled, and it works!
This configure line for version 1.3 works on my system:
./configure --prefix=/opt/openmpi128 --enable-mpi-threads --with-threads=posix
Do you still want the old or the new config.log?
Allan Menezes
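
For reference, the rebuild described above is essentially the following. This is only a sketch; the source directory name openmpi-1.3b1 and the host list are assumptions based on this thread:

  cd openmpi-1.3b1                # the Open MPI 1.3 source tree (name assumed)
  make clean
  ./configure --prefix=/opt/openmpi128 --enable-mpi-threads --with-threads=posix
  make && make install            # build and install on the head node
  # then copy the install tree to the other nodes with scp, as before
  for n in x2 x3 x4 x5 x6; do scp -r /opt/openmpi128 $n:/opt/; done
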
----------------------------------------------------------------------

Message: 1
Date: Fri, 31 Oct 2008 17:02:15 -0600
From: Ralph Castain <r...@lanl.gov>
Subject: Re: [OMPI users] Problem with openmpi version 1.3b1 beta1
To: Open MPI Users <us...@open-mpi.org>

I see you are using IPv6. From what I can tell, we do enable that
support by default if the underlying system supports it.

My best guess is that either that support is broken (we never test it
since none of us use IPv6), or our configure system isn't properly
detecting that it exists.

Can you attach a copy of your config.log? It will tell us what the
system thinks it should be building.
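
A quick way to see what configure decided, even before sending the whole file, is to grep the top-level config.log in the build tree. config.log records the exact ./configure invocation near the top, so something along these lines (a sketch) shows whether --disable-ipv6 was actually passed and what the IPv6 checks found:

  grep -- "--disable-ipv6" config.log      # was the flag on the command line?
  grep -i ipv6 config.log | head -n 40     # the IPv6-related configure tests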

Thanks
Ralph

On Oct 31, 2008, at 4:54 PM, Allan Menezes wrote:


Date: Fri, 31 Oct 2008 09:34:52 -0600
From: Ralph Castain <r...@lanl.gov>
Subject: Re: [OMPI users] users Digest, Vol 1052, Issue 1
To: Open MPI Users <us...@open-mpi.org>

It looks like the daemon isn't seeing the other interface address
on  host x2. Can you ssh to x2 and send the contents of ifconfig -a?

Ralph

On Oct 31, 2008, at 9:18 AM, Allan Menezes wrote:



users-requ...@open-mpi.org wrote:




Today's Topics:

 1. Openmpi ver1.3beta1 (Allan Menezes)
 2. Re: Openmpi ver1.3beta1 (Ralph Castain)
 3. Re: Equivalent .h files (Benjamin Lamptey)
 4. Re: Equivalent .h files (Jeff Squyres)
 5. ompi-checkpoint is hanging (Matthias Hovestadt)
 6. unsubscibe (Bertrand P. S. Russell)
 7. Re: ompi-checkpoint is hanging (Tim Mattox)


----------------------------------------------------------------------

Message: 1
Date: Fri, 31 Oct 2008 02:06:09 -0400
From: Allan Menezes <amenezes...@sympatico.ca>
Subject: [OMPI users] Openmpi ver1.3beta1
To: us...@open-mpi.org

Hi,
  I built Open MPI version 1.3b1 with the following configure command:
./configure --prefix=/opt/openmpi13b1 --enable-mpi-threads
--with-threads=posix --disable-ipv6
I have six nodes, x1..x6.
I distributed /opt/openmpi13b1 with scp to all the other nodes from the head node.
When I run the following command:
mpirun --prefix /opt/openmpi13b1 --host x1 hostname
it works on x1, printing out the hostname of x1.
But when I type
mpirun --prefix /opt/openmpi13b1 --host x2 hostname
it hangs and does not give me any output.
I have a 6-node Intel quad-core cluster with OSCAR and PCI Express Gigabit Ethernet for eth0.
Can somebody advise?
Thank you very much.
Allan Menezes
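
A couple of quick sanity checks worth running first (a sketch, assuming passwordless ssh between the nodes as in a typical OSCAR setup):

  ssh x2 hostname                            # can we reach x2 non-interactively?
  ssh x2 ls /opt/openmpi13b1/bin/orted       # is the copied 1.3 install really there?
  ssh x2 'echo $PATH; echo $LD_LIBRARY_PATH' # what does a non-interactive shell see?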


------------------------------

Message: 2
Date: Fri, 31 Oct 2008 02:41:59 -0600
From: Ralph Castain <r...@lanl.gov>
Subject: Re: [OMPI users] Openmpi ver1.3beta1
To: Open MPI Users <us...@open-mpi.org>

When you typed the --host x1 command, were you sitting on x1?
Likewise, when you typed the --host x2 command, were you not on
host x2?

If the answer to both questions is "yes", then my guess is that
something is preventing you from launching a daemon on host x2. Try
adding --leave-session-attached to your cmd line and see if any
error
messages appear. And check the FAQ for tips on how to setup for ssh
launch (I'm assuming that is what you are using).

http://www.open-mpi.org/faq/?category=rsh
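
In other words, something along these lines (a sketch; the hostname test only exercises the daemon launch itself):

  mpirun --prefix /opt/openmpi13b1 --host x2 --leave-session-attached hostname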

Ralph

On Oct 31, 2008, at 12:06 AM, Allan Menezes wrote:




Hi Ralph,
 Yes, that is true. I tried both commands on x1, and version 1.2.8 works on the same setup without a problem.
Here is the output with --leave-session-attached added:
[allan@x1 ~]$ mpiexec --prefix /opt/openmpi13b2 --leave-session-attached -host x2 hostname
[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0] mca_oob_tcp_peer_try_connect: connect to 192.168.0.198:0 failed: Network is unreachable (101)
[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0] mca_oob_tcp_peer_try_connect: connect to 192.168.122.1:0 failed: Network is unreachable (101)
[x2.brampton.net:02236] [[1354,0],1] routed:binomial: Connection to lifeline [[1354,0],0] lost
--------------------------------------------------------------------------
A daemon (pid 7665) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see
above).

This may be because the daemon was unable to find all the needed
shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to
have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that the job aborted, but has no info as to the
process
that caused that situation.
--------------------------------------------------------------------------
mpiexec: clean termination accomplished

[allan@x1 ~]$
However, my main eth0 IP is 192.168.1.1 and the Internet gateway is 192.168.0.1.
Any solutions?
Allan Menezes
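
One thing worth trying with output like that: the daemon on x2 is being told to call back to addresses on x1 (192.168.0.198 on eth4, and 192.168.122.1 on the virbr0 bridge) that x2 has no route to. Restricting the TCP components to the network every node shares, e.g. eth0, is a common workaround. A sketch, assuming the oob_tcp_if_include and btl_tcp_if_include MCA parameters are available in this 1.3 build (check with ompi_info --param all all):

  # limit out-of-band and MPI TCP traffic to eth0, which has a 192.168.1.x
  # address on every node
  mpiexec --prefix /opt/openmpi13b2 --leave-session-attached \
      --mca oob_tcp_if_include eth0 --mca btl_tcp_if_include eth0 \
      -host x2 hostname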








Hi Ralph,
   It works for Open MPI version 1.2.8; why should it not work for version 1.3?
Yes, I can ssh to x2 from x1 and to x1 from x2.
Here is the ifconfig -a output for x1:
eth0      Link encap:Ethernet  HWaddr 00:1B:21:02:89:DA
inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
        inet6 addr: fe80::21b:21ff:fe02:89da/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
        RX packets:44906 errors:0 dropped:0 overruns:0 frame:0
        TX packets:77644 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:3309896 (3.1 MiB)  TX bytes:101134505 (96.4 MiB)
        Memory:feae0000-feb00000

eth1      Link encap:Ethernet  HWaddr 00:0E:0C:BC:AB:6D
inet addr:192.168.3.1  Bcast:192.168.3.255  Mask:255.255.255.0
        inet6 addr: fe80::20e:cff:febc:ab6d/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
        RX packets:124 errors:0 dropped:0 overruns:0 frame:0
        TX packets:133 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:7440 (7.2 KiB)  TX bytes:10027 (9.7 KiB)

eth2      Link encap:Ethernet  HWaddr 00:1B:FC:A0:A7:92
inet addr:192.168.7.1  Bcast:192.168.7.255  Mask:255.255.255.0
        inet6 addr: fe80::21b:fcff:fea0:a792/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
        RX packets:159 errors:0 dropped:0 overruns:0 frame:0
        TX packets:158 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:10902 (10.6 KiB)  TX bytes:13691 (13.3 KiB)
        Interrupt:17

eth4      Link encap:Ethernet  HWaddr 00:0E:0C:B9:50:A3
inet addr:192.168.0.198  Bcast:192.168.0.255  Mask:255.255.255.0
        inet6 addr: fe80::20e:cff:feb9:50a3/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
        RX packets:25111 errors:0 dropped:0 overruns:0 frame:0
        TX packets:11633 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:24133775 (23.0 MiB)  TX bytes:833868 (814.3 KiB)

lo        Link encap:Local Loopback
        inet addr:127.0.0.1  Mask:255.0.0.0
        inet6 addr: ::1/128 Scope:Host
        UP LOOPBACK RUNNING  MTU:16436  Metric:1
        RX packets:28973 errors:0 dropped:0 overruns:0 frame:0
        TX packets:28973 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:1223211 (1.1 MiB)  TX bytes:1223211 (1.1 MiB)

pan0      Link encap:Ethernet  HWaddr CA:00:CE:02:90:90
BROADCAST MULTICAST  MTU:1500  Metric:1
        RX packets:0 errors:0 dropped:0 overruns:0 frame:0
        TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

sit0      Link encap:IPv6-in-IPv4          NOARP  MTU:1480  Metric:1
        RX packets:0 errors:0 dropped:0 overruns:0 frame:0
        TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

virbr0    Link encap:Ethernet  HWaddr EA:6D:E7:85:8D:E7
inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
        inet6 addr: fe80::e86d:e7ff:fe85:8de7/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
        RX packets:0 errors:0 dropped:0 overruns:0 frame:0
        TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:0 (0.0 b)  TX bytes:5083 (4.9 KiB)

Here is the ifconfig -a output for x2:
eth0      Link encap:Ethernet  HWaddr 00:1B:21:02:DE:E9
inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
        inet6 addr: fe80::21b:21ff:fe02:dee9/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
        RX packets:565 errors:0 dropped:0 overruns:0 frame:0
        TX packets:565 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:181079 (176.8 KiB)  TX bytes:106650 (104.1 KiB)
        Memory:feae0000-feb00000

eth1      Link encap:Ethernet  HWaddr 00:0E:0C:BC:B1:7D
inet addr:192.168.3.2  Bcast:192.168.3.255  Mask:255.255.255.0
        inet6 addr: fe80::20e:cff:febc:b17d/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
        RX packets:11 errors:0 dropped:0 overruns:0 frame:0
        TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:660 (660.0 b)  TX bytes:1136 (1.1 KiB)

eth2      Link encap:Ethernet  HWaddr 00:1F:C6:27:1C:79
inet addr:192.168.7.2  Bcast:192.168.7.255  Mask:255.255.255.0
        inet6 addr: fe80::21f:c6ff:fe27:1c79/64 Scope:Link
        UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
        RX packets:11 errors:0 dropped:0 overruns:0 frame:0
        TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:506 (506.0 b)  TX bytes:1094 (1.0 KiB)
        Interrupt:17

lo        Link encap:Local Loopback
        inet addr:127.0.0.1  Mask:255.0.0.0
        inet6 addr: ::1/128 Scope:Host
        UP LOOPBACK RUNNING  MTU:16436  Metric:1
        RX packets:1604 errors:0 dropped:0 overruns:0 frame:0
        TX packets:1604 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:140216 (136.9 KiB)  TX bytes:140216 (136.9 KiB)

sit0      Link encap:IPv6-in-IPv4          NOARP  MTU:1480  Metric:1
        RX packets:0 errors:0 dropped:0 overruns:0 frame:0
        TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Any help would be appreciated!
Allan Menezes



------------------------------

Message: 2
Date: Fri, 31 Oct 2008 20:33:33 -0400
From: Mi Yan <mi...@us.ibm.com>
Subject: Re: [OMPI users] problem running Open MPI on Cells
To: Open MPI Users <us...@open-mpi.org>
Cc: Open MPI Users <us...@open-mpi.org>, users-boun...@open-mpi.org


Where did you put the environment variables related to the MCF license file and
the MCF shared libraries?
What is your default shell?

Does your test indicate the following?
Suppose you have 4 nodes:
on node 1, "mpirun -np 4 --host node1,node2,node3,node4 hostname" works, but "mpirun -np 4 --host node1,node2,node3,node4 foocbe" does not work,
where foocbe is an executable generated with MCF.

Is it possible that the MCF license is limited to a few concurrent uses? E.g., if the license is limited to 4 concurrent uses, the MPI application will fail
on 8 nodes.

Regards,
Mi
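
If the license-file variable turns out to be set only in an interactive shell on each CAB, one way to make sure every rank sees it is to export it explicitly on the mpirun command line. A sketch, with MCF_LICENSE_FILE and its path standing in for whatever the real variable is called:

  export MCF_LICENSE_FILE=/path/to/license.dat    # hypothetical name and path
  # -x forwards the named environment variable to every launched process
  mpirun -np 4 --host node1,node2,node3,node4 -x MCF_LICENSE_FILE foocbe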



From: Hahn Kim <h...@ll.mit.edu>
Sent by: users-bounces@open-mpi.org
Date: 10/31/2008 03:38 PM
To: Open MPI Users <us...@open-mpi.org>
Subject: [OMPI users] problem running Open MPI on Cells
Please respond to: Open MPI Users <users@open-mpi.org>






Hello,

I'm having problems using Open MPI on a cluster of Mercury Computer's
Cell Accelerator Boards (CABs).

We have an MPI application that is running on multiple CABs.  The
application uses Mercury's MultiCore Framework (MCF) to use the Cell's
SPEs.  Here's the basic problem.  I can log into each CAB and run the
application in serial directly from the command line (i.e. without
using mpirun) without a problem.  I can also launch a serial job onto
each CAB from another machine using mpirun without a problem.

The problem occurs when I try to launch onto multiple CABs using
mpirun.  MCF requires a license file.  After the application
initializes MPI, it tries to initialize MCF on each node.  The
initialization routine loads the MCF license file and checks for valid
license keys.  If the keys are valid, then it continues to initialize
MCF.  If not, it throws an error.

When I run on multiple CABs, most of the time several of the CABs
throw an error saying MCF cannot find a valid license key.  The
strange thing is that this behavior doesn't appear when I launch serial
jobs using MCF, only when launching onto multiple CABs.  Additionally, the errors are
inconsistent.  Not all the CABs throw an error, sometimes a few of
them error out, sometimes all of them, sometimes none.

I've talked with the Mercury folks and they're just as stumped as I
am.  The only thing we can think of is that Open MPI is somehow
modifying the environment and is interfering with MCF, but we can't
think of any reason why.

Any ideas out there?  Thanks.

Hahn

--
Hahn Kim, h...@ll.mit.edu
MIT Lincoln Laboratory
244 Wood St., Lexington, MA 02420
Tel: 781-981-0940, Fax: 781-981-5255
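
One way to narrow down whether the environment is the culprit is to run a trivial command under the same mpirun line and have each CAB print what it actually sees. A sketch, with the cab1..cab8 host names and the MCF variable spelling as placeholders:

  # compare the environment on CABs that fail the license check with ones that pass
  mpirun -np 8 --host cab1,cab2,cab3,cab4,cab5,cab6,cab7,cab8 \
      sh -c 'hostname; env | grep -i mcf'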









