Hi Ankush

Please read the FAQ I sent you in the previous message.
That is the answer to your repeated question.
Open MPI (and every MPI I know of) requires passwordless connections.
Your program fails because you haven't set that up.

If it worked with a single compute node,
that was most likely fortuitous,
not by design.
What you see on the screen are the ssh password prompts
from your two compute nodes,
but Open MPI (or any MPI)
won't wait for you to type the passwords.
Imagine you were running your program on 1000 nodes
and, say, running it 1000 times:
would you really like to type all of those one million passwords?
The design must be scalable.

Here is one recipe for passwordless ssh on clusters:

http://agenda.clustermonkey.net/index.php/Passwordless_SSH_Logins
http://agenda.clustermonkey.net/index.php/Passwordless_SSH_(and_RSH)_Logins

Read it carefully.
The comments about MPI(ch) 1.2 and PVM are somewhat out of date,
but the ssh recipe itself is fine, detailed, and clear.
Note also the subtle difference between NFS-mounted home directories
and separate home directories on each node.
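
If it helps, here is a minimal sketch of the usual steps, assuming the
root account you have been using and RSA keys (the recipes above cover
the details, including the DSA variant):

   # on the master node, generate a key pair (leave the passphrase empty):
   ssh-keygen -t rsa
   # with NFS-mounted home directories, appending the public key once
   # covers all nodes:
   cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
   chmod 600 ~/.ssh/authorized_keys
   # with separate home directories, copy the key to each compute node
   # instead, e.g.:
   ssh-copy-id root@192.168.45.65
   ssh-copy-id root@192.168.67.241
   # then test; this should log you in without any password prompt:
   ssh 192.168.45.65 hostname

If that last command still asks for a password, check the permissions
on ~/.ssh and authorized_keys on the compute node.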

Pay a visit to the OpenSSH site as well, for more information:
http://www.openssh.com/
http://en.wikipedia.org/wiki/OpenSSH

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Ankush Kaul wrote:
Let me explain in detail.

When we had only 2 nodes, 1 master (192.168.67.18) + 1 compute node (192.168.45.65),
my openmpi-default-hostfile looked like:

192.168.67.18 slots=2
192.168.45.65 slots=2

After this, on running the command mpirun /work/Pi on the master node, we got:

# root@192.168.45.65's password:

After entering the password, the program ran on both the nodes.

Now, after connecting a second compute node and editing the hostfile:

192.168.67.18 slots=2
192.168.45.65 slots=2
192.168.67.241 slots=2

and then running the command mpirun /work/Pi on the master node, we got:

# root@192.168.45.65's password: root@192.168.67.241's password:

which does not accept the password.

Although we are trying to implement the passwordless cluster, I would like to know why this problem is occurring.


On Sat, Apr 18, 2009 at 3:40 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:

    Ankush

    You need to set up passwordless ssh connections to the node you just
    added.  You (or somebody else) probably did this already on the
    first compute node, otherwise the MPI programs wouldn't run
    across the network.

    See the very last sentence on this FAQ:

    http://www.open-mpi.org/faq/?category=running#run-prereqs

    And try this recipe (if you use RSA keys instead of DSA, replace all
    "dsa" by "rsa"):

    
    http://www.sshkeychain.org/mirrors/SSH-with-Keys-HOWTO/SSH-with-Keys-HOWTO-4.html#ss4.3


    I hope this helps.

    Gus Correa
    ---------------------------------------------------------------------
    Gustavo Correa
    Lamont-Doherty Earth Observatory - Columbia University
    Palisades, NY, 10964-8000 - USA
    ---------------------------------------------------------------------


    Ankush Kaul wrote:

        Thank you, I am reading up on the tools you suggested.

        I am facing another problem: my cluster is working fine with 2
        hosts (1 master + 1 compute node), but when I tried to add another
        node (1 master + 2 compute nodes) it's not working.  It works fine
        when I give the command mpirun -host <hostname> /work/Pi,

        but when I try to run
        mpirun /work/Pi it gives the following error:

        root@192.168.45.65's password: root@192.168.67.241's password:

        Permission denied, please try again. <The password I provide is correct>

        root@192.168.45.65's password:

        Permission denied, please try again.

        root@192.168.45.65's password:

        Permission denied (publickey,gssapi-with-mic,password).

        Permission denied, please try again.

        root@192.168.67.241's password: [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout
        in file base/pls_base_orted_cmds.c at line 275


        [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
        pls_rsh_module.c at line 1166

        [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
        errmgr_hnp.c at line 90

        [ccomp1.cluster:03503] ERROR: A daemon on node 192.168.45.65
        failed to start as expected.

        [ccomp1.cluster:03503] ERROR: There may be more information
        available from

        [ccomp1.cluster:03503] ERROR: the remote shell (see above).

        [ccomp1.cluster:03503] ERROR: The daemon exited unexpectedly
        with status 255.

        [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
        base/pls_base_orted_cmds.c at line 188

        [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
        pls_rsh_module.c at line 1198


        What is the problem here?

        
        --------------------------------------------------------------------------

        mpirun was unable to cleanly terminate the daemons for this job.
        Returned value Timeout instead of ORTE_SUCCESS


        On Tue, Apr 14, 2009 at 7:15 PM, Eugene Loh <eugene....@sun.com> wrote:

           Ankush Kaul wrote:

                Finally, after specifying the hostfiles, the cluster is working
                fine.  We downloaded a few benchmarking programs, but I would
                like to know if there is any GUI-based benchmarking software,
                so that it is easier to demonstrate the working of our cluster
                while displaying it.


           I'm confused about what you're looking for here, but thought I'd
           venture a suggestion.

           There are GUI-based performance analysis and tracing tools.  E.g.,
           run a program, [[semi-]automatically] collect performance data, run
           a GUI-based analysis tool on the data, and visualize what happened
           on your cluster.  Would this suit your purposes?

           If so, there are a variety of tools out there you could try.  Some
           are platform-specific or cost money.  Some are widely/freely
           available.  Examples of these tools include Intel Trace Analyzer,
           Jumpshot, Vampir, TAU, etc.  I do know that Sun Studio (Performance
           Analyzer) is available via free download on x86 and SPARC and Linux
           and Solaris and works with OMPI.  Possibly the same with Jumpshot.
           VampirTrace instrumentation is already in OMPI, but then you need
           to figure out the analysis-tool part.  (I think the Vampir GUI tool
           requires a license, but I'm not sure.  Maybe you can convert to TAU,
           which is probably available for free download.)
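
           For what it's worth, here is a rough sketch of that workflow,
           assuming an Open MPI 1.3 build with the bundled VampirTrace (the
           wrapper and output file names here are from memory, so double-check
           them, and pi.c is just a stand-in for your own source):

               mpicc-vt -o Pi pi.c   # instrumented build (mpif77-vt etc. for Fortran)
               mpirun -np 6 ./Pi     # picks up your default hostfile as usual
               # the run should leave an OTF trace (Pi.otf plus per-process
               # event files) in the working directory, which Vampir (or TAU,
               # after conversion) can then read

           On an older Open MPI without the bundled VampirTrace you would need
           a standalone VampirTrace or TAU installation and its own compiler
           wrappers instead.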

           Anyhow, I don't even know if that sort of thing fits your
           requirements.  Just an idea.









------------------------------------------------------------------------

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
