There is a known issue on BProc 4 w.r.t. pty support. Open MPI by
default will try to use ptys for I/O forwarding but will revert to
pipes if ptys are not available.
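For context, the fallback amounts to something like this (a minimal sketch of the try-pty-then-pipe pattern, not Open MPI's actual odls_bproc code; on Linux, link with -lutil):

#include <pty.h>     /* openpty() */
#include <unistd.h>  /* pipe() */
#include <stdio.h>

/* Prefer a pty for forwarding a child's stdio; if the node has no
 * working pty support, fall back to an ordinary pipe. */
static int setup_io_channel(int *parent_fd, int *child_fd)
{
    int fds[2];

    if (openpty(parent_fd, child_fd, NULL, NULL, NULL) == 0)
        return 0;                  /* got a pty pair */
    fprintf(stderr, "openpty failed, using pipes instead\n");
    if (pipe(fds) != 0)
        return -1;                 /* neither mechanism worked */
    *parent_fd = fds[0];           /* parent reads the child's output */
    *child_fd  = fds[1];           /* child writes its stdout here */
    return 0;
}

int main(void)
{
    int pfd, cfd;
    if (setup_io_channel(&pfd, &cfd) == 0)
        printf("I/O channel ready (fds %d, %d)\n", pfd, cfd);
    return 0;
}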
You can "safely" ignore the pty warnings, or you may want to rerun
configure and add:
--disable-pty-support
I say "safely" because my understanding is that some I/O data may be
lost if pipes are used during abnormal termination.
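For example (the prefix is just a placeholder for your own build options):

./configure --prefix=/usr/local/openmpi --disable-pty-support
make all install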
Alternatively, you might try getting pty support working; to do that,
you need to configure ptys on the backend nodes.
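On a typical Linux node this means making sure the Unix98 pty devices
are available, i.e. /dev/ptmx exists and the devpts filesystem is
mounted; the details depend on your Clustermatic node setup, but as an
illustration:

mount -t devpts devpts /dev/pts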
You can then try the following code to test whether ptys are working
correctly; if this fails (it does on our BProc 4 cluster), you
shouldn't use ptys on BProc.
#include <pty.h>     /* openpty(); on Linux, link with -lutil */
#include <stdio.h>
#include <string.h>
#include <errno.h>

int
main(int argc, char *argv[])
{
    int amaster, aslave;

    /* Try to allocate a master/slave pty pair. */
    if (openpty(&amaster, &aslave, NULL, NULL, NULL) < 0) {
        printf("openpty() failed with errno = %d, %s\n",
               errno, strerror(errno));
        return 1;
    }
    printf("openpty() succeeded\n");
    return 0;
}
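On Linux, openpty() lives in libutil, so link with -lutil when you
build the test, e.g. (the file name here is just an example):

gcc -o ptytest ptytest.c -lutil

Run it on a backend node (for instance via BProc's bpsh) rather than
on the head node, since the backend nodes are where the pty allocation
has to succeed.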
On Apr 26, 2007, at 2:06 PM, Daniel Gruner wrote:
Hi
I have been testing Open MPI 1.2, and now 1.2.1, on several BProc-
based clusters, and I have found some problems/issues. All my
clusters have standard Ethernet interconnects, either 100Base-T or
Gigabit, on standard switches.
The clusters are all running Clustermatic 5 (BProc 4.x), and range
from 32-bit Athlon, to 32-bit Xeon, to 64-bit Opteron. In all cases
the same problems occur, identically. I attach here the results
from "ompi_info --all" and the config.log, for my latest build on
an Opteron cluster, using the Pathscale compilers. I had exactly
the same problems when using the vanilla GNU compilers.
Now for a description of the problem:
When running an MPI code (cpi.c, from the standard MPI examples, also
attached) using the mpirun defaults (e.g. -byslot), with a single
process:
sonoma:dgruner{134}> mpirun -n 1 ./cpip
[n17:30019] odls_bproc: openpty failed, using pipes instead
Process 0 on n17
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.000199
However, if one tries to run more than one process, this bombs:
sonoma:dgruner{134}> mpirun -n 2 ./cpip
.
.
.
[n21:30029] OOB: Connection to HNP lost
[n21:30029] OOB: Connection to HNP lost
[n21:30029] OOB: Connection to HNP lost
[n21:30029] OOB: Connection to HNP lost
[n21:30029] OOB: Connection to HNP lost
[n21:30029] OOB: Connection to HNP lost
.
. ad infinitum
If one uses the option "-bynode", things work:
sonoma:dgruner{145}> mpirun -bynode -n 2 ./cpip
[n17:30055] odls_bproc: openpty failed, using pipes instead
Process 0 on n17
Process 1 on n21
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.010375
Note that there is always the message about "openpty failed, using
pipes instead".
If I run more processes (on my 3-node cluster, with 2 CPUs per node),
the openpty message appears repeatedly for the first node:
sonoma:dgruner{146}> mpirun -bynode -n 6 ./cpip
[n17:30061] odls_bproc: openpty failed, using pipes instead
[n17:30061] odls_bproc: openpty failed, using pipes instead
Process 0 on n17
Process 2 on n49
Process 1 on n21
Process 5 on n49
Process 3 on n17
Process 4 on n21
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.050332
Should I worry about the openpty failure? I suspect that
communications may be slower this way. Using the -byslot option
always fails, so this is a bug. The same occurs for all the codes
that I have tried, both simple and complex.
Thanks for your attention to this.
Regards,
Daniel
--
Dr. Daniel Gruner dgru...@chem.utoronto.ca
Dept. of Chemistry daniel.gru...@utoronto.ca
University of Toronto phone: (416)-978-8689
80 St. George Street fax: (416)-978-5325
Toronto, ON M5S 3H6, Canada finger for PGP public key
<cpi.c.gz>
<config.log.gz>
<ompiinfo.gz>