A small update. I was looking through the error file a bit more
(it was 159MB). I found the following error message sequence:

[o1:22805] mca_oob_tcp_accept: accept() failed with errno 9.
[o4:11242] [0,1,4]-[0,0,0] mca_oob_tcp_peer_recv_blocking: recv() failed with errno=104
[o1:22805] mca_oob_tcp_accept: accept() failed with errno 9.
...
[o1:22805] mca_oob_tcp_accept: accept() failed with errno 9.
[o3:32205] [0,1,2]-[0,0,0] mca_oob_tcp_peer_complete_connect: connection failed (errno=111) - retrying (pid=32205)
[o1:22805] mca_oob_tcp_accept: accept() failed with errno 9.
[o3:32206] [0,1,3]-[0,0,0] mca_oob_tcp_peer_complete_connect: connection failed (errno=111) - retrying (pid=32206)
[o1:22805] mca_oob_tcp_accept: accept() failed with errno 9.
...

I don't know if this changes things (my Google searches didn't
turn up much information).
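
For reference, the errno values in these messages can be looked up by
number. A minimal sketch in Python, assuming it is run on a Linux node
like the ones in the log (errno numbering is platform-specific);
`describe_errno` is a hypothetical helper name, not part of Open MPI:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an errno value on the current platform."""
    # errno.errorcode maps numeric values to symbolic names (e.g. 9 -> 'EBADF').
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# errno values taken from the log messages above.
for code in (9, 104, 111):
    print(code, describe_errno(code))
```

On Linux, 9 is EBADF (bad file descriptor), 104 is ECONNRESET
(connection reset by peer), and 111 is ECONNREFUSED (connection
refused).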

Jeff


Good afternoon,

   I really hate to post asking for help with a problem, but
my own efforts have not worked out well (probably operator
error).
   Anyway, I'm trying to run a code that was built with PGI 6.1
and OpenMPI-1.1.1. The mpirun command looks like:

mpirun --hostfile machines.${PBS_JOBID} --np ${NP} -mca btl self,sm,tcp ./${EXE} ${CASEPROJ} >> OUTPUT

I get the following error in the PBS error file:

[o1:22559] mca_oob_tcp_accept: accept() failed with errno 9.
...

and it keeps repeating (for a long time).

ompi_info gives the following output:

 > ompi_info
                Open MPI: 1.1.1
   Open MPI SVN revision: r11473
                Open RTE: 1.1.1
   Open RTE SVN revision: r11473
                    OPAL: 1.1.1
       OPAL SVN revision: r11473
                  Prefix: /usr/x86_64-pgi-6.1/openmpi-1.1.1
 Configured architecture: x86_64-suse-linux-gnu
           Configured by: root
           Configured on: Mon Oct 16 20:51:34 MDT 2006
          Configure host: lo248
                Built by: root
                Built on: Mon Oct 16 21:02:00 MDT 2006
              Built host: lo248
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: pgcc
     C compiler absolute: /opt/pgi/linux86-64/6.1/bin/pgcc
            C++ compiler: pgCC
   C++ compiler absolute: /opt/pgi/linux86-64/6.1/bin/pgCC
      Fortran77 compiler: pgf77
  Fortran77 compiler abs: /opt/pgi/linux86-64/6.1/bin/pgf77
      Fortran90 compiler: pgf90
  Fortran90 compiler abs: /opt/pgi/linux86-64/6.1/bin/pgf90
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: yes
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.1.1)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.1.1)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.1.1)
           MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.1.1)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.1.1)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.1.1)
                MCA coll: hierarch (MCA v1.0, API v1.0, Component v1.1.1)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.1.1)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.1.1)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.1.1)
               MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1.1)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.1.1)
              MCA rcache: rb (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA btl: gm (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA btl: self (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA btl: sm (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.0)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.1.1)
                  MCA ns: replica (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA ras: localhost (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA rds: resfile (MCA v1.0, API v1.0, Component v1.1.1)
               MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.1.1)
                MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.1.1)
                MCA rmgr: urm (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA pls: fork (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA pls: rsh (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.1.1)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.1.1)



I found this link via google:

http://www.open-mpi.org/community/lists/users/2006/06/1486.php

But to be honest I'm not sure how to apply this to fix my problem.

Thanks!

Jeff

