Perhaps I was mistaken about 1.5rc1. As for the openMPI that comes
installed on Mac OS X: my 10.5 machine has v1.2.3. When I run it, it
works fine locally, but it never finds the xgrid.
Any MPI job I run runs on localhost, not on the xgrid agents. If I
try to force the issue by specifying -nolocal, it just complains that
there are no nodes.
So how do I use openMPI so that it uses the nodes of an xgrid cluster?
$ mpirun -nolocal -n 32 /bin/hostname
--------------------------------------------------------------------------
There are no available nodes allocated to this job. This could be because
no nodes were found or all the available nodes were already used.
Note that since the -nolocal option was given no processes can be
launched on the local node.
--------------------------------------------------------------------------
[ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/base/rmaps_base_support_fns.c at line 168
[ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/round_robin/rmaps_rr.c at line 402
[ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/base/rmaps_base_map_job.c at line 210
[ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmgr/urm/rmgr_urm.c at line 372
On Jun 16, 2010, at 1:36 PM, Ralph Castain wrote:
Where did you see that 1.5 works with xgrid? That support has been
broken since the 1.2 series, unfortunately, so it would help to
ensure we don't have stale docs out there to the contrary.
As for the 1.2 results, you are aware (I imagine) that OSX ships
with the last 1.2 release already installed? You don't have to do
anything to use it but run.
If you are getting peer timeouts, that is almost always a firewall
issue. But I would try the factory-installed version first to be sure.
On Jun 16, 2010, at 1:14 PM, Charlie E. Strauss wrote:
I'm new to openMPI. I'm trying to set it up for use with xgrid. I have
read that v1.3 and v1.4 are broken on OS X 10.5 and 10.6, although I
have seen some discussions in the archives of this mailing list saying
some people have v1.4 running on 10.6.
I have now compiled both openMPI 1.2 and openMPI 1.5rc, and neither of
these is working for me with xgrid. Both of these say they work with
xgrid.
The failure modes are different.
Does anyone know how to get a working install? I am building this on an
OS X 10.5.8 machine. The xgrid controller is on an OS X 10.6 server
machine. I have tried configuring with and without the --with-xgrid
option.
Behaviour of openMPI 1.2:
$ /usr/local/openmpi/bin/mpirun -nolocal -n 2 /bin/hostname
The job appears in the xgrid queue, and the logs show it is running on
a remote machine. However, nothing ever happens, and peeking in the
xgrid results I see:
$ xgrid -job results -id 8703
[brio.llnl.gov:38789] [0,0,1]-[0,0,0] mca_oob_tcp_peer_complete_connect: connection failed: Operation timed out (60) - retrying
[brio.llnl.gov:38792] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect: connection failed: Operation timed out (60) - retrying
Perhaps a firewall issue?
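One way to test the firewall guess independently of openMPI would be
to check whether an agent machine can open a TCP connection back to
the machine running mpirun. The following is only a minimal Python
sketch, not anything taken from openMPI: the function name can_connect
and the port number in the usage comment are hypothetical placeholders.
As far as I understand, the port mpirun listens on is picked at run
time, so a real value would have to be looked up while the job is
waiting.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Hypothetical usage, run on an agent machine (12345 is a placeholder port):
#   print(can_connect("ocho.lanl.gov", 12345))
```

If this returns False for a port that mpirun really is listening on, a
firewall or routing problem between the agent and the submitting host
becomes the more likely explanation for the timeouts.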
Of course, I'm more interested in getting the new openMPI 1.5 working.
When I run it, I again get an entry in the queue, and the job runs on
a remote machine, but I get a job-failed message:
$ /usr/local/openmpi5/bin/mpirun -n 2 /bin/hostname
$ xgrid -job results -id 8702
[brio.llnl.gov:38776] Error: unknown option "-mca"
----
Note that I have NOT installed openMPI on any of the other computers
in the grid. So perhaps that is the problem? If I did install it on
the other computers, how would I tell mpirun where to find the install
location?
----
Finally, in both cases I don't see any way to pass xgrid-specific
arguments on the MPI command line. An xgrid controller divides the
agents into sets of logical grids, and you need to specify which
logical grid to submit the job to. In the xgrid CLI syntax one writes
"xgrid -gid 2" for grid 2.
When I use openMPI, all the jobs get sent to the default grid, which
is the grid that xgrid uses if no gid is specified.
Charlie Strauss
Bioscience Division
c...@lanl.gov
505 665 4838
Quidquid latine dictum sit, altum sonatur.