We're running SGE 6.2u5, OpenMPI 1.3.3 compiled with SGE integration.
Our cluster has some AMD and some Intel-based servers, but all are
managed through the same kickstart build and the same cfengine
configuration. The only deliberate differences in the nodes are:

  - ATLAS and BLAS libraries are optimized per CPU type
  - the AMD nodes have disk drives that correctly report SMART readings,
    so the smartd monitoring process runs on those machines

There are 3 PEs defined:

  openmpi         all nodes
  openmpi-Intel   Intel CPU nodes
  openmpi-AMD     AMD CPU nodes

The only known differences in the PEs are:

  - the hostgroup assigned to the PE (all nodes, just Intel nodes, just
    AMD nodes)
  - the number of slots per PE
  - the environment variable "ARCHPATH" is set to the directory where
    libraries optimized per-architecture are stored
  - the environment variable "ARCH" is set to the architecture (as in:
    Intel-Nehalem)

(The environment variables are set outside of SGE jobs, so they will exist when
a job is launched via mpirun or submitted via qsub.)
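
To confirm that ARCH and ARCHPATH really do reach an SGE job unchanged, the
environment seen interactively can be compared with the one seen inside a job.
The following is only a sketch: the function name compare_env_vars and the
dump file names are made up for illustration, and each dump is assumed to have
been written with "env | sort".

```shell
#!/bin/bash
# Hypothetical helper (not part of SGE or Open MPI): report whether selected
# variables differ between two environment dumps.
# Usage: compare_env_vars DUMP_A DUMP_B NAME [NAME ...]
compare_env_vars() {
    local dump_a="$1"
    local dump_b="$2"
    shift 2
    local name val_a val_b
    for name in "$@"; do
        # Take everything after the first "=" on the line whose key is $name.
        val_a=$(awk -F= -v n="$name" '$1 == n { print substr($0, length(n) + 2); exit }' "$dump_a")
        val_b=$(awk -F= -v n="$name" '$1 == n { print substr($0, length(n) + 2); exit }' "$dump_b")
        if [ "$val_a" = "$val_b" ]; then
            printf 'same    %s=%s\n' "$name" "$val_a"
        else
            printf "DIFFERS %s: '%s' vs '%s'\n" "$name" "$val_a" "$val_b"
        fi
    done
}
```

One dump would be taken in a login shell on an AMD node and the other from
inside a job on the openmpi-AMD PE, then compared with something like
"compare_env_vars env.interactive env.sge ARCH ARCHPATH PATH LD_LIBRARY_PATH".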
I can run MPI jobs using "mpirun" on the Intel, AMD, or mixed sets of
nodes, using a machines file. Running the same commands as an SGE job
fails on the AMD nodes.

For example, running:

  mpirun -np 50 -machinefile machines.AMD /bin/hostname       succeeds
  mpirun -np 50 -machinefile machines.AMD hello_world.mpi     succeeds
  mpirun -np 50 -machinefile machines.Intel /bin/hostname     succeeds
  mpirun -np 50 -machinefile machines.Intel hello_world.mpi   succeeds
  mpirun -np 50 -machinefile machines.mixed /bin/hostname     succeeds
  mpirun -np 50 -machinefile machines.mixed hello_world.mpi   succeeds
  qsub -pe openmpi-Intel 50 mpirun /bin/hostname              succeeds
  qsub -pe openmpi-Intel 50 mpirun hello_world.mpi            succeeds
  qsub -pe openmpi 50 mpirun /bin/hostname                    * FAILS if AMD nodes are used *
  qsub -pe openmpi 50 mpirun hello_world.mpi                  * FAILS if AMD nodes are used *
  qsub -pe openmpi-AMD 50 mpirun /bin/hostname                * FAILS *
  qsub -pe openmpi-AMD 50 mpirun hello_world.mpi              * FAILS *

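
For anyone who wants to repeat these tests, the same matrix can be generated
as a dry run rather than typed by hand. This is a sketch only: the function
name print_test_matrix is hypothetical, while the machine file names, PE names
and slot count are the ones listed above. The qsub lines are printed in the
shorthand used in this post; submitting a binary directly would normally also
need "-b y" or a wrapper job script.

```shell
#!/bin/bash
# Hypothetical helper: print the test matrix, one command per line, without
# running anything. The optional first argument overrides the slot count.
print_test_matrix() {
    local np="${1:-50}"
    local target prog
    # Direct launches with a machines file.
    for target in AMD Intel mixed; do
        for prog in /bin/hostname hello_world.mpi; do
            echo "mpirun -np $np -machinefile machines.$target $prog"
        done
    done
    # The same programs submitted through each SGE parallel environment.
    for target in openmpi-Intel openmpi openmpi-AMD; do
        for prog in /bin/hostname hello_world.mpi; do
            echo "qsub -pe $target $np mpirun $prog"
        done
    done
}
```

Calling "print_test_matrix" with no argument prints the twelve commands for 50
slots; "print_test_matrix 8" prints the same matrix for a smaller run.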

When I run the job on the openmpi-AMD PE with debugging statements,
I can see that it starts on a node and that the slave MPI processes are
dispatched. As expected, all processes run only on AMD nodes. However,
there are no results and the job finishes without an error. The job
takes longer to finish (on the order of minutes) than the jobs that work
correctly on the Intel nodes. Perhaps the job 'finishes' when some
orted timeout expires, but no error is reported.
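
One way to narrow down where the launch stalls is to tally, from the
--debug-daemons output, which daemons reported "orted: up and running" and
which of them later logged "orted_cmd: received add_local_procs". The sketch
below assumes log lines shaped like the ones in the test job output (host and
pid in the first bracketed field) and tolerates the message being wrapped onto
the next line by a mail client. The function name stalled_daemons is
hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read mpirun --debug-daemons output on stdin and print
# the hosts whose orted came up but never logged add_local_procs.
stalled_daemons() {
    awk '
        # "[host.example.com:1234]" -> "host.example.com"
        function host(field) {
            sub(/^\[/, "", field)
            sub(/:.*$/, "", field)
            return field
        }
        # Previous line ended in "received": check for the wrapped remainder.
        pending != "" {
            if ($0 ~ /^add_local_procs/) launched[pending] = 1
            pending = ""
        }
        /orted: up and running/               { up[host($1)] = 1 }
        /orted_cmd: received add_local_procs/ { launched[host($1)] = 1 }
        /orted_cmd: received[ \t]*$/          { pending = host($1) }
        END {
            for (h in up) if (!(h in launched)) print h
        }
    ' | sort
}
```

It would be run as "stalled_daemons < job.stderr", where job.stderr stands for
whatever file the job's stderr was written to. The node running mpirun itself
never prints "up and running", so it does not appear in the result, and a
truncated log will make hosts look stalled that may not be.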
Any suggestions for more troubleshooting?
Please see below for output from a test job.
Thanks,
Mark
----------------------------------------------
Command as submitted via qsub:
mpirun --verbose \
--display-map \
--tag-output \
--debug-daemons \
--display-allocation \
--mca orte_forward_job_control 1 \
--mca pls_gridengine_verbose 1 \
--mca pls_gridengine_debug 1 \
--mca OMPI_MCA_mca_verbose 1 \
--mca btl_base_verbose 30 \
--mca routed direct \
--prefix $OPENMPI -np $NSLOTS ~/hello_openmpi
----- STDOUT from ~/hello_openmpi below this line -----
Command: ~/hello_openmpi
Arguments:
Executing in: /acme/home/bergman/sge_job_output
Executing on: acme-c5-8.example.com
Executing at: Thu Dec 20 17:00:46 EST 2012
----- STDERR from ~/hello_openmpi below this line -----
====================== ALLOCATED NODES ======================
Data for node: Name: acme-c5-8.example.com Num slots: 1 Max slots: 0
Data for node: Name: acme-c5-9.example.com Num slots: 1 Max slots: 0
Data for node: Name: acme-c5-10.example.com Num slots: 1 Max slots: 0
Data for node: Name: acme-c5-11.example.com Num slots: 1 Max slots: 0
Data for node: Name: acme-c5-12.example.com Num slots: 1 Max slots: 0
Data for node: Name: acme-c5-13.example.com Num slots: 1 Max slots: 0
Data for node: Name: acme-c5-14.example.com Num slots: 1 Max slots: 0
Data for node: Name: acme-c5-15.example.com Num slots: 1 Max slots: 0
Data for node: Name: acme-c5-16.example.com Num slots: 1 Max slots: 0
Data for node: Name: acme-c5-17.example.com Num slots: 1 Max slots: 0
Data for node: Name: acme-c5-18.example.com Num slots: 1 Max slots: 0
Data for node: Name: acme-c5-19.example.com Num slots: 1 Max slots: 0
Data for node: Name: acme-c5-20.example.com Num slots: 1 Max slots: 0
Data for node: Name: acme-c4-9.example.com Num slots: 1 Max slots: 0
Data for node: Name: acme-c4-11.example.com Num slots: 1 Max slots: 0
=================================================================
======================== JOB MAP ========================
Data for node: Name: acme-c5-8.example.com Num procs: 1
Process OMPI jobid: [58179,1] Process rank: 0
Data for node: Name: acme-c5-9.example.com Num procs: 1
Process OMPI jobid: [58179,1] Process rank: 1
Data for node: Name: acme-c5-10.example.com Num procs: 1
Process OMPI jobid: [58179,1] Process rank: 2
Data for node: Name: acme-c5-11.example.com Num procs: 1
Process OMPI jobid: [58179,1] Process rank: 3
Data for node: Name: acme-c5-12.example.com Num procs: 1
Process OMPI jobid: [58179,1] Process rank: 4
Data for node: Name: acme-c5-13.example.com Num procs: 1
Process OMPI jobid: [58179,1] Process rank: 5
Data for node: Name: acme-c5-14.example.com Num procs: 1
Process OMPI jobid: [58179,1] Process rank: 6
Data for node: Name: acme-c5-15.example.com Num procs: 1
Process OMPI jobid: [58179,1] Process rank: 7
Data for node: Name: acme-c5-16.example.com Num procs: 1
Process OMPI jobid: [58179,1] Process rank: 8
Data for node: Name: acme-c5-17.example.com Num procs: 1
Process OMPI jobid: [58179,1] Process rank: 9
Data for node: Name: acme-c5-18.example.com Num procs: 1
Process OMPI jobid: [58179,1] Process rank: 10
Data for node: Name: acme-c5-19.example.com Num procs: 1
Process OMPI jobid: [58179,1] Process rank: 11
Data for node: Name: acme-c5-20.example.com Num procs: 1
Process OMPI jobid: [58179,1] Process rank: 12
Data for node: Name: acme-c4-9.example.com Num procs: 1
Process OMPI jobid: [58179,1] Process rank: 13
Data for node: Name: acme-c4-11.example.com Num procs: 1
Process OMPI jobid: [58179,1] Process rank: 14
=============================================================
Daemon was launched on acme-c5-10.example.com - beginning to initialize
Daemon [[58179,0],2] checking in as pid 9734 on host acme-c5-10.example.com
Daemon [[58179,0],2] not using static ports
[acme-c5-10.example.com:09734] [[58179,0],2] orted: up and running - waiting
for commands!
Daemon was launched on acme-c5-20.example.com - beginning to initialize
Daemon was launched on acme-c5-9.example.com - beginning to initialize
Daemon [[58179,0],12] checking in as pid 7292 on host acme-c5-20.example.com
Daemon [[58179,0],12] not using static ports
Daemon [[58179,0],1] checking in as pid 31954 on host acme-c5-9.example.com
[acme-c5-20.example.com:07292] [[58179,0],12] orted: up and running - waiting
for commands!
Daemon [[58179,0],1] not using static ports
[acme-c5-9.example.com:31954] [[58179,0],1] orted: up and running - waiting for
commands!
Daemon was launched on acme-c4-11.example.com - beginning to initialize
Daemon was launched on acme-c5-12.example.com - beginning to initialize
Daemon was launched on acme-c5-11.example.com - beginning to initialize
Daemon [[58179,0],14] checking in as pid 13717 on host acme-c4-11.example.com
Daemon [[58179,0],14] not using static ports
[acme-c4-11.example.com:13717] [[58179,0],14] orted: up and running - waiting
for commands!
Daemon [[58179,0],4] checking in as pid 1010 on host acme-c5-12.example.com
Daemon [[58179,0],4] not using static ports
Daemon was launched on acme-c5-15.example.com - beginning to initialize
[acme-c5-12.example.com:01010] [[58179,0],4] orted: up and running - waiting
for commands!
Daemon was launched on acme-c4-9.example.com - beginning to initialize
Daemon [[58179,0],3] checking in as pid 6876 on host acme-c5-11.example.com
Daemon [[58179,0],3] not using static ports
[acme-c5-11.example.com:06876] [[58179,0],3] orted: up and running - waiting
for commands!
Daemon was launched on acme-c5-16.example.com - beginning to initialize
Daemon [[58179,0],7] checking in as pid 7819 on host acme-c5-15.example.com
Daemon [[58179,0],7] not using static ports
[acme-c5-15.example.com:07819] [[58179,0],7] orted: up and running - waiting
for commands!
Daemon was launched on acme-c5-17.example.com - beginning to initialize
Daemon was launched on acme-c5-18.example.com - beginning to initialize
Daemon [[58179,0],13] checking in as pid 28397 on host acme-c4-9.example.com
Daemon [[58179,0],13] not using static ports
[acme-c4-9.example.com:28397] [[58179,0],13] orted: up and running - waiting
for commands!
Daemon was launched on acme-c5-19.example.com - beginning to initialize
Daemon [[58179,0],8] checking in as pid 21432 on host acme-c5-16.example.com
Daemon [[58179,0],8] not using static ports
[acme-c5-16.example.com:21432] [[58179,0],8] orted: up and running - waiting
for commands!
Daemon was launched on acme-c5-14.example.com - beginning to initialize
Daemon [[58179,0],9] checking in as pid 26411 on host acme-c5-17.example.com
Daemon [[58179,0],9] not using static ports
[acme-c5-17.example.com:26411] [[58179,0],9] orted: up and running - waiting
for commands!
Daemon [[58179,0],10] checking in as pid 11348 on host acme-c5-18.example.com
Daemon [[58179,0],10] not using static ports
[acme-c5-18.example.com:11348] [[58179,0],10] orted: up and running - waiting
for commands!
Daemon was launched on acme-c5-13.example.com - beginning to initialize
Daemon [[58179,0],11] checking in as pid 18318 on host acme-c5-19.example.com
Daemon [[58179,0],11] not using static ports
[acme-c5-19.example.com:18318] [[58179,0],11] orted: up and running - waiting
for commands!
Daemon [[58179,0],6] checking in as pid 3987 on host acme-c5-14.example.com
Daemon [[58179,0],6] not using static ports
[acme-c5-14.example.com:03987] [[58179,0],6] orted: up and running - waiting
for commands!
Daemon [[58179,0],5] checking in as pid 21829 on host acme-c5-13.example.com
Daemon [[58179,0],5] not using static ports
[acme-c5-13.example.com:21829] [[58179,0],5] orted: up and running - waiting
for commands!
[acme-c5-8.example.com:27764] [[58179,0],0] node[0].name acme-c5-8 daemon 0
arch ffca0200
[acme-c5-8.example.com:27764] [[58179,0],0] node[1].name acme-c5-9 daemon 1
arch ffca0200
[acme-c5-8.example.com:27764] [[58179,0],0] node[2].name acme-c5-10 daemon 2
arch ffca0200
[acme-c5-8.example.com:27764] [[58179,0],0] node[3].name acme-c5-11 daemon 3
arch ffca0200
[acme-c5-8.example.com:27764] [[58179,0],0] node[4].name acme-c5-12 daemon 4
arch ffca0200
[acme-c5-8.example.com:27764] [[58179,0],0] node[5].name acme-c5-13 daemon 5
arch ffca0200
[acme-c5-8.example.com:27764] [[58179,0],0] node[6].name acme-c5-14 daemon 6
arch ffca0200
[acme-c5-8.example.com:27764] [[58179,0],0] node[7].name acme-c5-15 daemon 7
arch ffca0200
[acme-c5-8.example.com:27764] [[58179,0],0] node[8].name acme-c5-16 daemon 8
arch ffca0200
[acme-c5-8.example.com:27764] [[58179,0],0] node[9].name acme-c5-17 daemon 9
arch ffca0200
[acme-c5-8.example.com:27764] [[58179,0],0] node[10].name acme-c5-18 daemon 10
arch ffca0200
[acme-c5-8.example.com:27764] [[58179,0],0] node[11].name acme-c5-19 daemon 11
arch ffca0200
[acme-c5-8.example.com:27764] [[58179,0],0] node[12].name acme-c5-20 daemon 12
arch ffca0200
[acme-c5-8.example.com:27764] [[58179,0],0] node[13].name acme-c4-9 daemon 13
arch ffca0200
[acme-c5-8.example.com:27764] [[58179,0],0] node[14].name acme-c4-11 daemon 14
arch ffca0200
[acme-c5-8.example.com:27764] [[58179,0],0] orted_cmd: received add_local_procs
[acme-c5-16.example.com:21432] [[58179,0],8] node[0].name acme-c5-8 daemon 0
arch ffca0200
[acme-c5-10.example.com:09734] [[58179,0],2] node[0].name acme-c5-8 daemon 0
arch ffca0200
[acme-c5-10.example.com:09734] [[58179,0],2] node[1].name acme-c5-9 daemon 1
arch ffca0200
[acme-c5-10.example.com:09734] [[58179,0],2] node[2].name acme-c5-10 daemon 2
arch ffca0200
[acme-c5-9.example.com:31954] [[58179,0],1] node[0].name acme-c5-8 daemon 0
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] node[0].name acme-c5-8 daemon 0
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] node[1].name acme-c5-9 daemon 1
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] node[2].name acme-c5-10 daemon 2
arch ffca0200
[acme-c5-16.example.com:21432] [[58179,0],8] node[1].name acme-c5-9 daemon 1
arch ffca0200
[acme-c5-10.example.com:09734] [[58179,0],2] node[3].name acme-c5-11 daemon 3
arch ffca0200
[acme-c5-10.example.com:09734] [[58179,0],2] node[4].name acme-c5-12 daemon 4
arch ffca0200
[acme-c5-10.example.com:09734] [[58179,0],2] node[5].name acme-c5-13 daemon 5
arch ffca0200
[acme-c5-10.example.com:09734] [[58179,0],2] node[6].name acme-c5-14 daemon 6
arch ffca0200
[acme-c5-10.example.com:09734] [[58179,0],2] node[7].name acme-c5-15 daemon 7
arch ffca0200
[acme-c5-10.example.com:09734] [[58179,0],2] node[8].name acme-c5-16 daemon 8
arch ffca0200
[acme-c5-10.example.com:09734] [[58179,0],2] node[9].name acme-c5-17 daemon 9
arch ffca0200
[acme-c5-10.example.com:09734] [[58179,0],2] node[10].name acme-c5-18 daemon 10
arch ffca0200
[acme-c5-10.example.com:09734] [[58179,0],2] node[11].name acme-c5-19 daemon 11
arch ffca0200
[acme-c5-9.example.com:31954] [[58179,0],1] node[1].name acme-c5-9 daemon 1
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] node[3].name acme-c5-11 daemon 3
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] node[4].name acme-c5-12 daemon 4
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] node[5].name acme-c5-13 daemon 5
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] node[6].name acme-c5-14 daemon 6
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] node[7].name acme-c5-15 daemon 7
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] node[8].name acme-c5-16 daemon 8
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] node[9].name acme-c5-17 daemon 9
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] node[10].name acme-c5-18 daemon 10
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] node[11].name acme-c5-19 daemon 11
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] node[12].name acme-c5-20 daemon 12
arch ffca0200
[acme-c5-16.example.com:21432] [[58179,0],8] node[2].name acme-c5-10 daemon 2
arch ffca0200
[acme-c5-16.example.com:21432] [[58179,0],8] node[3].name acme-c5-11 daemon 3
arch ffca0200
[acme-c5-16.example.com:21432] [[58179,0],8] node[4].name acme-c5-12 daemon 4
arch ffca0200
[acme-c5-16.example.com:21432] [[58179,0],8] node[5].name acme-c5-13 daemon 5
arch ffca0200
[acme-c5-16.example.com:21432] [[58179,0],8] node[6].name acme-c5-14 daemon 6
arch ffca0200
[acme-c5-16.example.com:21432] [[58179,0],8] node[7].name acme-c5-15 daemon 7
arch ffca0200
[acme-c5-16.example.com:21432] [[58179,0],8] node[8].name acme-c5-16 daemon 8
arch ffca0200
[acme-c5-16.example.com:21432] [[58179,0],8] node[9].name acme-c5-17 daemon 9
arch ffca0200
[acme-c5-16.example.com:21432] [[58179,0],8] node[10].name acme-c5-18 daemon 10
arch ffca0200
[acme-c5-16.example.com:21432] [[58179,0],8] node[11].name acme-c5-19 daemon 11
arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] node[0].name acme-c5-8 daemon 0
arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] node[1].name acme-c5-9 daemon 1
arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] node[2].name acme-c5-10 daemon 2
arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] node[3].name acme-c5-11 daemon 3
arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] node[4].name acme-c5-12 daemon 4
arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] node[5].name acme-c5-13 daemon 5
arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] node[6].name acme-c5-14 daemon 6
arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] node[7].name acme-c5-15 daemon 7
arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] node[8].name acme-c5-16 daemon 8
arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] node[9].name acme-c5-17 daemon 9
arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] node[10].name acme-c5-18 daemon
10 arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] node[11].name acme-c5-19 daemon
11 arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] node[12].name acme-c5-20 daemon
12 arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] node[13].name acme-c4-9 daemon 13
arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] node[14].name acme-c4-11 daemon
14 arch ffca0200
[acme-c5-18.example.com:11348] [[58179,0],10] orted_cmd: received
add_local_procs
[acme-c5-10.example.com:09734] [[58179,0],2] node[12].name acme-c5-20 daemon 12
arch ffca0200
[acme-c5-10.example.com:09734] [[58179,0],2] node[13].name acme-c4-9 daemon 13
arch ffca0200
[acme-c5-9.example.com:31954] [[58179,0],1] node[2].name acme-c5-10 daemon 2
arch ffca0200
[acme-c5-9.example.com:31954] [[58179,0],1] node[3].name acme-c5-11 daemon 3
arch ffca0200
[acme-c5-9.example.com:31954] [[58179,0],1] node[4].name acme-c5-12 daemon 4
arch ffca0200
[acme-c5-9.example.com:31954] [[58179,0],1] node[5].name acme-c5-13 daemon 5
arch ffca0200
[acme-c5-9.example.com:31954] [[58179,0],1] node[6].name acme-c5-14 daemon 6
arch ffca0200
[acme-c5-9.example.com:31954] [[58179,0],1] node[7].name acme-c5-15 daemon 7
arch ffca0200
[acme-c5-11.example.com:06876] [[58179,0],3] node[0].name acme-c5-8 daemon 0
arch ffca0200
[acme-c5-11.example.com:06876] [[58179,0],3] node[1].name acme-c5-9 daemon 1
arch ffca0200
[acme-c5-14.example.com:03987] [[58179,0],6] node[0].name acme-c5-8 daemon 0
arch ffca0200
[acme-c5-14.example.com:03987] [[58179,0],6] node[1].name acme-c5-9 daemon 1
arch ffca0200
[acme-c5-14.example.com:03987] [[58179,0],6] node[2].name acme-c5-10 daemon 2
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] node[13].name acme-c4-9 daemon 13
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] node[14].name acme-c4-11 daemon 14
arch ffca0200
[acme-c5-16.example.com:21432] [[58179,0],8] node[12].name acme-c5-20 daemon 12
arch ffca0200
[acme-c5-16.example.com:21432] [[58179,0],8] node[13].name acme-c4-9 daemon 13
arch ffca0200
[acme-c5-16.example.com:21432] [[58179,0],8] node[14].name acme-c4-11 daemon 14
arch ffca0200
[acme-c5-20.example.com:07292] [[58179,0],12] node[0].name acme-c5-8 daemon 0
arch ffca0200
[acme-c5-20.example.com:07292] [[58179,0],12] node[1].name acme-c5-9 daemon 1
arch ffca0200
[acme-c5-20.example.com:07292] [[58179,0],12] node[2].name acme-c5-10 daemon 2
arch ffca0200
[acme-c5-20.example.com:07292] [[58179,0],12] node[3].name acme-c5-11 daemon 3
arch ffca0200
[acme-c5-17.example.com:26411] [[58179,0],9] node[0].name acme-c5-8 daemon 0
arch ffca0200
[acme-c5-17.example.com:26411] [[58179,0],9] node[1].name acme-c5-9 daemon 1
arch ffca0200
[acme-c5-10.example.com:09734] [[58179,0],2] node[14].name acme-c4-11 daemon 14
arch ffca0200
[acme-c5-10.example.com:09734] [[58179,0],2] orted_cmd: received add_local_procs
[acme-c5-9.example.com:31954] [[58179,0],1] node[8].name acme-c5-16 daemon 8
arch ffca0200
[acme-c5-11.example.com:06876] [[58179,0],3] node[2].name acme-c5-10 daemon 2
arch ffca0200
[acme-c5-11.example.com:06876] [[58179,0],3] node[3].name acme-c5-11 daemon 3
arch ffca0200
[acme-c5-11.example.com:06876] [[58179,0],3] node[4].name acme-c5-12 daemon 4
arch ffca0200
[acme-c5-11.example.com:06876] [[58179,0],3] node[5].name acme-c5-13 daemon 5
arch ffca0200
[acme-c5-11.example.com:06876] [[58179,0],3] node[6].name acme-c5-14 daemon 6
arch ffca0200
[acme-c5-11.example.com:06876] [[58179,0],3] node[7].name acme-c5-15 daemon 7
arch ffca0200
[acme-c4-11.example.com:13717] [[58179,0],14] node[0].name acme-c5-8 daemon 0
arch ffca0200
[acme-c4-11.example.com:13717] [[58179,0],14] node[1].name acme-c5-9 daemon 1
arch ffca0200
[acme-c5-14.example.com:03987] [[58179,0],6] node[3].name acme-c5-11 daemon 3
arch ffca0200
[acme-c5-14.example.com:03987] [[58179,0],6] node[4].name acme-c5-12 daemon 4
arch ffca0200
[acme-c5-14.example.com:03987] [[58179,0],6] node[5].name acme-c5-13 daemon 5
arch ffca0200
[acme-c5-14.example.com:03987] [[58179,0],6] node[6].name acme-c5-14 daemon 6
arch ffca0200
[acme-c5-14.example.com:03987] [[58179,0],6] node[7].name acme-c5-15 daemon 7
arch ffca0200
[acme-c5-12.example.com:01010] [[58179,0],4] orted_cmd: received add_local_procs
[acme-c5-13.example.com:21829] [[58179,0],5] node[0].name acme-c5-8 daemon 0
arch ffca0200
[acme-c5-13.example.com:21829] [[58179,0],5] node[1].name acme-c5-9 daemon 1
arch ffca0200
[acme-c5-16.example.com:21432] [[58179,0],8] orted_cmd: received add_local_procs
[acme-c5-20.example.com:07292] [[58179,0],12] node[4].name acme-c5-12 daemon 4
arch ffca0200
[acme-c5-20.example.com:07292] [[58179,0],12] node[5].name acme-c5-13 daemon 5
arch ffca0200
[acme-c5-20.example.com:07292] [[58179,0],12] node[6].name acme-c5-14 daemon 6
arch ffca0200
[acme-c5-20.example.com:07292] [[58179,0],12] node[7].name acme-c5-15 daemon 7
arch ffca0200
[acme-c5-20.example.com:07292] [[58179,0],12] node[8].name acme-c5-16 daemon 8
arch ffca0200
[acme-c5-20.example.com:07292] [[58179,0],12] node[9].name acme-c5-17 daemon 9
arch ffca0200
[acme-c5-20.example.com:07292] [[58179,0],12] node[10].name acme-c5-18 daemon
10 arch ffca0200
[acme-c5-20.example.com:07292] [[58179,0],12] node[11].name acme-c5-19 daemon
11 arch ffca0200
[acme-c5-20.example.com:07292] [[58179,0],12] node[12].name acme-c5-20 daemon
12 arch ffca0200
[acme-c5-20.example.com:07292] [[58179,0],12] node[13].name acme-c4-9 daemon 13
arch ffca0200
[acme-c5-20.example.com:07292] [[58179,0],12] node[14].name acme-c4-11 daemon
14 arch ffca0200
[acme-c5-17.example.com:26411] [[58179,0],9] node[2].name acme-c5-10 daemon 2
arch ffca0200
[acme-c5-17.example.com:26411] [[58179,0],9] node[3].name acme-c5-11 daemon 3
arch ffca0200
[acme-c5-17.example.com:26411] [[58179,0],9] node[4].name acme-c5-12 daemon 4
arch ffca0200
[acme-c5-17.example.com:26411] [[58179,0],9] node[5].name acme-c5-13 daemon 5
arch ffca0200
[acme-c5-17.example.com:26411] [[58179,0],9] node[6].name acme-c5-14 daemon 6
arch ffca0200
[acme-c5-17.example.com:26411] [[58179,0],9] node[7].name acme-c5-15 daemon 7
arch ffca0200
[acme-c5-9.example.com:31954] [[58179,0],1] node[9].name acme-c5-17 daemon 9
arch ffca0200
[acme-c5-15.example.com:07819] [[58179,0],7] node[0].name acme-c5-8 daemon 0
arch ffca0200
[acme-c5-19.example.com:18318] [[58179,0],11]
node[[acme-c4-9.example.com:28397] [[58179,0],13]
node[0].name acme-c5-8 daemon 0 arch ffca0200
[acme-c4-9.example.com:28397] [[58179,0],13] node[1].name acme-c5-9 daemon 1
arch ffca0200
----------------------------------------------
-----
Mark Bergman