On 21.12.2012 at 17:24, berg...@merctech.com wrote:

> We're running SGE 6.2u5, OpenMPI 1.3.3 compiled with SGE integration.
> Our cluster has some AMD and some Intel-based servers, but all are
> managed through the same kickstart build and the same cfengine
> configuration. The only deliberate differences in the nodes are:
> 
>       ATLAS and BLAS libraries are optimized per-CPU type
> 
>       the AMD nodes have disk drives that correctly report SMART readings,
>       so the smartd monitoring process runs on those machines
> 
> There are 3 PEs defined:
>       openmpi         all nodes
>       openmpi-Intel   Intel CPU nodes
>       openmpi-AMD     AMD CPU nodes
> 
> The only known differences in the PEs are:
>       the hostgroup assigned to the PE (all nodes, just Intel nodes, just
>       AMD nodes)
> 
>       the number of slots per PE
> 
>       the environment variable "ARCHPATH" is set to the directory where
>       libraries optimized per-architecture are stored
> 
>       the environment variable "ARCH" is set to the architecture (as in:
>       Intel-Nehalem)
> 
> (The environment variables are set outside of SGE jobs, so they will exist
> when a job is launched via mpirun or submitted via qsub.)
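
One way to confirm that the three PEs really differ only in the listed items is to diff their definitions. The function below is a hypothetical helper (`pe_diff` is not an SGE command). It assumes the one-`key value`-per-line layout printed by `qconf -sp <pe_name>`, skips `pe_name` and `slots`, and prints every other key whose value differs. For Open MPI's tight integration the lines to watch are `control_slaves` (should be `TRUE`) and `job_is_first_task` (normally `FALSE`). As far as I know, in SGE 6.2 the host group is not part of the PE definition itself but is attached through the queue's `pe_list`, so the queue configurations (`qconf -sq <queue>`) may be worth comparing the same way.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SGE): compare two PE definitions as printed
# by `qconf -sp <pe_name>` and print every key whose value differs.
# Assumes one "key value" pair per line; backslash-continued values are not
# joined. pe_name and slots are skipped because they are expected to differ.
pe_diff() {
  awk -v fa="$1" '
    # True for non-empty lines whose key should be compared.
    function keep() { return NF > 0 && $1 != "pe_name" && $1 != "slots" }
    # The value is everything after the key, minus surrounding blanks.
    function value(   v) {
      v = $0
      sub(/^[^ \t]+[ \t]*/, "", v)
      sub(/[ \t]+$/, "", v)
      return v
    }
    # Load the first PE definition into a[key] = value.
    BEGIN {
      while ((getline < fa) > 0)
        if (keep()) a[$1] = value()
      close(fa)
    }
    # Compare each line of the second definition against the first.
    keep() {
      seen[$1] = 1
      v = value()
      if (!($1 in a))      printf "%s: <missing> | %s\n", $1, v
      else if (a[$1] != v) printf "%s: %s | %s\n", $1, a[$1], v
    }
    # Keys present only in the first definition.
    END { for (k in a) if (!(k in seen)) printf "%s: %s | <missing>\n", k, a[k] }
  ' "$2"
}
```

Usage, e.g.: `pe_diff <(qconf -sp openmpi-Intel) <(qconf -sp openmpi-AMD)`; empty output means no differences beyond name and slot count.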

Where will these variables be set on the slave nodes of the parallel job in a
Tight Integration, i.e. is the correct orted started there?
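
To answer that empirically, a small script can be launched through `mpirun` inside the SGE job so that every node reports what a process started by its orted actually sees. This is a hypothetical sketch (`node_report` is an illustrative name). Note that `command -v orted` only shows which `orted` comes first on that process's `PATH`; it is not necessarily the binary `mpirun` used when `--prefix` is given.

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: print one line describing the environment this
# process sees on its node. Intended to be started by mpirun on every node.
node_report() {
  printf '%s ARCH=%s ARCHPATH=%s orted=%s\n' \
    "$(uname -n)" \
    "${ARCH:-<unset>}" \
    "${ARCHPATH:-<unset>}" \
    "$(command -v orted || echo '<not found>')"
}

node_report
```

Saved as an executable script on a shared filesystem, it could be run once as `mpirun -np $NSLOTS <script>` from a job in the `openmpi-AMD` PE and once by hand with `-machinefile machines.AMD`, and the two outputs compared.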


> I can run MPI jobs using "mpirun" on the Intel, AMD, or mixed sets of
> nodes, using a machines file. Running the same commands as an SGE job
> fails on the AMD nodes.
> 
> For example, running:
>       mpirun -np 50 -machinefile machines.AMD /bin/hostname           succeeds
>       mpirun -np 50 -machinefile machines.AMD hello_world.mpi         succeeds
>       mpirun -np 50 -machinefile machines.Intel /bin/hostname         succeeds
>       mpirun -np 50 -machinefile machines.Intel hello_world.mpi       succeeds
>       mpirun -np 50 -machinefile machines.mixed /bin/hostname         succeeds
>       mpirun -np 50 -machinefile machines.mixed hello_world.mpi       succeeds

Did you also start these jobs on an exechost, and not only on the head node?
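
To reproduce exactly the allocation that SGE granted, the job can save its host list as an Open MPI hostfile, and the same `mpirun` can then be repeated by hand from the same exechost. The sketch below assumes the usual `$PE_HOSTFILE` layout (host name, slot count, queue, processor range on each line) and writes `host slots=N` lines, summing the slots when a host appears in more than one queue. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert an SGE $PE_HOSTFILE into an Open MPI hostfile.
# Assumed input, one line per granted queue instance:
#   <host> <slots> <queue@host> <processor range>
# Output: "<host> slots=<N>", with slots summed if a host appears twice.
pe_hostfile_to_machinefile() {
  awk '
    NF >= 2 {
      if (!($1 in slots)) order[n++] = $1   # remember first-seen host order
      slots[$1] += $2
    }
    END { for (i = 0; i < n; i++) printf "%s slots=%d\n", order[i], slots[order[i]] }
  ' "${1:-$PE_HOSTFILE}"
}
```

For example, `pe_hostfile_to_machinefile > ~/machines.job$JOB_ID` in the job script, followed later by `mpirun -machinefile ~/machines.job<id> ...` from an interactive login on the master exechost of that job.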


>       qsub -pe openmpi-Intel 50 mpirun /bin/hostname                  succeeds
>       qsub -pe openmpi-Intel 50 mpirun hello_world.mpi                succeeds
>       qsub -pe openmpi 50 mpirun /bin/hostname                        * FAILS if AMD nodes are used *
>       qsub -pe openmpi 50 mpirun hello_world.mpi                      * FAILS if AMD nodes are used *
>       qsub -pe openmpi-AMD 50 mpirun /bin/hostname                    * FAILS *
>       qsub -pe openmpi-AMD 50 mpirun hello_world.mpi                  * FAILS *

Are there several entries in `qacct` for these jobs, and do they all exit with zero?
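
With `control_slaves TRUE` and `accounting_summary FALSE` in the PE, SGE writes one accounting record for the master task and one for every slave task it started, so `qacct -j <job_id>` should list several entries for such a job. The hypothetical helper below condenses that output to one line per record. It assumes records are separated by a line of `=` characters and that each contains `hostname`, `failed` and `exit_status` fields, which matches the `qacct` output I am familiar with.

```bash
#!/usr/bin/env bash
# Hypothetical helper: condense `qacct -j <job_id>` output (read on stdin) to
# one line per accounting record: hostname, failed, exit_status.
# Assumes records are separated by a line consisting only of "=" characters.
summarize_qacct() {
  awk '
    # Print the record collected so far, then reset for the next one.
    function flush() {
      if (host != "") printf "%s failed=%s exit_status=%s\n", host, failed, status
      host = ""; failed = ""; status = ""
    }
    /^=+[ \t]*$/        { flush(); next }
    $1 == "hostname"    { host = $2 }
    $1 == "failed"      { failed = $2 }
    $1 == "exit_status" { status = $2 }
    END { flush() }
  '
}
```

Usage: `qacct -j <job_id> | summarize_qacct`. A slave host missing from the list, or a non-zero `failed` value, would be the first thing to follow up.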

-- Reuti


> When I run the job on the openmpi-AMD PE with debugging statements,
> I can see that it starts on a node and that the slave MPI processes are
> dispatched. As expected, all processes run only on AMD nodes. However,
> there are no results, and the job finishes without an error. It takes
> noticeably longer (minutes) to finish than the jobs that work
> correctly on the Intel nodes. Perhaps the job "finishes" when some orted
> timeout expires, but no error is reported.
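
Since the failing jobs end without reporting an error, it may help to log `mpirun`'s own exit status and run time inside the job script instead of relying on the job's overall state. A minimal wrapper, assuming bash; `run_and_report` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper for the job script: run a command, then log its exit
# status and wall-clock time to stderr, and return the same status.
run_and_report() {
  local start rc
  start=$(date +%s)
  "$@"
  rc=$?
  echo "command: $* | exit status: $rc | elapsed: $(( $(date +%s) - start ))s" >&2
  return "$rc"
}
```

In the job script this would be used as `run_and_report mpirun --prefix $OPENMPI -np $NSLOTS ~/hello_openmpi`. A similar elapsed time on every failing run could hint at a fixed timeout rather than a clean exit.
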
> 
> Any suggestions for more troubleshooting?
> 
> Please see below for output from a test job.
> 
> Thanks,
> 
> Mark
> 
> 
> ----------------------------------------------
> Command as submitted via qsub:
> 
> mpirun --verbose \
>       --display-map \
>       --tag-output  \
>       --debug-daemons \
>       --display-allocation \
>       --mca orte_forward_job_control 1 \
>       --mca pls_gridengine_verbose 1 \
>       --mca pls_gridengine_debug 1 \
>       --mca OMPI_MCA_mca_verbose 1 \
>       --mca btl_base_verbose 30 \
>       --mca routed direct \
>       --prefix $OPENMPI -np $NSLOTS ~/hello_openmpi
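
Some of these `--mca` names may not exist in Open MPI 1.3.3. My understanding is that the 1.2-era `pls` launch framework became `plm` in 1.3 (so the SGE knobs would be `plm_gridengine_verbose` and `plm_gridengine_debug`), and that `OMPI_MCA_` is only the prefix for setting a parameter through the environment, so on the command line the name would be plain `mca_verbose`. Rather than trusting that, the names can be checked against `ompi_info --param all all`, whose entries contain `parameter "<name>"`. `check_mca_params` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical checker: read `ompi_info --param all all` output on stdin and
# print each MCA parameter name (given as arguments) that does not appear in
# an entry of the form: parameter "<name>"
check_mca_params() {
  local listing name
  listing=$(cat)
  for name in "$@"; do
    if ! printf '%s\n' "$listing" | grep -qF "parameter \"$name\""; then
      echo "unknown MCA parameter: $name"
    fi
  done
}
```

Usage: `ompi_info --param all all | check_mca_params orte_forward_job_control pls_gridengine_verbose pls_gridengine_debug OMPI_MCA_mca_verbose btl_base_verbose`.
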
> 
> 
> ----- STDOUT from ~/hello_openmpi below this line -----
> Command: ~/hello_openmpi
> Arguments: 
> Executing in: /acme/home/bergman/sge_job_output
> Executing on: acme-c5-8.example.com
> Executing at: Thu Dec 20 17:00:46 EST 2012
> ----- STDERR from ~/hello_openmpi below this line -----
> 
> ======================   ALLOCATED NODES   ======================
> 
> Data for node: Name: acme-c5-8.example.com    Num slots: 1    Max slots: 0
> Data for node: Name: acme-c5-9.example.com    Num slots: 1    Max slots: 0
> Data for node: Name: acme-c5-10.example.com   Num slots: 1    Max slots: 0
> Data for node: Name: acme-c5-11.example.com   Num slots: 1    Max slots: 0
> Data for node: Name: acme-c5-12.example.com   Num slots: 1    Max slots: 0
> Data for node: Name: acme-c5-13.example.com   Num slots: 1    Max slots: 0
> Data for node: Name: acme-c5-14.example.com   Num slots: 1    Max slots: 0
> Data for node: Name: acme-c5-15.example.com   Num slots: 1    Max slots: 0
> Data for node: Name: acme-c5-16.example.com   Num slots: 1    Max slots: 0
> Data for node: Name: acme-c5-17.example.com   Num slots: 1    Max slots: 0
> Data for node: Name: acme-c5-18.example.com   Num slots: 1    Max slots: 0
> Data for node: Name: acme-c5-19.example.com   Num slots: 1    Max slots: 0
> Data for node: Name: acme-c5-20.example.com   Num slots: 1    Max slots: 0
> Data for node: Name: acme-c4-9.example.com    Num slots: 1    Max slots: 0
> Data for node: Name: acme-c4-11.example.com   Num slots: 1    Max slots: 0
> 
> =================================================================
> 
> ========================   JOB MAP   ========================
> 
> Data for node: Name: acme-c5-8.example.com    Num procs: 1
>       Process OMPI jobid: [58179,1] Process rank: 0
> 
> Data for node: Name: acme-c5-9.example.com    Num procs: 1
>       Process OMPI jobid: [58179,1] Process rank: 1
> 
> Data for node: Name: acme-c5-10.example.com   Num procs: 1
>       Process OMPI jobid: [58179,1] Process rank: 2
> 
> Data for node: Name: acme-c5-11.example.com   Num procs: 1
>       Process OMPI jobid: [58179,1] Process rank: 3
> 
> Data for node: Name: acme-c5-12.example.com   Num procs: 1
>       Process OMPI jobid: [58179,1] Process rank: 4
> 
> Data for node: Name: acme-c5-13.example.com   Num procs: 1
>       Process OMPI jobid: [58179,1] Process rank: 5
> 
> Data for node: Name: acme-c5-14.example.com   Num procs: 1
>       Process OMPI jobid: [58179,1] Process rank: 6
> 
> Data for node: Name: acme-c5-15.example.com   Num procs: 1
>       Process OMPI jobid: [58179,1] Process rank: 7
> 
> Data for node: Name: acme-c5-16.example.com   Num procs: 1
>       Process OMPI jobid: [58179,1] Process rank: 8
> 
> Data for node: Name: acme-c5-17.example.com   Num procs: 1
>       Process OMPI jobid: [58179,1] Process rank: 9
> 
> Data for node: Name: acme-c5-18.example.com   Num procs: 1
>       Process OMPI jobid: [58179,1] Process rank: 10
> 
> Data for node: Name: acme-c5-19.example.com   Num procs: 1
>       Process OMPI jobid: [58179,1] Process rank: 11
> 
> Data for node: Name: acme-c5-20.example.com   Num procs: 1
>       Process OMPI jobid: [58179,1] Process rank: 12
> 
> Data for node: Name: acme-c4-9.example.com    Num procs: 1
>       Process OMPI jobid: [58179,1] Process rank: 13
> 
> Data for node: Name: acme-c4-11.example.com   Num procs: 1
>       Process OMPI jobid: [58179,1] Process rank: 14
> 
> =============================================================
> Daemon was launched on acme-c5-10.example.com - beginning to initialize
> Daemon [[58179,0],2] checking in as pid 9734 on host acme-c5-10.example.com
> Daemon [[58179,0],2] not using static ports
> [acme-c5-10.example.com:09734] [[58179,0],2] orted: up and running - waiting for commands!
> Daemon was launched on acme-c5-20.example.com - beginning to initialize
> Daemon was launched on acme-c5-9.example.com - beginning to initialize
> Daemon [[58179,0],12] checking in as pid 7292 on host acme-c5-20.example.com
> Daemon [[58179,0],12] not using static ports
> Daemon [[58179,0],1] checking in as pid 31954 on host acme-c5-9.example.com
> [acme-c5-20.example.com:07292] [[58179,0],12] orted: up and running - waiting for commands!
> Daemon [[58179,0],1] not using static ports
> [acme-c5-9.example.com:31954] [[58179,0],1] orted: up and running - waiting for commands!
> Daemon was launched on acme-c4-11.example.com - beginning to initialize
> Daemon was launched on acme-c5-12.example.com - beginning to initialize
> Daemon was launched on acme-c5-11.example.com - beginning to initialize
> Daemon [[58179,0],14] checking in as pid 13717 on host acme-c4-11.example.com
> Daemon [[58179,0],14] not using static ports
> [acme-c4-11.example.com:13717] [[58179,0],14] orted: up and running - waiting for commands!
> Daemon [[58179,0],4] checking in as pid 1010 on host acme-c5-12.example.com
> Daemon [[58179,0],4] not using static ports
> Daemon was launched on acme-c5-15.example.com - beginning to initialize
> [acme-c5-12.example.com:01010] [[58179,0],4] orted: up and running - waiting for commands!
> Daemon was launched on acme-c4-9.example.com - beginning to initialize
> Daemon [[58179,0],3] checking in as pid 6876 on host acme-c5-11.example.com
> Daemon [[58179,0],3] not using static ports
> [acme-c5-11.example.com:06876] [[58179,0],3] orted: up and running - waiting for commands!
> Daemon was launched on acme-c5-16.example.com - beginning to initialize
> Daemon [[58179,0],7] checking in as pid 7819 on host acme-c5-15.example.com
> Daemon [[58179,0],7] not using static ports
> [acme-c5-15.example.com:07819] [[58179,0],7] orted: up and running - waiting for commands!
> Daemon was launched on acme-c5-17.example.com - beginning to initialize
> Daemon was launched on acme-c5-18.example.com - beginning to initialize
> Daemon [[58179,0],13] checking in as pid 28397 on host acme-c4-9.example.com
> Daemon [[58179,0],13] not using static ports
> [acme-c4-9.example.com:28397] [[58179,0],13] orted: up and running - waiting for commands!
> Daemon was launched on acme-c5-19.example.com - beginning to initialize
> Daemon [[58179,0],8] checking in as pid 21432 on host acme-c5-16.example.com
> Daemon [[58179,0],8] not using static ports
> [acme-c5-16.example.com:21432] [[58179,0],8] orted: up and running - waiting for commands!
> Daemon was launched on acme-c5-14.example.com - beginning to initialize
> Daemon [[58179,0],9] checking in as pid 26411 on host acme-c5-17.example.com
> Daemon [[58179,0],9] not using static ports
> [acme-c5-17.example.com:26411] [[58179,0],9] orted: up and running - waiting for commands!
> Daemon [[58179,0],10] checking in as pid 11348 on host acme-c5-18.example.com
> Daemon [[58179,0],10] not using static ports
> [acme-c5-18.example.com:11348] [[58179,0],10] orted: up and running - waiting for commands!
> Daemon was launched on acme-c5-13.example.com - beginning to initialize
> Daemon [[58179,0],11] checking in as pid 18318 on host acme-c5-19.example.com
> Daemon [[58179,0],11] not using static ports
> [acme-c5-19.example.com:18318] [[58179,0],11] orted: up and running - waiting for commands!
> Daemon [[58179,0],6] checking in as pid 3987 on host acme-c5-14.example.com
> Daemon [[58179,0],6] not using static ports
> [acme-c5-14.example.com:03987] [[58179,0],6] orted: up and running - waiting for commands!
> Daemon [[58179,0],5] checking in as pid 21829 on host acme-c5-13.example.com
> Daemon [[58179,0],5] not using static ports
> [acme-c5-13.example.com:21829] [[58179,0],5] orted: up and running - waiting for commands!
> [acme-c5-8.example.com:27764] [[58179,0],0] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c5-8.example.com:27764] [[58179,0],0] node[1].name acme-c5-9 daemon 1 arch ffca0200
> [acme-c5-8.example.com:27764] [[58179,0],0] node[2].name acme-c5-10 daemon 2 arch ffca0200
> [acme-c5-8.example.com:27764] [[58179,0],0] node[3].name acme-c5-11 daemon 3 arch ffca0200
> [acme-c5-8.example.com:27764] [[58179,0],0] node[4].name acme-c5-12 daemon 4 arch ffca0200
> [acme-c5-8.example.com:27764] [[58179,0],0] node[5].name acme-c5-13 daemon 5 arch ffca0200
> [acme-c5-8.example.com:27764] [[58179,0],0] node[6].name acme-c5-14 daemon 6 arch ffca0200
> [acme-c5-8.example.com:27764] [[58179,0],0] node[7].name acme-c5-15 daemon 7 arch ffca0200
> [acme-c5-8.example.com:27764] [[58179,0],0] node[8].name acme-c5-16 daemon 8 arch ffca0200
> [acme-c5-8.example.com:27764] [[58179,0],0] node[9].name acme-c5-17 daemon 9 arch ffca0200
> [acme-c5-8.example.com:27764] [[58179,0],0] node[10].name acme-c5-18 daemon 10 arch ffca0200
> [acme-c5-8.example.com:27764] [[58179,0],0] node[11].name acme-c5-19 daemon 11 arch ffca0200
> [acme-c5-8.example.com:27764] [[58179,0],0] node[12].name acme-c5-20 daemon 12 arch ffca0200
> [acme-c5-8.example.com:27764] [[58179,0],0] node[13].name acme-c4-9 daemon 13 arch ffca0200
> [acme-c5-8.example.com:27764] [[58179,0],0] node[14].name acme-c4-11 daemon 14 arch ffca0200
> [acme-c5-8.example.com:27764] [[58179,0],0] orted_cmd: received add_local_procs
> [acme-c5-16.example.com:21432] [[58179,0],8] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c5-10.example.com:09734] [[58179,0],2] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c5-10.example.com:09734] [[58179,0],2] node[1].name acme-c5-9 daemon 1 arch ffca0200
> [acme-c5-10.example.com:09734] [[58179,0],2] node[2].name acme-c5-10 daemon 2 arch ffca0200
> [acme-c5-9.example.com:31954] [[58179,0],1] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] node[1].name acme-c5-9 daemon 1 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] node[2].name acme-c5-10 daemon 2 arch ffca0200
> [acme-c5-16.example.com:21432] [[58179,0],8] node[1].name acme-c5-9 daemon 1 arch ffca0200
> [acme-c5-10.example.com:09734] [[58179,0],2] node[3].name acme-c5-11 daemon 3 arch ffca0200
> [acme-c5-10.example.com:09734] [[58179,0],2] node[4].name acme-c5-12 daemon 4 arch ffca0200
> [acme-c5-10.example.com:09734] [[58179,0],2] node[5].name acme-c5-13 daemon 5 arch ffca0200
> [acme-c5-10.example.com:09734] [[58179,0],2] node[6].name acme-c5-14 daemon 6 arch ffca0200
> [acme-c5-10.example.com:09734] [[58179,0],2] node[7].name acme-c5-15 daemon 7 arch ffca0200
> [acme-c5-10.example.com:09734] [[58179,0],2] node[8].name acme-c5-16 daemon 8 arch ffca0200
> [acme-c5-10.example.com:09734] [[58179,0],2] node[9].name acme-c5-17 daemon 9 arch ffca0200
> [acme-c5-10.example.com:09734] [[58179,0],2] node[10].name acme-c5-18 daemon 10 arch ffca0200
> [acme-c5-10.example.com:09734] [[58179,0],2] node[11].name acme-c5-19 daemon 11 arch ffca0200
> [acme-c5-9.example.com:31954] [[58179,0],1] node[1].name acme-c5-9 daemon 1 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] node[3].name acme-c5-11 daemon 3 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] node[4].name acme-c5-12 daemon 4 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] node[5].name acme-c5-13 daemon 5 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] node[6].name acme-c5-14 daemon 6 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] node[7].name acme-c5-15 daemon 7 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] node[8].name acme-c5-16 daemon 8 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] node[9].name acme-c5-17 daemon 9 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] node[10].name acme-c5-18 daemon 10 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] node[11].name acme-c5-19 daemon 11 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] node[12].name acme-c5-20 daemon 12 arch ffca0200
> [acme-c5-16.example.com:21432] [[58179,0],8] node[2].name acme-c5-10 daemon 2 arch ffca0200
> [acme-c5-16.example.com:21432] [[58179,0],8] node[3].name acme-c5-11 daemon 3 arch ffca0200
> [acme-c5-16.example.com:21432] [[58179,0],8] node[4].name acme-c5-12 daemon 4 arch ffca0200
> [acme-c5-16.example.com:21432] [[58179,0],8] node[5].name acme-c5-13 daemon 5 arch ffca0200
> [acme-c5-16.example.com:21432] [[58179,0],8] node[6].name acme-c5-14 daemon 6 arch ffca0200
> [acme-c5-16.example.com:21432] [[58179,0],8] node[7].name acme-c5-15 daemon 7 arch ffca0200
> [acme-c5-16.example.com:21432] [[58179,0],8] node[8].name acme-c5-16 daemon 8 arch ffca0200
> [acme-c5-16.example.com:21432] [[58179,0],8] node[9].name acme-c5-17 daemon 9 arch ffca0200
> [acme-c5-16.example.com:21432] [[58179,0],8] node[10].name acme-c5-18 daemon 10 arch ffca0200
> [acme-c5-16.example.com:21432] [[58179,0],8] node[11].name acme-c5-19 daemon 11 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] node[1].name acme-c5-9 daemon 1 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] node[2].name acme-c5-10 daemon 2 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] node[3].name acme-c5-11 daemon 3 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] node[4].name acme-c5-12 daemon 4 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] node[5].name acme-c5-13 daemon 5 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] node[6].name acme-c5-14 daemon 6 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] node[7].name acme-c5-15 daemon 7 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] node[8].name acme-c5-16 daemon 8 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] node[9].name acme-c5-17 daemon 9 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] node[10].name acme-c5-18 daemon 10 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] node[11].name acme-c5-19 daemon 11 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] node[12].name acme-c5-20 daemon 12 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] node[13].name acme-c4-9 daemon 13 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] node[14].name acme-c4-11 daemon 14 arch ffca0200
> [acme-c5-18.example.com:11348] [[58179,0],10] orted_cmd: received add_local_procs
> [acme-c5-10.example.com:09734] [[58179,0],2] node[12].name acme-c5-20 daemon 12 arch ffca0200
> [acme-c5-10.example.com:09734] [[58179,0],2] node[13].name acme-c4-9 daemon 13 arch ffca0200
> [acme-c5-9.example.com:31954] [[58179,0],1] node[2].name acme-c5-10 daemon 2 arch ffca0200
> [acme-c5-9.example.com:31954] [[58179,0],1] node[3].name acme-c5-11 daemon 3 arch ffca0200
> [acme-c5-9.example.com:31954] [[58179,0],1] node[4].name acme-c5-12 daemon 4 arch ffca0200
> [acme-c5-9.example.com:31954] [[58179,0],1] node[5].name acme-c5-13 daemon 5 arch ffca0200
> [acme-c5-9.example.com:31954] [[58179,0],1] node[6].name acme-c5-14 daemon 6 arch ffca0200
> [acme-c5-9.example.com:31954] [[58179,0],1] node[7].name acme-c5-15 daemon 7 arch ffca0200
> [acme-c5-11.example.com:06876] [[58179,0],3] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c5-11.example.com:06876] [[58179,0],3] node[1].name acme-c5-9 daemon 1 arch ffca0200
> [acme-c5-14.example.com:03987] [[58179,0],6] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c5-14.example.com:03987] [[58179,0],6] node[1].name acme-c5-9 daemon 1 arch ffca0200
> [acme-c5-14.example.com:03987] [[58179,0],6] node[2].name acme-c5-10 daemon 2 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] node[13].name acme-c4-9 daemon 13 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] node[14].name acme-c4-11 daemon 14 arch ffca0200
> [acme-c5-16.example.com:21432] [[58179,0],8] node[12].name acme-c5-20 daemon 12 arch ffca0200
> [acme-c5-16.example.com:21432] [[58179,0],8] node[13].name acme-c4-9 daemon 13 arch ffca0200
> [acme-c5-16.example.com:21432] [[58179,0],8] node[14].name acme-c4-11 daemon 14 arch ffca0200
> [acme-c5-20.example.com:07292] [[58179,0],12] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c5-20.example.com:07292] [[58179,0],12] node[1].name acme-c5-9 daemon 1 arch ffca0200
> [acme-c5-20.example.com:07292] [[58179,0],12] node[2].name acme-c5-10 daemon 2 arch ffca0200
> [acme-c5-20.example.com:07292] [[58179,0],12] node[3].name acme-c5-11 daemon 3 arch ffca0200
> [acme-c5-17.example.com:26411] [[58179,0],9] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c5-17.example.com:26411] [[58179,0],9] node[1].name acme-c5-9 daemon 1 arch ffca0200
> [acme-c5-10.example.com:09734] [[58179,0],2] node[14].name acme-c4-11 daemon 14 arch ffca0200
> [acme-c5-10.example.com:09734] [[58179,0],2] orted_cmd: received add_local_procs
> [acme-c5-9.example.com:31954] [[58179,0],1] node[8].name acme-c5-16 daemon 8 arch ffca0200
> [acme-c5-11.example.com:06876] [[58179,0],3] node[2].name acme-c5-10 daemon 2 arch ffca0200
> [acme-c5-11.example.com:06876] [[58179,0],3] node[3].name acme-c5-11 daemon 3 arch ffca0200
> [acme-c5-11.example.com:06876] [[58179,0],3] node[4].name acme-c5-12 daemon 4 arch ffca0200
> [acme-c5-11.example.com:06876] [[58179,0],3] node[5].name acme-c5-13 daemon 5 arch ffca0200
> [acme-c5-11.example.com:06876] [[58179,0],3] node[6].name acme-c5-14 daemon 6 arch ffca0200
> [acme-c5-11.example.com:06876] [[58179,0],3] node[7].name acme-c5-15 daemon 7 arch ffca0200
> [acme-c4-11.example.com:13717] [[58179,0],14] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c4-11.example.com:13717] [[58179,0],14] node[1].name acme-c5-9 daemon 1 arch ffca0200
> [acme-c5-14.example.com:03987] [[58179,0],6] node[3].name acme-c5-11 daemon 3 arch ffca0200
> [acme-c5-14.example.com:03987] [[58179,0],6] node[4].name acme-c5-12 daemon 4 arch ffca0200
> [acme-c5-14.example.com:03987] [[58179,0],6] node[5].name acme-c5-13 daemon 5 arch ffca0200
> [acme-c5-14.example.com:03987] [[58179,0],6] node[6].name acme-c5-14 daemon 6 arch ffca0200
> [acme-c5-14.example.com:03987] [[58179,0],6] node[7].name acme-c5-15 daemon 7 arch ffca0200
> [acme-c5-12.example.com:01010] [[58179,0],4] orted_cmd: received add_local_procs
> [acme-c5-13.example.com:21829] [[58179,0],5] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c5-13.example.com:21829] [[58179,0],5] node[1].name acme-c5-9 daemon 1 arch ffca0200
> [acme-c5-16.example.com:21432] [[58179,0],8] orted_cmd: received add_local_procs
> [acme-c5-20.example.com:07292] [[58179,0],12] node[4].name acme-c5-12 daemon 4 arch ffca0200
> [acme-c5-20.example.com:07292] [[58179,0],12] node[5].name acme-c5-13 daemon 5 arch ffca0200
> [acme-c5-20.example.com:07292] [[58179,0],12] node[6].name acme-c5-14 daemon 6 arch ffca0200
> [acme-c5-20.example.com:07292] [[58179,0],12] node[7].name acme-c5-15 daemon 7 arch ffca0200
> [acme-c5-20.example.com:07292] [[58179,0],12] node[8].name acme-c5-16 daemon 8 arch ffca0200
> [acme-c5-20.example.com:07292] [[58179,0],12] node[9].name acme-c5-17 daemon 9 arch ffca0200
> [acme-c5-20.example.com:07292] [[58179,0],12] node[10].name acme-c5-18 daemon 10 arch ffca0200
> [acme-c5-20.example.com:07292] [[58179,0],12] node[11].name acme-c5-19 daemon 11 arch ffca0200
> [acme-c5-20.example.com:07292] [[58179,0],12] node[12].name acme-c5-20 daemon 12 arch ffca0200
> [acme-c5-20.example.com:07292] [[58179,0],12] node[13].name acme-c4-9 daemon 13 arch ffca0200
> [acme-c5-20.example.com:07292] [[58179,0],12] node[14].name acme-c4-11 daemon 14 arch ffca0200
> [acme-c5-17.example.com:26411] [[58179,0],9] node[2].name acme-c5-10 daemon 2 arch ffca0200
> [acme-c5-17.example.com:26411] [[58179,0],9] node[3].name acme-c5-11 daemon 3 arch ffca0200
> [acme-c5-17.example.com:26411] [[58179,0],9] node[4].name acme-c5-12 daemon 4 arch ffca0200
> [acme-c5-17.example.com:26411] [[58179,0],9] node[5].name acme-c5-13 daemon 5 arch ffca0200
> [acme-c5-17.example.com:26411] [[58179,0],9] node[6].name acme-c5-14 daemon 6 arch ffca0200
> [acme-c5-17.example.com:26411] [[58179,0],9] node[7].name acme-c5-15 daemon 7 arch ffca0200
> [acme-c5-9.example.com:31954] [[58179,0],1] node[9].name acme-c5-17 daemon 9 arch ffca0200
> [acme-c5-15.example.com:07819] [[58179,0],7] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c5-19.example.com:18318] [[58179,0],11] node[[acme-c4-9.example.com:28397] [[58179,0],13] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c4-9.example.com:28397] [[58179,0],13] node[1].name acme-c5-9 daemon 1 arch ffca0200
> ----------------------------------------------
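
The excerpt above stops mid-stream, so it cannot show whether every daemon got as far as launching its local process. With the complete stderr file of such a job, a short filter can tabulate, per daemon vpid, whether it checked in, reported `orted: up and running`, and logged `orted_cmd: received add_local_procs`. This sketch assumes the log lines are not wrapped (as they would be in the job's own output file, unlike this mail) and that daemon names have the `[[jobfamily,0],vpid]` form seen above; daemon 0 is `mpirun` itself and never checks in. `daemon_progress` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical filter: read an unwrapped mpirun --debug-daemons log on stdin
# and print, per daemon vpid, which startup milestones appear in it.
# Columns: vpid, checked_in, up (orted: up and running), add_local_procs.
daemon_progress() {
  awk '
    {
      # Daemon names look like [[58179,0],12]; extract the vpid (12).
      if (!match($0, /\[\[[0-9]+,0],[0-9]+]/)) next
      id = substr($0, RSTART, RLENGTH)
      sub(/.*,/, "", id)
      sub(/]$/, "", id)
      seen[id] = 1
      if ($0 ~ /checking in as pid/)       checked[id] = 1
      if ($0 ~ /orted: up and running/)    up[id] = 1
      if ($0 ~ /received add_local_procs/) procs[id] = 1
    }
    END {
      for (d in seen) {
        c = (d in checked) ? "checked_in" : "-"
        u = (d in up) ? "up" : "-"
        p = (d in procs) ? "add_local_procs" : "-"
        print d, c, u, p
      }
    }
  ' | sort -n
}
```

Usage: `daemon_progress < <job stderr file>`; daemons that show `up` but never `add_local_procs` would be the ones to look at on the AMD nodes.
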
> 
> -----
> Mark Bergman
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

