On 21.12.2012 at 17:24, berg...@merctech.com wrote:

> We're running SGE 6.2u5, OpenMPI 1.3.3 compiled with SGE integration.
> Our cluster has some AMD and some Intel-based servers, but all are
> managed through the same kickstart build and the same cfengine
> configuration. The only deliberate differences in the nodes are:
>
>     ATLAS and BLAS libraries are optimized per-CPU type
>
>     the AMD nodes have disk drives that correctly report SMART readings,
>     so the smartd monitoring process runs on those machines
>
> There are 3 PEs defined:
>     openmpi          all nodes
>     openmpi-Intel    Intel CPU nodes
>     openmpi-AMD      AMD CPU nodes
>
> The only known differences in the PEs are:
>     the hostgroup assigned to the PE (all nodes, just Intel nodes, just
>     AMD nodes)
>
>     the number of slots per PE
>
>     the environment variable "ARCHPATH" is set to the directory where
>     libraries optimized per-architecture are stored
>
>     the environment variable "ARCH" is set to the architecture (as in:
>     Intel-Nehalem)
>
> (The environment variables are set outside of SGE jobs, so they will exist
> when a job is launched via mpirun or submitted via qsub.)
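Since ARCH and ARCHPATH are set outside of SGE, it may be worth checking what the slave tasks actually see. A minimal sketch, assuming it runs inside the job script of a tightly integrated job (in place of the mpirun call, since each `qrsh -inherit` uses a slot on its host) and that `$PE_HOSTFILE` has the usual layout with the hostname in the first column. The helper name `pe_hosts` is made up for this example:

```shell
#!/bin/sh
# Sketch only: show the environment that a slave task gets on each host
# of the parallel job. pe_hosts is a hypothetical helper name.

# Print each distinct hostname from a PE hostfile (first column).
pe_hosts() {
    awk '{ print $1 }' "$1" | sort -u
}

# Only does something inside an SGE parallel job where qrsh is available.
if [ -n "${PE_HOSTFILE:-}" ] && command -v qrsh >/dev/null 2>&1; then
    for host in $(pe_hosts "$PE_HOSTFILE"); do
        echo "== $host =="
        # Run env on the slave host and filter locally, which avoids
        # quoting problems with the remote command line.
        qrsh -inherit "$host" env |
            grep -E '^(ARCH|ARCHPATH|PATH|LD_LIBRARY_PATH)='
    done
fi
```

If ARCH or ARCHPATH is missing or different on the AMD hosts, that would at least point towards the environment rather than the PE definition.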
Where will the variables be set on the slave nodes of the parallel job under Tight Integration, i.e. is the correct orted started there?

> I can run MPI jobs using "mpirun" on the Intel, AMD, or mixed sets of
> nodes, using a machines file. Running the same commands as an SGE job
> fails on the AMD nodes.
>
> For example, running:
>     mpirun -np 50 -machinefile machines.AMD /bin/hostname          succeeds
>     mpirun -np 50 -machinefile machines.AMD hello_world.mpi        succeeds
>     mpirun -np 50 -machinefile machines.Intel /bin/hostname        succeeds
>     mpirun -np 50 -machinefile machines.Intel hello_world.mpi      succeeds
>     mpirun -np 50 -machinefile machines.mixed /bin/hostname        succeeds
>     mpirun -np 50 -machinefile machines.mixed hello_world.mpi      succeeds

Did you also start these jobs on an exechost, and not only on the head node?

>     qsub -pe openmpi-Intel 50 mpirun /bin/hostname      succeeds
>     qsub -pe openmpi-Intel 50 mpirun hello_world.mpi    succeeds
>     qsub -pe openmpi 50 mpirun /bin/hostname            * FAILS if AMD nodes are used *
>     qsub -pe openmpi 50 mpirun hello_world.mpi          * FAILS if AMD nodes are used *
>     qsub -pe openmpi-AMD 50 mpirun /bin/hostname        * FAILS *
>     qsub -pe openmpi-AMD 50 mpirun hello_world.mpi      * FAILS *

Are there several entries in `qacct` for these jobs, and do they all exit with zero?

-- Reuti

> When I run the job on the openmpi-AMD PE with debugging statements,
> I can see that it starts on a node, that the slave MPI processes are
> dispatched. As expected, all processes are run only on AMD nodes. However,
> there are no results and the job finishes without an error. It does
> take longer (~minutes) for the job to finish than the jobs that work
> correctly on the Intel nodes. Perhaps the job 'finishes' when there's
> some orted timeout, but no error is reported.
>
> Any suggestions for more troubleshooting?
>
> Please see below for output from a test job.
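On the `qacct` question: with a working Tight Integration there should be one accounting record per slave task in addition to the master record. A small sketch to condense them, assuming the usual `qacct -j` layout of one "key value" pair per line with records separated by a line of `=` characters. The function name `summarize_qacct` and the job id in the usage comment are made up:

```shell
#!/bin/sh
# Sketch only: condense `qacct -j <jobid>` output to one line per record.
# summarize_qacct is a hypothetical helper name.
summarize_qacct() {
    awk '
        # Print the record collected so far, then reset the fields.
        function emit() {
            if (host != "")
                printf "%s exit_status=%s failed=%s\n", host, status, failed
            host = ""; status = ""; failed = ""
        }
        /^=+$/              { emit(); next }   # record separator
        $1 == "hostname"    { host = $2 }
        $1 == "failed"      { failed = $2 }
        $1 == "exit_status" { status = $2 }
        END                 { emit() }
    '
}

# Usage (12345 is a placeholder job id):
#   qacct -j 12345 | summarize_qacct
```

A single record, or records only for the master host, would suggest the slave tasks on the AMD nodes were never accounted for as part of the job.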
>
> Thanks,
>
> Mark
>
>
> ----------------------------------------------
> Command as submitted via qsub:
>
> mpirun --verbose \
>     --display-map \
>     --tag-output \
>     --debug-daemons \
>     --display-allocation \
>     --mca orte_forward_job_control 1 \
>     --mca pls_gridengine_verbose 1 \
>     --mca pls_gridengine_debug 1 \
>     --mca OMPI_MCA_mca_verbose 1 \
>     --mca btl_base_verbose 30 \
>     --mca routed direct \
>     --prefix $OPENMPI -np $NSLOTS ~/hello_openmpi
>
>
> ----- STDOUT from ~/hello_openmpi below this line -----
> Command: ~/hello_openmpi
> Arguments:
> Executing in: /acme/home/bergman/sge_job_output
> Executing on: acme-c5-8.example.com
> Executing at: Thu Dec 20 17:00:46 EST 2012
> ----- STDERR from ~/hello_openmpi below this line -----
>
> ====================== ALLOCATED NODES ======================
>
> Data for node: Name: acme-c5-8.example.com Num slots: 1 Max slots: 0
> Data for node: Name: acme-c5-9.example.com Num slots: 1 Max slots: 0
> Data for node: Name: acme-c5-10.example.com Num slots: 1 Max slots: 0
> Data for node: Name: acme-c5-11.example.com Num slots: 1 Max slots: 0
> Data for node: Name: acme-c5-12.example.com Num slots: 1 Max slots: 0
> Data for node: Name: acme-c5-13.example.com Num slots: 1 Max slots: 0
> Data for node: Name: acme-c5-14.example.com Num slots: 1 Max slots: 0
> Data for node: Name: acme-c5-15.example.com Num slots: 1 Max slots: 0
> Data for node: Name: acme-c5-16.example.com Num slots: 1 Max slots: 0
> Data for node: Name: acme-c5-17.example.com Num slots: 1 Max slots: 0
> Data for node: Name: acme-c5-18.example.com Num slots: 1 Max slots: 0
> Data for node: Name: acme-c5-19.example.com Num slots: 1 Max slots: 0
> Data for node: Name: acme-c5-20.example.com Num slots: 1 Max slots: 0
> Data for node: Name: acme-c4-9.example.com Num slots: 1 Max slots: 0
> Data for node: Name: acme-c4-11.example.com Num slots: 1 Max slots: 0
>
> =================================================================
>
> ======================== JOB MAP ========================
>
> Data for node: Name: acme-c5-8.example.com Num procs: 1
>     Process OMPI jobid: [58179,1] Process rank: 0
>
> Data for node: Name: acme-c5-9.example.com Num procs: 1
>     Process OMPI jobid: [58179,1] Process rank: 1
>
> Data for node: Name: acme-c5-10.example.com Num procs: 1
>     Process OMPI jobid: [58179,1] Process rank: 2
>
> Data for node: Name: acme-c5-11.example.com Num procs: 1
>     Process OMPI jobid: [58179,1] Process rank: 3
>
> Data for node: Name: acme-c5-12.example.com Num procs: 1
>     Process OMPI jobid: [58179,1] Process rank: 4
>
> Data for node: Name: acme-c5-13.example.com Num procs: 1
>     Process OMPI jobid: [58179,1] Process rank: 5
>
> Data for node: Name: acme-c5-14.example.com Num procs: 1
>     Process OMPI jobid: [58179,1] Process rank: 6
>
> Data for node: Name: acme-c5-15.example.com Num procs: 1
>     Process OMPI jobid: [58179,1] Process rank: 7
>
> Data for node: Name: acme-c5-16.example.com Num procs: 1
>     Process OMPI jobid: [58179,1] Process rank: 8
>
> Data for node: Name: acme-c5-17.example.com Num procs: 1
>     Process OMPI jobid: [58179,1] Process rank: 9
>
> Data for node: Name: acme-c5-18.example.com Num procs: 1
>     Process OMPI jobid: [58179,1] Process rank: 10
>
> Data for node: Name: acme-c5-19.example.com Num procs: 1
>     Process OMPI jobid: [58179,1] Process rank: 11
>
> Data for node: Name: acme-c5-20.example.com Num procs: 1
>     Process OMPI jobid: [58179,1] Process rank: 12
>
> Data for node: Name: acme-c4-9.example.com Num procs: 1
>     Process OMPI jobid: [58179,1] Process rank: 13
>
> Data for node: Name: acme-c4-11.example.com Num procs: 1
>     Process OMPI jobid: [58179,1] Process rank: 14
>
> =============================================================
> Daemon was launched on acme-c5-10.example.com - beginning to initialize
> Daemon [[58179,0],2] checking in as pid 9734 on host acme-c5-10.example.com
> Daemon [[58179,0],2] not using static ports
> [acme-c5-10.example.com:09734] [[58179,0],2] orted: up and running - 
waiting > for commands! > Daemon was launched on acme-c5-20.example.com - beginning to initialize > Daemon was launched on acme-c5-9.example.com - beginning to initialize > Daemon [[58179,0],12] checking in as pid 7292 on host acme-c5-20.example.com > Daemon [[58179,0],12] not using static ports > Daemon [[58179,0],1] checking in as pid 31954 on host acme-c5-9.example.com > [acme-c5-20.example.com:07292] [[58179,0],12] orted: up and running - waiting > for commands! > Daemon [[58179,0],1] not using static ports > [acme-c5-9.example.com:31954] [[58179,0],1] orted: up and running - waiting > for commands! > Daemon was launched on acme-c4-11.example.com - beginning to initialize > Daemon was launched on acme-c5-12.example.com - beginning to initialize > Daemon was launched on acme-c5-11.example.com - beginning to initialize > Daemon [[58179,0],14] checking in as pid 13717 on host acme-c4-11.example.com > Daemon [[58179,0],14] not using static ports > [acme-c4-11.example.com:13717] [[58179,0],14] orted: up and running - waiting > for commands! > Daemon [[58179,0],4] checking in as pid 1010 on host acme-c5-12.example.com > Daemon [[58179,0],4] not using static ports > Daemon was launched on acme-c5-15.example.com - beginning to initialize > [acme-c5-12.example.com:01010] [[58179,0],4] orted: up and running - waiting > for commands! > Daemon was launched on acme-c4-9.example.com - beginning to initialize > Daemon [[58179,0],3] checking in as pid 6876 on host acme-c5-11.example.com > Daemon [[58179,0],3] not using static ports > [acme-c5-11.example.com:06876] [[58179,0],3] orted: up and running - waiting > for commands! > Daemon was launched on acme-c5-16.example.com - beginning to initialize > Daemon [[58179,0],7] checking in as pid 7819 on host acme-c5-15.example.com > Daemon [[58179,0],7] not using static ports > [acme-c5-15.example.com:07819] [[58179,0],7] orted: up and running - waiting > for commands! 
> Daemon was launched on acme-c5-17.example.com - beginning to initialize > Daemon was launched on acme-c5-18.example.com - beginning to initialize > Daemon [[58179,0],13] checking in as pid 28397 on host acme-c4-9.example.com > Daemon [[58179,0],13] not using static ports > [acme-c4-9.example.com:28397] [[58179,0],13] orted: up and running - waiting > for commands! > Daemon was launched on acme-c5-19.example.com - beginning to initialize > Daemon [[58179,0],8] checking in as pid 21432 on host acme-c5-16.example.com > Daemon [[58179,0],8] not using static ports > [acme-c5-16.example.com:21432] [[58179,0],8] orted: up and running - waiting > for commands! > Daemon was launched on acme-c5-14.example.com - beginning to initialize > Daemon [[58179,0],9] checking in as pid 26411 on host acme-c5-17.example.com > Daemon [[58179,0],9] not using static ports > [acme-c5-17.example.com:26411] [[58179,0],9] orted: up and running - waiting > for commands! > Daemon [[58179,0],10] checking in as pid 11348 on host acme-c5-18.example.com > Daemon [[58179,0],10] not using static ports > [acme-c5-18.example.com:11348] [[58179,0],10] orted: up and running - waiting > for commands! > Daemon was launched on acme-c5-13.example.com - beginning to initialize > Daemon [[58179,0],11] checking in as pid 18318 on host acme-c5-19.example.com > Daemon [[58179,0],11] not using static ports > [acme-c5-19.example.com:18318] [[58179,0],11] orted: up and running - waiting > for commands! > Daemon [[58179,0],6] checking in as pid 3987 on host acme-c5-14.example.com > Daemon [[58179,0],6] not using static ports > [acme-c5-14.example.com:03987] [[58179,0],6] orted: up and running - waiting > for commands! > Daemon [[58179,0],5] checking in as pid 21829 on host acme-c5-13.example.com > Daemon [[58179,0],5] not using static ports > [acme-c5-13.example.com:21829] [[58179,0],5] orted: up and running - waiting > for commands! 
> [acme-c5-8.example.com:27764] [[58179,0],0] node[0].name acme-c5-8 daemon 0 > arch ffca0200 > [acme-c5-8.example.com:27764] [[58179,0],0] node[1].name acme-c5-9 daemon 1 > arch ffca0200 > [acme-c5-8.example.com:27764] [[58179,0],0] node[2].name acme-c5-10 daemon 2 > arch ffca0200 > [acme-c5-8.example.com:27764] [[58179,0],0] node[3].name acme-c5-11 daemon 3 > arch ffca0200 > [acme-c5-8.example.com:27764] [[58179,0],0] node[4].name acme-c5-12 daemon 4 > arch ffca0200 > [acme-c5-8.example.com:27764] [[58179,0],0] node[5].name acme-c5-13 daemon 5 > arch ffca0200 > [acme-c5-8.example.com:27764] [[58179,0],0] node[6].name acme-c5-14 daemon 6 > arch ffca0200 > [acme-c5-8.example.com:27764] [[58179,0],0] node[7].name acme-c5-15 daemon 7 > arch ffca0200 > [acme-c5-8.example.com:27764] [[58179,0],0] node[8].name acme-c5-16 daemon 8 > arch ffca0200 > [acme-c5-8.example.com:27764] [[58179,0],0] node[9].name acme-c5-17 daemon 9 > arch ffca0200 > [acme-c5-8.example.com:27764] [[58179,0],0] node[10].name acme-c5-18 daemon > 10 arch ffca0200 > [acme-c5-8.example.com:27764] [[58179,0],0] node[11].name acme-c5-19 daemon > 11 arch ffca0200 > [acme-c5-8.example.com:27764] [[58179,0],0] node[12].name acme-c5-20 daemon > 12 arch ffca0200 > [acme-c5-8.example.com:27764] [[58179,0],0] node[13].name acme-c4-9 daemon 13 > arch ffca0200 > [acme-c5-8.example.com:27764] [[58179,0],0] node[14].name acme-c4-11 daemon > 14 arch ffca0200 > [acme-c5-8.example.com:27764] [[58179,0],0] orted_cmd: received > add_local_procs > [acme-c5-16.example.com:21432] [[58179,0],8] node[0].name acme-c5-8 daemon 0 > arch ffca0200 > [acme-c5-10.example.com:09734] [[58179,0],2] node[0].name acme-c5-8 daemon 0 > arch ffca0200 > [acme-c5-10.example.com:09734] [[58179,0],2] node[1].name acme-c5-9 daemon 1 > arch ffca0200 > [acme-c5-10.example.com:09734] [[58179,0],2] node[2].name acme-c5-10 daemon 2 > arch ffca0200 > [acme-c5-9.example.com:31954] [[58179,0],1] node[0].name acme-c5-8 daemon 0 > arch ffca0200 > 
[acme-c5-12.example.com:01010] [[58179,0],4] node[0].name acme-c5-8 daemon 0 > arch ffca0200 > [acme-c5-12.example.com:01010] [[58179,0],4] node[1].name acme-c5-9 daemon 1 > arch ffca0200 > [acme-c5-12.example.com:01010] [[58179,0],4] node[2].name acme-c5-10 daemon 2 > arch ffca0200 > [acme-c5-16.example.com:21432] [[58179,0],8] node[1].name acme-c5-9 daemon 1 > arch ffca0200 > [acme-c5-10.example.com:09734] [[58179,0],2] node[3].name acme-c5-11 daemon 3 > arch ffca0200 > [acme-c5-10.example.com:09734] [[58179,0],2] node[4].name acme-c5-12 daemon 4 > arch ffca0200 > [acme-c5-10.example.com:09734] [[58179,0],2] node[5].name acme-c5-13 daemon 5 > arch ffca0200 > [acme-c5-10.example.com:09734] [[58179,0],2] node[6].name acme-c5-14 daemon 6 > arch ffca0200 > [acme-c5-10.example.com:09734] [[58179,0],2] node[7].name acme-c5-15 daemon 7 > arch ffca0200 > [acme-c5-10.example.com:09734] [[58179,0],2] node[8].name acme-c5-16 daemon 8 > arch ffca0200 > [acme-c5-10.example.com:09734] [[58179,0],2] node[9].name acme-c5-17 daemon 9 > arch ffca0200 > [acme-c5-10.example.com:09734] [[58179,0],2] node[10].name acme-c5-18 daemon > 10 arch ffca0200 > [acme-c5-10.example.com:09734] [[58179,0],2] node[11].name acme-c5-19 daemon > 11 arch ffca0200 > [acme-c5-9.example.com:31954] [[58179,0],1] node[1].name acme-c5-9 daemon 1 > arch ffca0200 > [acme-c5-12.example.com:01010] [[58179,0],4] node[3].name acme-c5-11 daemon 3 > arch ffca0200 > [acme-c5-12.example.com:01010] [[58179,0],4] node[4].name acme-c5-12 daemon 4 > arch ffca0200 > [acme-c5-12.example.com:01010] [[58179,0],4] node[5].name acme-c5-13 daemon 5 > arch ffca0200 > [acme-c5-12.example.com:01010] [[58179,0],4] node[6].name acme-c5-14 daemon 6 > arch ffca0200 > [acme-c5-12.example.com:01010] [[58179,0],4] node[7].name acme-c5-15 daemon 7 > arch ffca0200 > [acme-c5-12.example.com:01010] [[58179,0],4] node[8].name acme-c5-16 daemon 8 > arch ffca0200 > [acme-c5-12.example.com:01010] [[58179,0],4] node[9].name acme-c5-17 daemon 9 > 
arch ffca0200 > [acme-c5-12.example.com:01010] [[58179,0],4] node[10].name acme-c5-18 daemon > 10 arch ffca0200 > [acme-c5-12.example.com:01010] [[58179,0],4] node[11].name acme-c5-19 daemon > 11 arch ffca0200 > [acme-c5-12.example.com:01010] [[58179,0],4] node[12].name acme-c5-20 daemon > 12 arch ffca0200 > [acme-c5-16.example.com:21432] [[58179,0],8] node[2].name acme-c5-10 daemon 2 > arch ffca0200 > [acme-c5-16.example.com:21432] [[58179,0],8] node[3].name acme-c5-11 daemon 3 > arch ffca0200 > [acme-c5-16.example.com:21432] [[58179,0],8] node[4].name acme-c5-12 daemon 4 > arch ffca0200 > [acme-c5-16.example.com:21432] [[58179,0],8] node[5].name acme-c5-13 daemon 5 > arch ffca0200 > [acme-c5-16.example.com:21432] [[58179,0],8] node[6].name acme-c5-14 daemon 6 > arch ffca0200 > [acme-c5-16.example.com:21432] [[58179,0],8] node[7].name acme-c5-15 daemon 7 > arch ffca0200 > [acme-c5-16.example.com:21432] [[58179,0],8] node[8].name acme-c5-16 daemon 8 > arch ffca0200 > [acme-c5-16.example.com:21432] [[58179,0],8] node[9].name acme-c5-17 daemon 9 > arch ffca0200 > [acme-c5-16.example.com:21432] [[58179,0],8] node[10].name acme-c5-18 daemon > 10 arch ffca0200 > [acme-c5-16.example.com:21432] [[58179,0],8] node[11].name acme-c5-19 daemon > 11 arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] node[0].name acme-c5-8 daemon 0 > arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] node[1].name acme-c5-9 daemon 1 > arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] node[2].name acme-c5-10 daemon > 2 arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] node[3].name acme-c5-11 daemon > 3 arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] node[4].name acme-c5-12 daemon > 4 arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] node[5].name acme-c5-13 daemon > 5 arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] node[6].name acme-c5-14 daemon > 6 arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] 
node[7].name acme-c5-15 daemon > 7 arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] node[8].name acme-c5-16 daemon > 8 arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] node[9].name acme-c5-17 daemon > 9 arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] node[10].name acme-c5-18 daemon > 10 arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] node[11].name acme-c5-19 daemon > 11 arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] node[12].name acme-c5-20 daemon > 12 arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] node[13].name acme-c4-9 daemon > 13 arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] node[14].name acme-c4-11 daemon > 14 arch ffca0200 > [acme-c5-18.example.com:11348] [[58179,0],10] orted_cmd: received > add_local_procs > [acme-c5-10.example.com:09734] [[58179,0],2] node[12].name acme-c5-20 daemon > 12 arch ffca0200 > [acme-c5-10.example.com:09734] [[58179,0],2] node[13].name acme-c4-9 daemon > 13 arch ffca0200 > [acme-c5-9.example.com:31954] [[58179,0],1] node[2].name acme-c5-10 daemon 2 > arch ffca0200 > [acme-c5-9.example.com:31954] [[58179,0],1] node[3].name acme-c5-11 daemon 3 > arch ffca0200 > [acme-c5-9.example.com:31954] [[58179,0],1] node[4].name acme-c5-12 daemon 4 > arch ffca0200 > [acme-c5-9.example.com:31954] [[58179,0],1] node[5].name acme-c5-13 daemon 5 > arch ffca0200 > [acme-c5-9.example.com:31954] [[58179,0],1] node[6].name acme-c5-14 daemon 6 > arch ffca0200 > [acme-c5-9.example.com:31954] [[58179,0],1] node[7].name acme-c5-15 daemon 7 > arch ffca0200 > [acme-c5-11.example.com:06876] [[58179,0],3] node[0].name acme-c5-8 daemon 0 > arch ffca0200 > [acme-c5-11.example.com:06876] [[58179,0],3] node[1].name acme-c5-9 daemon 1 > arch ffca0200 > [acme-c5-14.example.com:03987] [[58179,0],6] node[0].name acme-c5-8 daemon 0 > arch ffca0200 > [acme-c5-14.example.com:03987] [[58179,0],6] node[1].name acme-c5-9 daemon 1 > arch ffca0200 > 
[acme-c5-14.example.com:03987] [[58179,0],6] node[2].name acme-c5-10 daemon 2 > arch ffca0200 > [acme-c5-12.example.com:01010] [[58179,0],4] node[13].name acme-c4-9 daemon > 13 arch ffca0200 > [acme-c5-12.example.com:01010] [[58179,0],4] node[14].name acme-c4-11 daemon > 14 arch ffca0200 > [acme-c5-16.example.com:21432] [[58179,0],8] node[12].name acme-c5-20 daemon > 12 arch ffca0200 > [acme-c5-16.example.com:21432] [[58179,0],8] node[13].name acme-c4-9 daemon > 13 arch ffca0200 > [acme-c5-16.example.com:21432] [[58179,0],8] node[14].name acme-c4-11 daemon > 14 arch ffca0200 > [acme-c5-20.example.com:07292] [[58179,0],12] node[0].name acme-c5-8 daemon 0 > arch ffca0200 > [acme-c5-20.example.com:07292] [[58179,0],12] node[1].name acme-c5-9 daemon 1 > arch ffca0200 > [acme-c5-20.example.com:07292] [[58179,0],12] node[2].name acme-c5-10 daemon > 2 arch ffca0200 > [acme-c5-20.example.com:07292] [[58179,0],12] node[3].name acme-c5-11 daemon > 3 arch ffca0200 > [acme-c5-17.example.com:26411] [[58179,0],9] node[0].name acme-c5-8 daemon 0 > arch ffca0200 > [acme-c5-17.example.com:26411] [[58179,0],9] node[1].name acme-c5-9 daemon 1 > arch ffca0200 > [acme-c5-10.example.com:09734] [[58179,0],2] node[14].name acme-c4-11 daemon > 14 arch ffca0200 > [acme-c5-10.example.com:09734] [[58179,0],2] orted_cmd: received > add_local_procs > [acme-c5-9.example.com:31954] [[58179,0],1] node[8].name acme-c5-16 daemon 8 > arch ffca0200 > [acme-c5-11.example.com:06876] [[58179,0],3] node[2].name acme-c5-10 daemon 2 > arch ffca0200 > [acme-c5-11.example.com:06876] [[58179,0],3] node[3].name acme-c5-11 daemon 3 > arch ffca0200 > [acme-c5-11.example.com:06876] [[58179,0],3] node[4].name acme-c5-12 daemon 4 > arch ffca0200 > [acme-c5-11.example.com:06876] [[58179,0],3] node[5].name acme-c5-13 daemon 5 > arch ffca0200 > [acme-c5-11.example.com:06876] [[58179,0],3] node[6].name acme-c5-14 daemon 6 > arch ffca0200 > [acme-c5-11.example.com:06876] [[58179,0],3] node[7].name acme-c5-15 daemon 7 > 
arch ffca0200 > [acme-c4-11.example.com:13717] [[58179,0],14] node[0].name acme-c5-8 daemon 0 > arch ffca0200 > [acme-c4-11.example.com:13717] [[58179,0],14] node[1].name acme-c5-9 daemon 1 > arch ffca0200 > [acme-c5-14.example.com:03987] [[58179,0],6] node[3].name acme-c5-11 daemon 3 > arch ffca0200 > [acme-c5-14.example.com:03987] [[58179,0],6] node[4].name acme-c5-12 daemon 4 > arch ffca0200 > [acme-c5-14.example.com:03987] [[58179,0],6] node[5].name acme-c5-13 daemon 5 > arch ffca0200 > [acme-c5-14.example.com:03987] [[58179,0],6] node[6].name acme-c5-14 daemon 6 > arch ffca0200 > [acme-c5-14.example.com:03987] [[58179,0],6] node[7].name acme-c5-15 daemon 7 > arch ffca0200 > [acme-c5-12.example.com:01010] [[58179,0],4] orted_cmd: received > add_local_procs > [acme-c5-13.example.com:21829] [[58179,0],5] node[0].name acme-c5-8 daemon 0 > arch ffca0200 > [acme-c5-13.example.com:21829] [[58179,0],5] node[1].name acme-c5-9 daemon 1 > arch ffca0200 > [acme-c5-16.example.com:21432] [[58179,0],8] orted_cmd: received > add_local_procs > [acme-c5-20.example.com:07292] [[58179,0],12] node[4].name acme-c5-12 daemon > 4 arch ffca0200 > [acme-c5-20.example.com:07292] [[58179,0],12] node[5].name acme-c5-13 daemon > 5 arch ffca0200 > [acme-c5-20.example.com:07292] [[58179,0],12] node[6].name acme-c5-14 daemon > 6 arch ffca0200 > [acme-c5-20.example.com:07292] [[58179,0],12] node[7].name acme-c5-15 daemon > 7 arch ffca0200 > [acme-c5-20.example.com:07292] [[58179,0],12] node[8].name acme-c5-16 daemon > 8 arch ffca0200 > [acme-c5-20.example.com:07292] [[58179,0],12] node[9].name acme-c5-17 daemon > 9 arch ffca0200 > [acme-c5-20.example.com:07292] [[58179,0],12] node[10].name acme-c5-18 daemon > 10 arch ffca0200 > [acme-c5-20.example.com:07292] [[58179,0],12] node[11].name acme-c5-19 daemon > 11 arch ffca0200 > [acme-c5-20.example.com:07292] [[58179,0],12] node[12].name acme-c5-20 daemon > 12 arch ffca0200 > [acme-c5-20.example.com:07292] [[58179,0],12] node[13].name acme-c4-9 
daemon 13 arch ffca0200
> [acme-c5-20.example.com:07292] [[58179,0],12] node[14].name acme-c4-11 daemon 14 arch ffca0200
> [acme-c5-17.example.com:26411] [[58179,0],9] node[2].name acme-c5-10 daemon 2 arch ffca0200
> [acme-c5-17.example.com:26411] [[58179,0],9] node[3].name acme-c5-11 daemon 3 arch ffca0200
> [acme-c5-17.example.com:26411] [[58179,0],9] node[4].name acme-c5-12 daemon 4 arch ffca0200
> [acme-c5-17.example.com:26411] [[58179,0],9] node[5].name acme-c5-13 daemon 5 arch ffca0200
> [acme-c5-17.example.com:26411] [[58179,0],9] node[6].name acme-c5-14 daemon 6 arch ffca0200
> [acme-c5-17.example.com:26411] [[58179,0],9] node[7].name acme-c5-15 daemon 7 arch ffca0200
> [acme-c5-9.example.com:31954] [[58179,0],1] node[9].name acme-c5-17 daemon 9 arch ffca0200
> [acme-c5-15.example.com:07819] [[58179,0],7] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c5-19.example.com:18318] [[58179,0],11] node[[acme-c4-9.example.com:28397] [[58179,0],13] node[0].name acme-c5-8 daemon 0 arch ffca0200
> [acme-c4-9.example.com:28397] [[58179,0],13] node[1].name acme-c5-9 daemon 1 arch ffca0200
> ----------------------------------------------
>
> -----
> Mark Bergman
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
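The debug output above is cut off, so it does not show whether every orted ever got the launch command. One way to check the full, unwrapped job output is sketched below, assuming the two message formats seen in the log ("Daemon ... checking in as pid ... on host ..." and "orted_cmd: received add_local_procs"). The function name `daemons_without_launch_cmd` and the file name in the usage comment are made up:

```shell
#!/bin/sh
# Sketch only: list daemons that checked in but never logged
# "orted_cmd: received add_local_procs".
# Expects the raw job output on stdin (no "> " quote prefixes, no mail wraps).
daemons_without_launch_cmd() {
    awk '
        # e.g. "Daemon [[58179,0],2] checking in as pid 9734 on host <host>"
        $1 == "Daemon" && $3 == "checking" { host[$2] = $NF }
        # e.g. "[<host>:<pid>] [[58179,0],2] orted_cmd: received add_local_procs"
        /orted_cmd: received add_local_procs/ { got[$2] = 1 }
        END {
            for (d in host)
                if (!(d in got))
                    print d, host[d]
        }
    ' | sort
}

# Usage (job.e12345 is a placeholder for the job's stderr file):
#   daemons_without_launch_cmd < job.e12345
```

An empty result would mean all daemons that checked in were told to start their local processes, which would shift the suspicion to the application start itself.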