I recently reconfigured our SGE (6.2u5) environment to better handle MPI jobs
on a heterogeneous cluster. This seems to have caused a problem with the
"threaded" (SMP) PE.
Our PEs are:
qconf -spl
make (unused)
openmpi-AMD
openmpi-Intel
threaded
I'm using a JSV that lets users request "-pe openmpi" and rewrites that
to "-pe openmpi-*". The two "openmpi-*" PEs are both assigned to "all.q",
but each is attached only to a hostgroup containing the appropriate
servers. This works fine for OpenMPI jobs.
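
For context, the JSV rewrite is roughly of the following shape. This is a
minimal sketch rather than the script actually in use: it assumes the stock
shell JSV helpers shipped under $SGE_ROOT/util/resources/jsv, and the helper
name map_pe_name is made up for illustration.

```shell
#!/bin/sh
# Sketch of a server-side JSV that rewrites "-pe openmpi" to "-pe openmpi-*".
# map_pe_name is a hypothetical helper, not part of SGE.

# Print the PE name that should replace the requested one.
map_pe_name() {
    case "$1" in
        openmpi) printf '%s\n' 'openmpi-*' ;;  # wildcard matches openmpi-AMD / openmpi-Intel
        *)       printf '%s\n' "$1" ;;         # leave every other PE (e.g. threaded) alone
    esac
}

# Called by jsv_main at the start of each verification; nothing to set up here.
jsv_on_start() {
    return
}

# Called by jsv_main once all job parameters have been received.
jsv_on_verify() {
    pe=$(jsv_get_param pe_name)
    new_pe=$(map_pe_name "$pe")
    if [ "$new_pe" != "$pe" ]; then
        jsv_set_param pe_name "$new_pe"
        jsv_correct "PE request rewritten to $new_pe"
    else
        jsv_accept "Job accepted unchanged"
    fi
    return
}

# Only enter the JSV protocol loop when the SGE helper library is present,
# so the mapping function can also be exercised outside the cluster.
if [ -n "${SGE_ROOT:-}" ] && [ -r "$SGE_ROOT/util/resources/jsv/jsv_include.sh" ]; then
    . "$SGE_ROOT/util/resources/jsv/jsv_include.sh"
    jsv_main
fi
```

Note that a request for "-pe threaded" passes through this logic untouched,
so the JSV itself should not be what changes the behaviour of that PE.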
The PE "threaded" is also assigned to the "all.q". That PE should consist of
all hosts in the queue.
qconf -sq all.q | grep pe_list
pe_list threaded make,[@mpi-AMD=openmpi-AMD],[@mpi-Intel=openmpi-Intel]
However, jobs submitted with a request for "-pe threaded" are not run. SGE
claims that the PE is not assigned to any queue:
qstat -j 5170487
parallel environment: threaded range: 4
cannot run in queue "all.q@c5-10" because PE "threaded" is not in pe list
cannot run in queue "all.q@c5-11" because PE "threaded" is not in pe list
cannot run in queue "all.q@c5-12" because PE "threaded" is not in pe list
I've tried assigning a hostgroup (@batch, the same hostgroup that is
assigned to all.q) to the "threaded" PE, but that puts the nodes
into the c(onfiguration ambiguous) state.
Any suggestions?
Thanks,
Mark