I recently reconfigured our SGE (6.2u5) environment to better handle MPI jobs
on a heterogeneous cluster. This seems to have caused a problem with the
"threaded" (SMP) PE.

Our PEs are:

        qconf -spl
                make                    (unused)
                openmpi-AMD
                openmpi-Intel
                threaded


I'm using a JSV so that users can request "-pe openmpi", which it rewrites
to "-pe openmpi-*". The two "openmpi-*" PEs are both assigned to "all.q",
but each is restricted to a hostgroup containing the appropriate servers.
This works fine for OpenMPI jobs.
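
For anyone unfamiliar with JSVs, the rewrite is roughly of the following shape. This is only a minimal sketch, not the script actually in use here: the helper name rewrite_pe_name is hypothetical, and the jsv_* functions are assumed to be the ones provided by the shell JSV library shipped with SGE ($SGE_ROOT/util/resources/jsv/jsv_include.sh).

```shell
#!/bin/sh
# Sketch of a JSV that turns "-pe openmpi" into "-pe openmpi-*".
# A real script would first source the SGE JSV library and end with jsv_main:
#   . "${SGE_ROOT}/util/resources/jsv/jsv_include.sh"
#   ...
#   jsv_main

# Hypothetical helper: map the generic PE name to the wildcard form and
# leave every other PE name (threaded, make, openmpi-AMD, ...) untouched.
rewrite_pe_name() {
    case "$1" in
        openmpi) printf '%s\n' 'openmpi-*' ;;
        *)       printf '%s\n' "$1" ;;
    esac
}

# Called by the JSV framework once per submitted job.
jsv_on_verify() {
    pe=$(jsv_get_param pe_name)
    new_pe=$(rewrite_pe_name "$pe")
    if [ "$new_pe" != "$pe" ]; then
        jsv_set_param pe_name "$new_pe"
        jsv_correct "PE request rewritten to $new_pe"
    else
        jsv_accept "PE request left unchanged"
    fi
    return
}
```

Note that a request for "-pe threaded" passes through this logic unchanged, so the JSV itself should not be what affects the "threaded" jobs.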

The PE "threaded" is also assigned to the "all.q". That PE should consist of
all hosts in the queue.

        qconf -sq all.q | grep pe_list
                pe_list  threaded make,[@mpi-AMD=openmpi-AMD],[@mpi-Intel=openmpi-Intel]

However, jobs submitted with a request for "-pe threaded" are not run. SGE
claims that the PE is not assigned to any queue:

        qstat -j 5170487
                parallel environment:  threaded range: 4
                cannot run in queue "all.q@c5-10" because PE "threaded" is not in pe list
                cannot run in queue "all.q@c5-11" because PE "threaded" is not in pe list
                cannot run in queue "all.q@c5-12" because PE "threaded" is not in pe list


I've tried assigning a hostgroup (@batch, the same hostgroup that is
assigned to all.q) to the "threaded" PE, but that puts the nodes
into the c(onfiguration ambiguous) state.

Any suggestions?

Thanks,

Mark
