While the original idea was just to handle a workflow of low core count jobs in a SLURM cluster, it ended up as a setup of a Virtual Cluster in (possibly) any queuing system. Whether such a setup is allowed may depend on the site policy, but it is at least a working scenario and may add features to an actual installation which are not available there or not set up. On top of that it provides a kind of micro-scheduling inside the given allocation which is not available otherwise.
We got access to a SLURM equipped cluster where one always gets complete nodes
and is asked to avoid single serial jobs, or to pack them by scripting to fill
the nodes. With the additional need for a workflow application (something like
DRMAA) and array job dependencies, I got the idea to run a GridEngine instance
as a Virtual Cluster inside the SLURM cluster to solve this.
Essentially it's quite easy, as GridEngine offers everything needed:
- one can start SGE as a normal user (which suits a single-user setup per
Virtual Cluster exactly)
- SGE supports independent configurations, i.e. each Virtual Cluster is its own
SGE_CELL
- configuration files can be plain text files (classic spooling), and hence are
easily adjustable
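Put together, each Virtual Cluster boils down to exporting a private SGE_ROOT
and SGE_CELL and starting the daemons as an ordinary user. A minimal sketch,
assuming the common installation path used below and a cell named after the
SLURM job (with classic spooling the init scripts end up in the cell's common
directory):

SGE_ROOT=/YOUR/PATH/HERE/VC-common-installation/opt/sge; export SGE_ROOT
SGE_CELL=SGE_${SLURM_JOB_ID}; export SGE_CELL
$SGE_ROOT/$SGE_CELL/common/sgemaster start    # qmaster on the first node
$SGE_ROOT/$SGE_CELL/common/sgeexecd start     # execd on each granted node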
After an untar of SGE somewhere like
/YOUR/PATH/HERE/VC-common-installation/opt/sge (no need to install anything
here), we need a planchet of a "classic" configuration put there, named
"__SGE_PLANCHET__". Like for the /tmp directory, everyone should be able to
write at this level alongside the "__SGE_PLANCHET__" (`chmod go=rwx,+t
/YOUR/PATH/HERE/VC-common-installation/opt/sge`). To the planchet you can add
items as needed, e.g. more PEs, complexes, queues, …
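The one-time preparation might look roughly like this (the tarball names are
placeholders for the common and architecture specific SGE tarballs):

mkdir -p /YOUR/PATH/HERE/VC-common-installation/opt/sge
cd /YOUR/PATH/HERE/VC-common-installation/opt/sge
tar -xzf /path/to/sge-common.tar.gz           # placeholder name
tar -xzf /path/to/sge-bin-lx24-amd64.tar.gz   # placeholder name
tar -xzf /path/to/__SGE_PLANCHET__.tgz        # the planchet attached to this post
chmod go=rwx,+t /YOUR/PATH/HERE/VC-common-installation/opt/sge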
The enclosed script `multi-spawn.sh` gives an idea of what has to be done then
to start a Virtual Cluster, even several ones per user, i.e.:
$ sbatch multi-spawn.sh
Regarding DRMAA: one doesn't need to run the workflow application on the login
node or in a dedicated job; instead it is already part of the (SLURM) job
itself (to be put in the application section of `multi-spawn.sh`).
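Just to illustrate the idea (this is not the attached script, and the paths
are placeholders): such an application section could submit the real work to
the freshly started GridEngine instance and wait for it, e.g. an array job of
packed serial tasks with a dependent follow-up job:

# hypothetical application section inside the SLURM job, after the VC is up
. $SGE_ROOT/$SGE_CELL/common/settings.sh
qsub -N worker -t 1-100 -b y /path/to/serial_task.sh
qsub -N collect -hold_jid worker -sync y -b y /path/to/collect.sh
# -hold_jid realises the dependency, -sync y blocks until "collect" finished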
===
While the planchet was created still with 6.2u5, only a few steps are
necessary to create one for your version of SGE:
Run each install_* once, for qmaster and execd. Essentially this will only
create a configuration; choose "classic" as the spooling method (no need to
add any exechost when you are asked for one, in fact: remove afterwards the
one which was added, and remove it from the @allhosts hostgroup too). Then
rename the created "default" configuration to "__SGE_PLANCHET__" and `grep` in
my planchet for entries like __FOO__ (i.e. strings enclosed by double
underscores). The corresponding values have to be replaced by these
placeholders in yours accordingly. `multi-spawn.sh` will then change them in a
copy of the planchet to the names and location of the actual SGE instance;
i.e. each SGE_CELL also has its own spool directory.
Notably this means in sgemaster and sgeexecd changing:
SGE_ROOT=/usr/sge; export SGE_ROOT
SGE_CELL=default; export SGE_CELL
to:
SGE_ROOT=__SGE_INSTALLATION__; export SGE_ROOT
SGE_CELL=__SGE_CELL__; export SGE_CELL
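For illustration, the substitution in a copy of the planchet could be done
along these lines (a sketch of what `multi-spawn.sh` does; only
__SGE_PLANCHET__, __SGE_INSTALLATION__ and __SGE_CELL__ are taken from the
text above, everything else is an assumption):

SGE_ROOT=/YOUR/PATH/HERE/VC-common-installation/opt/sge
SGE_CELL=SGE_${SLURM_JOB_ID}
cp -r "$SGE_ROOT/__SGE_PLANCHET__" "$SGE_ROOT/$SGE_CELL"
# replace the placeholders in all plain text files of the new cell
grep -rl '__SGE_' "$SGE_ROOT/$SGE_CELL" | while read -r f; do
    sed -i -e "s|__SGE_INSTALLATION__|$SGE_ROOT|g" \
           -e "s|__SGE_CELL__|$SGE_CELL|g" "$f"
done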
===
You might need passphraseless `ssh` between the nodes, unless you start the
remote daemons by `srun`. If that doesn't work either, a pseudo MPI application
whose only duty is to start the sgeexecd on each involved node should do.
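The `srun` variant could be as simple as the following (a sketch; the init
script location assumes the cell layout from above):

# one task per allocated node, each starting the execd of this Virtual Cluster
srun --ntasks-per-node=1 $SGE_ROOT/$SGE_CELL/common/sgeexecd start

or with `ssh` (the node list comes from SLURM):

for node in $(scontrol show hostnames "$SLURM_JOB_NODELIST"); do
    ssh "$node" "$SGE_ROOT/$SGE_CELL/common/sgeexecd start" &
done
wait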
===
In case you want to log in interactively to one of the nodes which were
granted for your Virtual Cluster, you need to:
$ source /YOUR/PATH/HERE/VC-common-installation/opt/sge/SGE_<SLURM_JOB_ID>/common/settings.sh
there to gain access to the SGE commands of this particular Virtual Cluster in
the interactive shell. Two mini functions `sge-set <SLURM_JOB_ID>` and
`sge-done` are included to ease this.
While this works on the nodes instantly, it's necessary to add the head
node(s) of the SLURM cluster to the planchet beforehand as submit and/or admin
hosts to use the SGE commands from there too.
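The two helpers might look roughly like this (my guess at the idea, not
necessarily the shipped code):

sge-set () {
    # point the current shell to the Virtual Cluster of the given SLURM job
    source /YOUR/PATH/HERE/VC-common-installation/opt/sge/SGE_$1/common/settings.sh
}
sge-done () {
    # forget the Virtual Cluster again (PATH additions from settings.sh stay)
    unset SGE_ROOT SGE_CELL SGE_CLUSTER_NAME SGE_QMASTER_PORT SGE_EXECD_PORT
}

Calling `sge-set 123456` would then give you qstat, qsub etc. of the Virtual
Cluster belonging to SLURM job 123456.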
===
In case one wants to send emails, note that the default for GridEngine is the
account on the login node, which is in this case an exechost of SLURM. Either
a special setup there is necessary to receive email on an exechost, or always
provide a fully qualified eMail address with the option "-M" to GridEngine.
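E.g. (the address is just a placeholder):

$ qsub -M [email protected] -m bea -b y /path/to/task.sh

"-m bea" requests mail at begin, end and abort of the job.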
===
As every VC starts with job id 1, it might be helpful to create scratch
directories (in a global prolog/epilog) named
"${SLURM_JOB_ID}_$(basename ${TMPDIR})", so that they stay unique across
Virtual Clusters. If you are always getting full nodes, you won't have this
problem for a local scratch directory in $TMPDIR though.
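A queue prolog along these lines could do it (a sketch; SCRATCH_BASE is a
placeholder for your shared scratch file system, and SLURM_JOB_ID has to reach
the prolog's environment, e.g. by submitting with `qsub -V` from inside the
SLURM job):

#!/bin/sh
# create a per-job scratch directory which is unique across Virtual Clusters,
# since every VC starts counting SGE job ids at 1 again
SCRATCH_BASE=/YOUR/GLOBAL/SCRATCH
mkdir -p "$SCRATCH_BASE/${SLURM_JOB_ID}_$(basename ${TMPDIR})"
# a matching epilog would remove the directory again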
===
BTW: did I mention it: no need to be root anywhere.
-- Reuti
Attachments: multi-spawn.sh, __SGE_PLANCHET__.tgz, cluster.tgz
