Hello Dani and Thomas,

Thank you both for your responses. They seem well suited to a regular-size HPC cluster.

However, I think I did not specify clearly enough what my cluster looks like and what I'm trying to achieve. Compared to a regular HPC cluster, my testing cluster consists of as few as 5 nodes (each having the same "grand-scale" features, so no IB nodes etc., and differing only in hardware details like CPU, amount of RAM etc., including some MCUs akin to a Raspberry Pi). The purpose of this cluster is to investigate how smart distribution of workloads, based on predetermined performance and energy data, can benefit HPC clusters that consist of heterogeneous systems differing greatly in energy consumption and performance.
It's just a small research project.

I already have all the data I need and now just need to find a way to integrate node selection based on these priority lists into Slurm.

My idea is to write a plugin that, on job submission to Slurm, reads those lists, makes a smart selection (based on different criteria) of which currently available node is best suited, and forwards the job to that node. Partition selection is not needed since I run all nodes in one partition for easier usage. The only information my plugin needs to forward besides the node name is a few small config parameters in the form of environment variables on the target machine.

So far I have made those job requests manually via:
srun -w <nodename> --export=<ev1>,... <cmd>
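
For example, a concrete call (node and variable names are just placeholders for my real data) looks like:
srun -w node03 --export=POWER_MODE=low,FREQ_CAP=1800 ./my_workload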

I would like to add functionality to Slurm so that, upon a simple "srun <cmd>", it performs the node selection and supplies the matching environment variables automatically. Based on my current knowledge of Slurm's architecture, a plugin (either select or sched) seems to be the apparent fit for what I'm trying to achieve.
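
To make that idea more concrete, here is a very rough, untested sketch of how the node-selection part might look if it lived in a job_submit/lua script instead (I'm assuming the req_nodes field is exposed and writable there; the priority-list path is made up and pick_best_node() stands in for my own selection logic; how to forward the environment variables from such a script is something I would still have to figure out):

-- untested sketch of a job_submit/lua script doing the node selection
local PRIORITY_FILE = "/etc/slurm/priority_list.txt"  -- made-up path; one node name per line, best first

local function pick_best_node()
    -- placeholder for my real selection logic: just take the first listed node
    local f = io.open(PRIORITY_FILE, "r")
    if f == nil then return nil end
    local node = f:read("*l")
    f:close()
    return node
end

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- only touch jobs that did not explicitly request a node
    if job_desc.req_nodes == nil or job_desc.req_nodes == "" then
        local node = pick_best_node()
        if node ~= nil then
            job_desc.req_nodes = node
            slurm.log_info("job_submit/lua: routing job to %s", node)
        end
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end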

However, as stated in my first mail, I have not dabbled in plugin development or editing yet, and I would kindly ask for advice from someone more experienced: am I indeed pursuing the correct approach, or would a frontend solution, albeit less elegant, be both easier and a better fit for the purpose of this project?
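
For comparison, the frontend alternative I have in mind would be little more than a wrapper around srun, roughly along these lines (again a hypothetical, untested sketch; the priority file, node names and exported variables are placeholders for my real data):

#!/bin/bash
# smart_srun.sh -- pick a node from my precomputed priority list and forward the command
PRIORITY_FILE=/etc/slurm/priority_list.txt   # made-up path; one node name per line, best first

# take the first node from the list that sinfo currently reports as idle
target=""
while read -r node; do
    if sinfo -N -h -t idle -n "$node" -o "%N" | grep -qx "$node"; then
        target=$node
        break
    fi
done < "$PRIORITY_FILE"

if [ -z "$target" ]; then
    echo "smart_srun: no suitable node available" >&2
    exit 1
fi

# POWER_MODE/FREQ_CAP are placeholders for the per-node parameters from my measurements
exec srun -w "$target" --export=POWER_MODE=low,FREQ_CAP=1800 "$@"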

Thanks again,

M. Wagner




2017-04-04 19:54 GMT+02:00 Thomas M. Payerle <paye...@umd.edu>:

You can define nodes with "features", and then at job submission time
require specific features.

E.g., if you had some nodes with ethernet only, some with QDR InfiniBand, and some with FDR IB, you could define the QDR and FDR nodes to have the feature qdr or fdr, respectively. Then, e.g., a job needing QDR InfiniBand would request the qdr feature when running sbatch. You would probably want to set the WEIGHTS (as dani wrote) so that non-IB nodes are selected preferentially (i.e. when no qdr or fdr feature is requested), so that non-IB jobs are less likely to consume all the IB nodes.
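
For illustration, the relevant slurm.conf entries and job request might look roughly like this (node names and counts made up, other required NodeName parameters omitted; note that nodes with the lowest weight are allocated first):

NodeName=eth[01-08]  Weight=10
NodeName=qdr[01-04]  Feature=qdr  Weight=20
NodeName=fdr[01-04]  Feature=fdr  Weight=20

# job that needs QDR InfiniBand:
sbatch --constraint=qdr job.sh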

This is all or nothing --- i.e. a job requesting feature qdr will NOT run on non-qdr nodes. It won't even run on fdr nodes (IIRC, you cannot ask for either qdr or fdr) --- if sufficient qdr nodes are not available, it just waits until they are. This is in contrast to the WEIGHTS suggestion, which basically says these nodes are preferred, but other nodes can be used if needed. (So using WEIGHTS is useful for preferring non-IB nodes: if no IB feature is requested, the non-IB nodes are preferred, but IB nodes can be used for such jobs if no non-IB nodes are available. Jobs requesting IB features, however, will skip over the non-IB nodes despite their preferential weight, because those nodes lack the requested feature.)


You can have sets of disjoint features (e.g. an sse4 feature if only some of your nodes support SSE4), and jobs can e.g. request fdr AND sse4 (assuming you have nodes meeting those criteria).

I do not believe that there is anything built in to have features selected by QOS, although one could probably add a job_submit lua script which adds features to the job request based on QOS.
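
Something along these lines, perhaps (untested sketch; this assumes the qos and features fields are exposed to the lua script, and "bigmem" is a made-up QOS/feature pair):

-- untested sketch: add a feature to the job request based on the requested QOS
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.qos == "bigmem" then
        if job_desc.features == nil or job_desc.features == "" then
            job_desc.features = "bigmem"
        else
            job_desc.features = job_desc.features .. "&bigmem"
        end
    end
    return slurm.SUCCESS
end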




On Tue, 4 Apr 2017, dani wrote:

AFAIK, sbatch and friends don't allow "preferences" when submitting jobs to multiple "clusters/partitions". Theoretically you could define a different QOS for each workload, but either that QOS would be valid for all nodes/partitions, or it would be the same as submitting to a single partition (group of nodes).

The other mechanism allowing some preferential treatment of nodes is "WEIGHT", but this one is generic to all jobs - every job would prefer the same weighted nodes - and I don't know of a QOS that can add weight to nodes, to be assigned to both a job and a partition to bridge that gap.

I think building a QOS extension would be the "best" generic solution, but not necessarily the easiest to implement.

On 04/04/2017 18:42, maviko.wag...@fau.de wrote:


Hello everyone,

I'm looking for advice regarding the easiest way to realize the following:

I'm running Slurm to manage a heterogeneous cluster featuring a multitude of different nodes with specific features and hardware details. I'm currently using Slurm 15.08.7 since some of my nodes run pretty old Ubuntu releases (14.04 being the oldest). I also know the power and runtime levels for all possible workloads that could be submitted to my cluster. This data was created via predetermined tests.

Now I would like to utilize this data to choose the best (based on different criteria) available target node for each workload upon submission.
What would be the easiest way to implement said behaviour?
Would writing a custom scheduler plugin be suitable? Or would it be easier to implement a front-end program that receives input similar to Slurm's and forwards commands based on some internal selection algorithms?

I have not looked into tweaking Slurm beyond basic slurm.conf edits and would like advice on which approach seems more suitable... and maybe some starting points.

Thanks in advance,

M. Wagner





Tom Payerle
IT-ETI-EUS                              paye...@umd.edu
4254 Stadium Dr                         (301) 405-6135
University of Maryland
College Park, MD 20742-4111
