Quoting amjad syed <[email protected]>:

> Hello,
>
> We are working on concept of "super node" which transparently connects
> heterogeneous light weight compute nodes to storage and services subsystem.
> The light weight compute nodes will be used exclusively for computational
> purposes and no service daemons will be running on these light weight
> compute nodes.
> We have open source implementation of this product is hosted on github.
> ( https://github.com/HPCLinks/Open-Vertex)
>
> So in terms of SLURM, the light weight compute nodes will not have slurmd
> daemons running on it. The management node daemon (slurmctld) will only
> communicate with "super node" daemon (slurmd). This slurmd daemon should be
> able to get dynamic resource information from light weight compute nodes
> attached to "super node" and pass that information to management node. We
> are looking at maximum 10 light weight compute nodes attached to one
> "super node".
>
> Can slurmd running on compute node manage remote resources (such as
> memory) ?

SLURM uses the concept of "front-end nodes" to manage resources on Cray
and IBM BlueGene systems and the model seems identical to your "super
nodes".

> What is the best way forward to integrate VERTEX with SLURM ?

You would probably just need to write a SLURM plugin for this. To get
started, see SLURM's select/cray and select/bluegene plugins for examples
and the documentation here:
http://www.schedmd.com/slurmdocs/selectplugins.html

> Sincerely,
> Amjad
>
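
For illustration only, a front-end layout along these lines might look
roughly like the slurm.conf fragment below. This is a sketch, not a tested
configuration: the host names (supernode0, lwc[0-9]), the partition name,
and the CPU and memory figures are all made up, and it assumes SLURM was
built with the --enable-front-end configure option so that slurmd runs only
on the front-end node. Static entries like these do not cover the dynamic
resource reporting asked about above; that part would still need plugin
work.

```
# Hypothetical slurm.conf fragment; all names and sizes are assumptions.
# Assumes a build configured with: ./configure --enable-front-end

# The "super node": the only host running slurmd.
FrontendName=supernode0 FrontendAddr=supernode0

# Up to 10 light weight compute nodes, none running slurmd.
NodeName=lwc[0-9] CPUs=8 RealMemory=16000 State=UNKNOWN

PartitionName=vertex Nodes=lwc[0-9] Default=YES State=UP
```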
