We did something like this in the past, but from C rather than Lua.  However, modifying the features was painful whenever a user put any interesting syntax in the feature string.

What we are doing now is using --extra for that purpose.  The nodes boot up with SLURMD_OPTIONS="--extra {\\\"os\\\":\\\"rhel9\\\"}" or similar.  Users can request --extra=os=rhel9 or whatever if they want to submit across OS versions for some weird reason.

Handling defaults is problematic because there is no way to set a default --extra for people.  We had something working that set an environment variable on the nodes, which gets passed along by sbatch, et al., and then read it from the job submit plugin.  We would then set the --extra in the plugin.  The problem is that salloc and srun behave differently from sbatch, and for them you can't access the environment from the plugin.

Instead, we are now looking up the alloc_node in the plugin and reading its `extra` directly.  Here's what the relevant parts look like:
static void _set_extra_from_alloc_node(job_desc_msg_t *job_desc)
{
        node_record_t *node_ptr = find_node_record(job_desc->alloc_node);
        char *default_str = "os=rhel7";

        if (node_ptr == NULL) {
                job_desc->extra = xstrdup(default_str);
                info("WARNING: _set_extra_from_alloc_node: node %s not found. Setting job to default '%s'", job_desc->alloc_node, default_str);
        } else {
                if (!xstrcmp(node_ptr->extra, "{\"os\":\"rhel7\"}")) {
                        job_desc->extra = xstrdup("os=rhel7");
                } else if (!xstrcmp(node_ptr->extra, "{\"os\":\"rhel9\"}")) {
                        job_desc->extra = xstrdup("os=rhel9");
                } else {
                        job_desc->extra = xstrdup(default_str);
                        info("WARNING: _set_extra_from_alloc_node: node %s returned extra of '%s' which did not match known values. Setting job to default '%s'", job_desc->alloc_node, node_ptr->extra, default_str);
                }
        }
}

...


        if (!job_desc->extra) {
                _set_extra_from_alloc_node(job_desc);
        }
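
If more OS versions show up, the chain of xstrcmp() calls could be driven by a small lookup table instead, so that adding one is a one-line change.  What follows is only a standalone sketch, not what we actually run: it uses plain strcmp() instead of Slurm's xstrcmp() so it compiles outside the Slurm tree, and the names extra_map, extra_table and map_node_extra are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: node-side JSON "extra" -> job-side --extra request. */
struct extra_map {
        const char *node_extra;
        const char *job_extra;
};

static const struct extra_map extra_table[] = {
        { "{\"os\":\"rhel7\"}", "os=rhel7" },
        { "{\"os\":\"rhel9\"}", "os=rhel9" },
};

/*
 * Return the job-side string matching a node's extra, or fallback when
 * the node's extra is NULL or not in the table.  Nothing is allocated
 * here; inside a plugin the caller would xstrdup() the result.
 */
static const char *map_node_extra(const char *node_extra, const char *fallback)
{
        size_t i;

        if (node_extra == NULL)
                return fallback;

        for (i = 0; i < sizeof(extra_table) / sizeof(extra_table[0]); i++) {
                if (strcmp(node_extra, extra_table[i].node_extra) == 0)
                        return extra_table[i].job_extra;
        }
        return fallback;
}
```

In the plugin, the two branches of the function above would then collapse to something like job_desc->extra = xstrdup(map_node_extra(node_ptr ? node_ptr->extra : NULL, default_str)), with the warnings kept as they are.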

I don't know if you can do it in Lua.  The easiest solution would be an environment variable for a default --extra, but there isn't one currently.  I've been meaning to ask SchedMD about that but haven't done so yet.

By the way, the nice thing about --extra is that there's no juggling of features in config files.  Whatever OS it boots up in, that's what ends up in the extra field.  We have a script that populates the relevant file before Slurm boots.

On 6/14/24 12:33, Laura Hild via slurm-users wrote:
I wrote a job_submit.lua also.  It would append "&centos79" to the feature string unless the features already contained "el9," or if empty, set the features string to "centos79" without the ampersand.  I didn't hear from any users doing anything fancy enough with their feature string for the ampersand to cause a problem.
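
For comparison, the rule described in that message can be sketched in C.  This is an illustration only: the original was a Lua plugin that is not shown here, the function name add_default_feature is hypothetical, and the "el9" check is assumed to be a plain substring match.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly malloc'd feature string (caller frees), or NULL if
 * allocation fails:
 *   - empty or NULL features        -> "centos79"
 *   - features mentioning "el9"     -> unchanged copy
 *   - anything else                 -> features + "&centos79"
 */
static char *add_default_feature(const char *features)
{
        const char *def = "centos79";
        size_t len;
        char *out;

        if (features == NULL || features[0] == '\0') {
                out = malloc(strlen(def) + 1);
                if (out != NULL)
                        strcpy(out, def);
                return out;
        }

        if (strstr(features, "el9") != NULL) {
                out = malloc(strlen(features) + 1);
                if (out != NULL)
                        strcpy(out, features);
                return out;
        }

        /* Room for features, '&', the default, and the terminating NUL. */
        len = strlen(features) + 1 + strlen(def) + 1;
        out = malloc(len);
        if (out != NULL)
                snprintf(out, len, "%s&%s", features, def);
        return out;
}
```

A feature string that already uses "|" or parentheses is where blindly appending "&centos79" would start to change the meaning, which is the fragility mentioned at the top of this thread.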




--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com