Probably not specific to 20.11.1, nor to a Cray, but has anyone out there seen
anything like this?

As slurmctld restarts, after upping the debug level, it all looks hunky dory:

[2020-12-17T09:23:46.204] debug3: Trying to load plugin /opt/slurm/20.11.1/lib64/slurm/job_submit_cray_aries.so
[2020-12-17T09:23:46.205] debug3: Success.
[2020-12-17T09:23:46.206] debug3: Trying to load plugin /opt/slurm/20.11.1/lib64/slurm/job_submit_lua.so
[2020-12-17T09:23:46.207] debug3: slurm_lua_loadscript: job_submit/lua: loading Lua script: /etc/opt/slurm/job_submit.lua
[2020-12-17T09:23:46.208] debug3: Success.
[2020-12-17T09:23:46.209] debug3: Trying to load plugin /opt/slurm/20.11.1/lib64/slurm/prep_script.so
[2020-12-17T09:23:46.210] debug3: Success.

but at the point where a submitted job should pass through the job_submit
script, I see:

[2020-12-17T09:26:06.806] debug3: job_submit/lua: slurm_lua_loadscript: skipping loading Lua script: /etc/opt/slurm/job_submit.lua
[2020-12-17T09:26:06.807] debug3: assoc_mgr_fill_in_user: found correct user: someuser(12345)
[2020-12-17T09:26:06.808] debug5: assoc_mgr_fill_in_assoc: looking for assoc of user=someuser(12345), acct=accnts0001, cluster=clust, partition=acceptance
[2020-12-17T09:26:06.809] debug3: assoc_mgr_fill_in_assoc: found correct association of user=someuser(12345), acct=accnts0001, cluster=clust, partition=acceptance to assoc=67 acct=accnts0001


The reason I went looking is that job_submit.lua should be telling
me, the job submitter, to "sling my hook", as I have deliberately
left something out.

FWIW, the debug level here goes all the way to 5, so I was hoping
for a little more info as to why it is skipping it.

The skip is occurring in src/lua/slurm_lua.c, because of this trap:

        if (st.st_mtime <= *load_time) {
                debug3("%s: %s: skipping loading Lua script: %s", plugin,
                       __func__, script_path);
                return SLURM_SUCCESS;
        }
        debug3("%s: %s: loading Lua script: %s", __func__, plugin, script_path);

where "st" is a stat struct, but I am currently none the wiser as to why
such a condition would be (or maybe even would need to be) triggered.
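
For anyone following along, here is a minimal sketch of what that trap
appears to implement: a reload-if-modified check. It assumes that
load_time is set to the script's mtime after each successful load;
script_needs_reload() and check_script() are hypothetical names of mine,
not Slurm's functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical reload-if-modified check, modelled on the trap in
 * src/lua/slurm_lua.c. ASSUMPTION: after a load, *load_time holds the
 * mtime of the script that was loaded. Not Slurm's actual code.
 *
 * Returns true if the script should be (re)loaded, false if the copy
 * loaded earlier is still current (the "skipping" case).
 */
bool script_needs_reload(time_t mtime, time_t *load_time)
{
	assert(load_time != NULL);

	if (mtime <= *load_time)
		return false;	/* file unchanged since last load: skip */

	*load_time = mtime;	/* remember which version was loaded */
	return true;
}

/*
 * Same check driven by stat(2) on a real path.
 * Returns 1 = reload, 0 = skip, -1 = stat() failed.
 */
int check_script(const char *path, time_t *load_time)
{
	struct stat st;

	if (stat(path, &st) != 0) {
		perror(path);
		return -1;
	}
	return script_needs_reload(st.st_mtime, load_time) ? 1 : 0;
}
```

With load_time starting at 0, the first call against a given file says
"reload" and any later call against the unchanged file says "skip",
which, if the assumption above holds, is the same loading-then-skipping
pattern seen in the logs.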

The job submit script is certainly "younger" than the slurmctld restart
and the job submission, but then, why wouldn't it be?

Kevin
--
Supercomputing Systems Administrator
Pawsey Supercomputing Centre
