On Mon, Feb 6, 2017 at 10:17 AM, Hans-Nikolai Viessmann <h...@hw.ac.uk> wrote:
>
> I had just added the DebugFlags setting to slurm.conf on the head node
> and did not synchronise it with the nodes. I doubt that this could cause
> the problem I described, as it was occurring before I made the change to
> slurm.conf.
>
> One thing I did notice is this error occurring every once in a while:
>
> [2016-12-30T17:36:50.963] error: gres_plugin_node_config_unpack: gres/gpu lacks File parameter for node gpu07
> [2016-12-30T17:36:50.963] error: gres_plugin_node_config_unpack: gres/gpu lacks File parameter for node gpu04
> [2016-12-30T17:36:50.963] error: gres_plugin_node_config_unpack: gres/gpu lacks File parameter for node gpu01
> [2016-12-30T17:36:50.963] error: gres_plugin_node_config_unpack: gres/gpu lacks File parameter for node gpu05
> [2016-12-30T17:36:50.963] error: gres_plugin_node_config_unpack: gres/gpu lacks File parameter for node gpu02
> [2016-12-30T17:36:50.964] error: gres_plugin_node_config_unpack: gres/gpu lacks File parameter for node gpu06
> [2016-12-30T17:36:50.966] error: gres_plugin_node_config_unpack: gres/gpu lacks File parameter for node gpu03
>
> Is it possible that I need to specify the Gres Type for the other nodes as
> well, even though they have only one GPU each?

I'm not an expert, but I believe your gres.conf is incorrect. Ours looks like this:

name=hostname file=/dev/nvidia0 type=k10

I think the issue is that Slurm is trying to match your hostname against the entries in gres.conf and can't find one that matches.
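
For reference, here is a rough sketch of what such entries could look like for the seven nodes in the log, written with the parameter names from the gres.conf and slurm.conf man pages (NodeName for the host, Name for the resource, File for the device, Type for the model). The device path /dev/nvidia0 and the type k10 are copied from our setup and are only assumptions for your hardware, so substitute whatever your nodes actually have. I have not tested this against your cluster.

```
# gres.conf (sketch): one GPU per node, each entry has a File parameter.
# Assumption: every node exposes its GPU as /dev/nvidia0 and the type is k10.
NodeName=gpu[01-07] Name=gpu Type=k10 File=/dev/nvidia0

# slurm.conf (sketch): matching declarations; other node parameters omitted.
GresTypes=gpu
NodeName=gpu[01-07] Gres=gpu:k10:1
```

Both files should be the same on the head node and on the compute nodes, and the daemons need to be restarted or reconfigured after the change.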