All jobs ran into the same problem again. It may be related to /var
filling up and slurm not being able to store any job information in
./slurm.state. Even after /var was cleaned up, the jobs still report
ReqNodeNotAvail even though there are plenty of nodes available.
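
For reference, I checked the filesystem and the state directory roughly
along these lines (the exact path is whatever StateSaveLocation in
slurm.conf points at on this cluster):

  df -h /var
  scontrol show config | grep -i StateSaveLocation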
How can I get the ReqNodeNotAvail state resolved? I tried holding and
releasing the jobs with no success.
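
The hold/release was roughly the following (job ID is a placeholder):

  scontrol hold <jobid>
  scontrol release <jobid>

and the jobs stay pending with Reason=ReqNodeNotAvail in
"scontrol show job <jobid>".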
Any help is much appreciated
Thanks
Eva
On Wed, 7 Aug 2013, Eva Hocks wrote:
>
>
>
> All jobs on the partition which use --ntasks-per-node in the sbatch
> script are not scheduled any more. The log shows:
>
>
> [2013-08-07T12:07:32.016] cons_res: _can_job_run_on_node: 0 cpus on gpu-2-13(0), mem 0/245760
> [2013-08-07T12:07:32.016] cons_res: _can_job_run_on_node: 0 cpus on gpu-2-14(0), mem 0/245760
> [2013-08-07T12:07:32.016] cons_res: _can_job_run_on_node: 0 cpus on gpu-2-15(0), mem 0/245760
> [2013-08-07T12:07:32.016] cons_res: _can_job_run_on_node: 0 cpus on gpu-2-16(0), mem 0/245760
> [2013-08-07T12:07:32.016] cons_res: cr_job_test: test 0 fail: insufficient resources
> [2013-08-07T12:07:32.016] no job_resources info for job 24243
> [2013-08-07T12:07:32.016] _pick_best_nodes: job 24243 never runnable
>
>
> Those same jobs were running yesterday and there have been no changes to
> the slurm configuration. I also restarted slurm on all nodes, which
> didn't improve anything.
>
> Any job script which uses only --nodes=1 runs without a problem in the
> same partition that rejects the --ntasks-per-node job.
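>
> A stripped-down version of the kind of script that fails looks roughly
> like this (the task count and partition name are just placeholders):
>
>   #!/bin/bash
>   #SBATCH --partition=<partition>
>   #SBATCH --nodes=1
>   #SBATCH --ntasks-per-node=8
>
>   srun hostname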
>
>
> Any idea what the problem with --ntasks-per-node is?
>
> Thanks
> Eva
>
>