Hello,

We are running a 10-node cluster in our lab and are experiencing a lag in job allocation.
srun commands wait up to one minute for resource allocation even when several nodes are idle. The same happens with sbatch scripts: jobs wait about one minute for allocation even though idle nodes are available.

Our ControlMachine runs on a virtual node. The compute nodes are all physical machines. In our config file we set these values:

    FastSchedule=1
    SchedulerType=sched/backfill

My impression is that right after a reboot of the whole cluster jobs are scheduled quickly, and that after a few weeks of uptime scheduling slows down (the ControlMachine uptime is currently 25 days). I'm not sure the two are related.

Everything else looks in order, and there are no errors in the log files.

I'd be grateful for any hint or advice.

Thanks,
Vladimir