On Tue, 23 Jun 2015 15:48:57 +0000 Erik Soyez <[email protected]> wrote:
> Hello William,
>
> many thanks for your quick reply! Okay, I need to specify:
>
> From the scheduler's point of view the jobs are identical. They have
> no "-l" resource requirements. And yes, "load" is the only criterion
> which restricts access to a node. No nodes in alarm state. Any
> other ideas? I'm almost sure it has to do with the
> "job_load_adjustment", I just cannot prove it yet.... :-)
>

I would have thought that if job_load_adjustment were the culprit it
would be by means of putting nodes into alarm via artificial load (I
don't use it myself). Otherwise, AFAICT, its only effect is on which
available node a job goes to.

Do you have JC_FILTER set in the sched_conf params? It is explicitly
documented as buggy in a way that sometimes causes jobs not to be
scheduled when they should be.

I'd try running qalter -w p on one of the jobs and see if it gives any
clues. If that doesn't reveal the problem, turn on schedd_job_info and
then, after the next scheduling cycle, run qstat -j against one of the
jobs; you should see a record of the scheduler's attempts to schedule
the job.

Depending on your Grid Engine version, schedd_job_info may cause a
memory leak (https://arc.liv.ac.uk/trac/SGE/ticket/682), so you may
want to switch it off afterwards.
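
The sequence of checks described above could be sketched roughly as
follows. This is only a dry-run helper: it prints the commands in order
rather than running them, since qconf -msconf opens the scheduler
configuration in an editor and has to be done by hand. The function
name diag_commands and the JOB_ID placeholder are made up for
illustration; substitute the ID of one of the pending jobs.

```shell
#!/bin/sh
# Dry-run sketch: print the diagnostic commands for one pending job.
# diag_commands and JOB_ID are illustrative placeholders, not SGE names.
diag_commands() {
    job_id="$1"
    # 1. Show the scheduler config lines of interest: is JC_FILTER in
    #    params, and what is schedd_job_info currently set to?
    printf '%s\n' "qconf -ssconf | grep -E '^(params|schedd_job_info)'"
    # 2. Ask why the job is not being dispatched ("poke" verification).
    printf '%s\n' "qalter -w p $job_id"
    # 3. Edit the scheduler config by hand: set schedd_job_info to true
    #    (and set it back to false once finished, see ticket 682).
    printf '%s\n' "qconf -msconf"
    # 4. After the next scheduling cycle, read the scheduling info.
    printf '%s\n' "qstat -j $job_id"
}

diag_commands "${1:-JOB_ID}"
```

Run it with a job ID as the only argument and then work through the
printed commands one at a time on the qmaster host or a submit/admin
host.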
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
