On Tue, 23 Jun 2015 15:48:57 +0000
Erik Soyez <[email protected]> wrote:

> Hello William,
> 
> many thanks for your quick reply!  Okay, I need to specify:
> 
> From the scheduler's point of view the jobs are identical.  They have
> no "-l" resource requirements.  And yes, "load" is the only criterion
> which restricts access to a node.  No nodes in alarm state.  Any
> other ideas? I'm almost sure it has to do with the
> "job_load_adjustment", I just cannot prove it yet....  :-)
> 
I would have thought that if job_load_adjustment were the culprit, it
would act by pushing nodes into alarm state via the artificial load it
adds (I don't use it myself).  Otherwise, AFAICT, its only effect is on
which available node a job goes to.

Do you have JC_FILTER set in the sched_conf params?  It is explicitly
documented as buggy in a way that sometimes causes jobs not to be
scheduled when they should be.
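
A quick way to check is to look at the "params" line of the scheduler
configuration.  This is only a minimal sketch: it assumes that
qconf -ssconf prints one "params" line, and the helper name
has_jc_filter is made up for illustration.  It only tells you that
JC_FILTER is mentioned, not what value it is set to.

```shell
# Hypothetical helper: reads scheduler configuration text on stdin and
# succeeds if the "params" line mentions JC_FILTER at all.
has_jc_filter() {
    grep -E '^params[[:space:]]' | grep -q 'JC_FILTER'
}

# Usage on a live cluster (requires the SGE client tools):
#   qconf -ssconf | has_jc_filter && echo "JC_FILTER is mentioned in params"
```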

I'd try running qalter -w p on one of the jobs and see if it gives any
clues.  If that doesn't reveal the problem, turn on schedd_job_info,
and then, after the next scheduling cycle, run qstat -j against one of
the jobs.  You should see a record of the scheduler's attempts to
schedule the job.
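
Roughly, the two commands could be wrapped like this.  It is a sketch
only: the function name diagnose_job is hypothetical, and it assumes
schedd_job_info has already been set to true in the scheduler
configuration (for example by editing it with qconf -msconf).

```shell
# Hypothetical wrapper around the two diagnostic commands mentioned above.
diagnose_job() {
    job_id="$1"
    # Ask for a "poke" verification of the pending job, i.e. whether it
    # could be scheduled on the cluster in its current state.
    qalter -w p "$job_id"
    # Show the job details; with schedd_job_info enabled this should
    # include the scheduler's notes on why the job was not dispatched.
    qstat -j "$job_id"
}

# Usage (the job id is a placeholder): diagnose_job 12345
```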

Depending on your Grid Engine version, schedd_job_info may cause a
memory leak (https://arc.liv.ac.uk/trac/SGE/ticket/682), so you may
want to switch it off afterwards.

 


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
