On Wed, Jun 29, 2011 at 09:01:11AM -0400, Vic wrote:

Can you alter your bsub -> qsub translator to include a '-P <project>'
string?  That way you can use an RQS to throttle the number of jobs
running at a time.

That's the sort of thing I'm thinking about - but unfortunately, we have
several users (who might be running several compilations each), so to use
that sort of throttling would require some nifty scripting to pick up a
new project per run; I'm not sure how to create that uniqueness yet. But
this is the sort of ugliness I'm toying with..., :-)
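One way to sketch that ugliness: a small wrapper that builds a per-run
project tag from the user name and the wrapper's PID.  All names here are
illustrative, and note that SGE requires the project to already exist
(qconf -aprj), so in practice you'd probably pick from a pre-created pool
rather than invent names on the fly:

```shell
#!/bin/sh
# mk_project: build a unique-ish project tag from the invoking user
# and this wrapper's PID (both choices are assumptions; adjust to taste).
mk_project() {
    printf 'quartus_%s_%d\n' "$(id -un)" "$$"
}

# Translate the call into qsub with the generated project tag.
# 'echo' prints the command instead of running it; drop it for real use.
project=$(mk_project)
echo qsub -P "$project" "$@"
```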

You can set limits per user as well via RQS.
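For example, a resource quota set in the format shown by 'qconf -srqs'
(the rule name and slot count below are purely illustrative):

```
{
   name         user_slot_limit
   description  "Cap concurrent slots per user"
   enabled      TRUE
   limit        users {*} to slots=8
}
```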

Also, there is a 'max_u_jobs' setting (qconf -sconf) that also limits the
number of active (running) jobs a user can have at a time.  This is a
separate restriction from using RQS (you can use both at the same time,
and the lower number should apply).
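To inspect and change it (the value of 50 below is just an example):

```
# Show the current global setting:
qconf -sconf | grep max_u_jobs

# Edit the global configuration (opens $EDITOR); set e.g.:
#   max_u_jobs   50
qconf -mconf
```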

Another advantage to using a project for these jobs is to get accounting
information, even if you don't use a resource quota to limit the number
of them running.  A project can also apply to the scheduling routines if
you use either functional shares or a share tree configuration.
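The usual round trip for project-based accounting looks something like
this (the project name "quartus" is an assumption):

```
qconf -aprj            # create a project (opens $EDITOR)
qsub -P quartus ...    # submit jobs under that project
qacct -P quartus       # accounting summary for the project
```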



You also mentioned that this was "well into the iterator".  If you use
qsub, you essentially dump all of the jobs into the queue, and let SGE
deal with them when it can.  You could also use qrsh or "qsub -sync y";
both block until the job is actually complete ("-now y" only requests
immediate dispatch, failing if no slot is free).
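For instance (the script name is hypothetical):

```
# Block until the job finishes; qsub's exit status mirrors the job's:
qsub -sync y -b y compile_step.sh

# Or run it interactively through the scheduler:
qrsh compile_step.sh
```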

I can't easily change the invocation sequence; Quartus is separating the
jobs & dispatching each one to the grid. So if I let them all go without
dependency settings, a couple of runs will completely swamp the grid for
several hours, and no-one else can run anything. If I limit each user,
that user becomes idle as soon as he's committed a run to the grid (won't
be able to run anything else).

They can *submit* as many jobs as they want, in that they are sent to
SGE (up to the limits set by max_u_jobs, max_jobs, and other limits),
but only a certain number of them will run at a given time.

It isn't pretty, is it?

Something similar happens here at $day_job, where a single user can swamp the
cluster, given the chance.  We address this in several ways:

1) Set global user limits on the number of total slots they can use at a
time (we use about 60%)

2) Limit specific groups/projects further, as needed.

3) We encourage the users to submit "more, smaller" jobs to SGE when
possible.  This creates a higher job throughput, and a faster "churn"
rate, which leads to a more fair distribution of CPU cycles (we mostly
use functional shares for this).
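A resource quota rule in the spirit of (1), assuming for illustration a
64-slot cluster, so ~60% is 38 slots (both numbers are made up here):

```
{
   name         global_user_cap
   description  "No single user may hold more than ~60% of slots"
   enabled      TRUE
   limit        users {*} to slots=38
}
```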


Another option would be to push all of these Quartus jobs into a
specific queue, and make it subordinate to a different queue used for
non-quartus jobs (project restrictions can help here too).
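Subordination is configured on the superior queue; a sketch, with
hypothetical queue names (fast.q for the non-Quartus work, quartus.q for
the Quartus jobs):

```
# In fast.q's configuration (qconf -mq fast.q), suspend quartus.q
# whenever at least one slot in fast.q is in use:
subordinate_list   quartus.q=1
```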


--
Jesse Becker
NHGRI Linux support (Digicon Contractor)
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
