Hi,

I have found resource quotas to be pretty unreliable once you start making even
moderately complex rules. Instead, I do almost everything through
resource/host/queue configuration. With a server-side JSV you can do pretty
much anything without needing resource quotas.

The way I have it set up is to have various time limits (30 minutes, 1 hour, 2
hours, ..., over 48 hours). The jobs with the shortest time limit can use 100%
of the cluster resources. Jobs with longer time limits can only use a fraction
of cluster resources (e.g. 90% for 1 hour jobs). I don't differentiate between
interactive and batch jobs, since people can always use an "interactive"
session to do non-interactive processing (if all the batch slots are full). I
use the fair share system to balance usage between users. Finally, I provide a
higher-priority "urgent.q" in which each user can use at most one slot for 12
hours.
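
As a rough illustration of the tiering described above (not the actual
configuration), the mapping from a requested run time to a tier and its slot
cap could be sketched like this. Only the 30-minute/100% and 1-hour/90% pairs
come from the description; every other percentage, and the names TIERS,
OVER_48H_PERCENT, tier_for and slot_cap, are hypothetical placeholders.

```python
from bisect import bisect_left

# (time limit in seconds, percent of cluster slots the tier may use).
# Only the first two rows are taken from the message; the rest are made up.
TIERS = [
    (30 * 60, 100),
    (60 * 60, 90),
    (2 * 60 * 60, 80),    # hypothetical percentage
    (48 * 60 * 60, 50),   # hypothetical percentage
]
OVER_48H_PERCENT = 25     # hypothetical cap for the "over 48 hours" tier


def tier_for(requested_seconds):
    """Return (limit_seconds, percent) for the shortest tier that fits.

    limit_seconds is None for jobs that exceed the longest fixed limit.
    """
    limits = [limit for limit, _ in TIERS]
    index = bisect_left(limits, requested_seconds)
    if index == len(TIERS):
        return None, OVER_48H_PERCENT
    return TIERS[index]


def slot_cap(requested_seconds, total_slots):
    """Number of slots jobs in the matching tier may occupy in total."""
    _, percent = tier_for(requested_seconds)
    return total_slots * percent // 100
```

For example, slot_cap(3600, 200) gives the cap for 1 hour jobs on a
hypothetical 200-slot cluster. In Grid Engine itself the same effect would
come from per-queue slot counts and h_rt limits rather than from code like
this.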

Brendan Moloney
Research Associate
Advanced Imaging Research Center
Oregon Health & Science University
________________________________
From: users-boun...@gridengine.org [users-boun...@gridengine.org] on behalf of 
Prentice Bisbal [prentice.bis...@rutgers.edu]
Sent: Friday, January 16, 2015 12:56 PM
To: users@gridengine.org
Subject: Re: [gridengine users] suggestions on setting up queues

Stephen,

I'd be careful about setting up too many queues. The more complicated you make
it, the harder it is for your users to use. I'd start with the following. My
apologies if you've already done some of these steps:

0. Find some way to monitor your scheduler's behavior, and figure out what you 
want to see happen. Without some kind of goals and metrics, how will you know 
if your changes are working as desired?

1. Require users to specify a wallclock time when running jobs. This is
required for step 2. Don't set a default wallclock time. Configure SGE to fail
a job immediately if a wallclock time isn't specified. I did this a long time
ago, but forgot how to do it. I believe if you make '-w e' a default option
for qsub (e.g. 'qsub -w e ......'), jobs that do not specify h_rt will fail
immediately. This will get your users to remember to always set h_rt.

2. Turn on backfill scheduling.

3. Look into fairshare scheduling.

Only after you've taken these 3 steps would I look into making additional
queues.
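
The check in step 1 (reject any job that does not request h_rt) can be
sketched in plain Python as below. This only illustrates the decision logic;
it is not the Grid Engine JSV API, and the function names, the return values,
and the assumption that the hard resource request arrives as a
comma-separated "name=value" string are all hypothetical.

```python
def parse_h_rt(value):
    """Convert an h_rt value (plain seconds or H:M:S) into seconds."""
    parts = value.split(":")
    if len(parts) == 1:
        return int(parts[0])
    if len(parts) == 3:
        hours, minutes, seconds = (int(p or 0) for p in parts)
        return hours * 3600 + minutes * 60 + seconds
    raise ValueError("unrecognised h_rt value: %r" % value)


def check_job(hard_resources):
    """Decide whether to accept a job based on its hard resource request.

    hard_resources is assumed to look like "h_rt=01:30:00,mem_free=2G".
    Returns ("accept", runtime_seconds) or ("reject", reason).
    """
    requests = dict(
        item.split("=", 1)
        for item in hard_resources.split(",")
        if "=" in item
    )
    if "h_rt" not in requests:
        return ("reject", "no h_rt requested; please add -l h_rt=<time>")
    return ("accept", parse_h_rt(requests["h_rt"]))
```

With this sketch, check_job("mem_free=2G") is rejected, while
check_job("h_rt=01:30:00,mem_free=2G") is accepted with a run time of 5400
seconds, which a backfill scheduler could then use to decide whether the job
fits into a gap.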

Prentice

On 01/16/2015 02:50 PM, Stephen Spencer wrote:
Good morning.

With the number of users on our clusters growing, it's becoming less realistic 
to say "play fair 'cause you're not the only user of the cluster."

I'm looking for suggestions on setting up queues, both the "why" and "how," 
that will allow more of our users access to the cluster.

What I'm thinking of is a multi-queue approach:

  *   some limited number of "interactive" slots (and they'd be time-limited)
  *   a queue for jobs with short time duration - the "express" queue
  *   a queue for jobs that will run longer... but only so many of these per 
user

Any and all suggestions are welcome.

Thank you!

Best,
--
Stephen Spencer
spen...@cs.washington.edu



_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users

