Hi,

On 22.11.2011 at 02:17, Julie Ashworth wrote:

> Thanks Reuti and all for being so responsive on this list. 

you're welcome.


> I have (probably) a newbie question...
> 
> I manage a small compute cluster for a university. I configured 
> a simple user-based functional share policy, since we don't have 
> projects, nor interest in usage patterns.
> 
> This works great for scheduling.
> 
> I also use resource quotas (users, department, and slots) to 
> limit running jobs for a single user/department. For example:
> 
> $ qquota -u foo
> resource quota rule limit                filter
> --------------------------------------------------------------------------------
> max_slots_groups/1 slots=90/90          users @foogroup
> max_slots_users/3  slots=20/70          users foo
> max_slots_hosts/1  slots=12/12          hosts host1
> ...
> 
> there are about 120 slots defined.
> 
> The user base is relatively computer-illiterate. In some 
> cases, users are unaware that they're using SGE, because 
> their application is tightly integrated with SGE (e.g.
> FMRIB Software Library).
> 
> My configuration relies on relatively high-turnover of 
> running jobs. (If long jobs fill the queue, then submitted 
> jobs must wait.) 
> 
> In the perfect world, newly submitted jobs would get runtime,
> while the longest waiting jobs would get suspended, and wait
> for the cluster to become more idle.
> I consider over-subscribing the queues, and defining a 
> suspend_threshold. But how does SGE choose which jobs get
> suspended? 

Suspended jobs just sit there, stopped. No memory, consumable resources, 
or disk space will be freed.


> I understand the suspend 'action' can be defined with slotwise 
> preemption (subordinate queues). I prefer not to use separate 
> queues, but perhaps it's the only option?

Yes, only jobs in other queue instances can be suspended.

What might help: set up a long-running queue with "priority 19" and keep the 
standard queue at "priority 0". This sets the nice value of the jobs, so if you 
oversubscribe the nodes, you avoid the problems of suspended jobs altogether. 
The short-running jobs will still run, and they will get more CPU time per node.

Of course, you'll need to check whether this is suitable in your environment 
with regard to memory.
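
As a rough sketch of the two queues, the relevant lines of each queue 
configuration (as shown by "qconf -sq <queue>" and edited with 
"qconf -mq <queue>") could look like the fragment below. The queue names 
long.q and short.q and the slot counts are assumptions for illustration only; 
adjust them to your hosts.

```
# long.q (hypothetical name): long-running jobs, run at nice 19
qname      long.q
priority   19
slots      12

# short.q (hypothetical name): standard jobs, run at nice 0
qname      short.q
priority   0
slots      12
```

With both queues offering slots on the same host, the node is oversubscribed 
whenever both are full, which is why the memory check matters.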

-- Reuti


> Thanks in advance!
> Best,
> Julie
> 
> -- 
> Julie Ashworth <julie.ashwo...@berkeley.edu>
> http://www.neuro.berkeley.edu
> PGP Key ID: 0x17F013D2
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users

