Hi guys,

Thanks for the replies. I think what really matters here is the per-user 
RQS. Currently we only set a quota of 192 slots per user, equivalent to 3 
nodes. But a user could submit 192 single-slot big-memory jobs and 
effectively occupy 192 whole nodes.

So my original idea doesn't improve resource utilisation; it really just 
prevents a user from using more than 3 entire nodes.

Maybe some sort of resource equivalency between slots and memory could 
achieve that?
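
One thing that might come close, as a rough and untested sketch: queue-level 
memory limits are multiplied by the number of slots a job is granted, so a 
per-slot cap in the queue configuration would force a 500GB job to request 
~64 slots:

    # qconf -mq all.q   (queue name is just an example)
    s_vmem                7.5G
    h_vmem                8G

The catch is that apps which should stay unlimited would then need a 
separate queue.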

Thanks
D

Sent from my iPad

> On 1 Jul 2014, at 5:57 am, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
> 
> I don't get the problem here.
> 
> If a single-core job (let's assume it cannot easily be parallelized)
> consumes 400-500 GB of RAM, leaving only a little left over for
> others to use, what's the issue? Any jobs launched will be limited by
> how much RAM is available (assuming it is a consumable), and any job
> that cannot run in whatever amount of RAM is left is either run on
> another node, or queued up until a node with sufficient resources is
> available. Forcing the user to request, say, 50 cores for a 400GB job,
> even though it is single-threaded, would have the same end result -
> 400GB is in use (and 50 cores are also "in use" even though 49 are
> idle), and other jobs either run somewhere else, or queue up.
> 
> Ian
> 
> On Mon, Jun 30, 2014 at 12:01 PM, Michael Stauffer <mgsta...@gmail.com> wrote:
>>> Date: Mon, 30 Jun 2014 11:53:12 +0200
>>> From: Txema Heredia <txema.llis...@gmail.com>
>>> To: Derrick Lin <klin...@gmail.com>, SGE Mailing List
>>>        <users@gridengine.org>
>>> Subject: Re: [gridengine users] Enforce users to use specific amount
>>>        of memory/slot
>>> 
>>> 
>>> Hi Derrick,
>>> 
>>> You could either set h_vmem as a consumable (consumable=yes) attribute
>>> and set a default value of 8GB for it. This way, whenever a job doesn't
>>> request any amount of h_vmem, it will automatically request 8GB per
>>> slot. This will affect all types of jobs.
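>>> 
>>> Roughly, and untested (the complex line goes in qconf -mc; the 512GB
>>> figure assumes your 64-core/512GB nodes):
>>> 
>>>    #name    shortcut  type    relop  requestable consumable default urgency
>>>    h_vmem   h_vmem    MEMORY  <=     YES         YES        8G      0
>>> 
>>>    # and on each execution host (qconf -me <hostname>):
>>>    complex_values    h_vmem=512G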
>>> 
>>> You could also define a JSV script that checks the username and forces
>>> -l h_vmem=8G onto his/her jobs (
>>> jsv_sub_add_param('l_hard','h_vmem','8G') ). This will affect all jobs
>>> for that user, but could turn into a pain to manage.
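>>> 
>>> A minimal sketch of such a server-side JSV in shell (untested;
>>> "bad_user" is just a placeholder):
>>> 
>>>    #!/bin/sh
>>>    # load the shell JSV helper library shipped with Grid Engine
>>>    . "$SGE_ROOT/util/resources/jsv/jsv_include.sh"
>>> 
>>>    jsv_on_start()
>>>    {
>>>       return
>>>    }
>>> 
>>>    jsv_on_verify()
>>>    {
>>>       if [ "$(jsv_get_param USER)" = "bad_user" ]; then
>>>          # force a hard request of 8GB h_vmem (per slot, if consumable)
>>>          jsv_sub_add_param l_hard h_vmem 8G
>>>          jsv_correct "h_vmem forced to 8G"
>>>          return
>>>       fi
>>>       jsv_accept "job accepted"
>>>    }
>>> 
>>>    jsv_main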
>>> 
>>> Or, you could set a different policy and allow all users to request the
>>> amount of memory they really need, trying to best fit each node. What is
>>> the point of forcing a user to reserve 63 additional cores when they
>>> only need 1 core and 500GB of memory? You could fit on that node one job
>>> like this plus, say, two 30-core jobs using 6GB of memory each.
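>>> 
>>> For instance (hypothetical PE name "smp"; with a consumable h_vmem the
>>> request is per slot, so 30 x 200M gives the ~6GB job):
>>> 
>>>    qsub -pe smp 1  -l h_vmem=500G bigmem_job.sh
>>>    qsub -pe smp 30 -l h_vmem=200M threaded_job.sh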
>>> 
>>> Txema
>>> 
>>> 
>>> 
>>> On 30/06/14 08:55, Derrick Lin wrote:
>>> 
>>>> Hi guys,
>>>> 
>>>> A typical node on our cluster has 64 cores and 512GB of memory, so
>>>> it's about 8GB/core. Occasionally we have jobs that utilize only 1
>>>> core but 400-500GB of memory, which annoys lots of users. So I am
>>>> seeking a way to force jobs to stay strictly below the 8GB/core
>>>> ratio or be killed.
>>>> 
>>>> For example, the above job should ask for 64 cores in order to use
>>>> 500GB of memory (we have a per-user quota on slots).
>>>> 
>>>> I have been playing around with h_vmem, setting it to consumable and
>>>> configuring an RQS:
>>>> 
>>>> {
>>>>        name    max_user_vmem
>>>>        enabled true
>>>>        description     "Each user cannot utilize more than 8GB/slot"
>>>>        limit   users {bad_user} to h_vmem=8g
>>>> }
>>>> 
>>>> but it seems to set the total vmem bad_user can use, not a per-slot
>>>> ratio.
>>>> 
>>>> I would love to set it on users instead of queues or hosts, because we
>>>> have applications that utilize the same set of nodes, and those apps
>>>> should be unlimited.
>>>> 
>>>> Thanks
>>>> Derrick
>> 
>> 
>> I've been dealing with this too. I'm using h_vmem to kill processes that go
>> above the limit, with s_vmem set slightly lower by default to give
>> well-behaved processes a chance to exit gracefully first.
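>> 
>> For example (hypothetical values, set cluster-wide as default requests in
>> $SGE_ROOT/$SGE_CELL/common/sge_request):
>> 
>>    -l s_vmem=7.5G   # soft limit: the job gets a warning signal (SIGXCPU)
>>    -l h_vmem=8G     # hard limit: the job is killed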
>> 
>> The issue is that these use virtual memory, which is (more or less always)
>> greater than resident memory, i.e. the actual RAM usage. And with JVM-based
>> apps like Matlab, the amount of virtual memory reserved/used is HUGE
>> compared to resident, by 10x give or take, which makes this really
>> impractical in practice. So far I've just set the default h_vmem and s_vmem
>> values high enough to accommodate JVM apps, and increased the per-host
>> consumable appropriately. We don't get fine-grained memory control, but it
>> definitely reins in out-of-control users/procs that otherwise might gobble
>> up enough RAM to slow down the entire node.
>> 
>> We may switch to UVE for this reason alone, to get memory limits based on
>> resident memory, if it ends up seeming worth it.
>> 
>> -M
>> 
> 
> 
> 
> -- 
> Ian Kaufman
> Research Systems Administrator
> UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
