I don't get the problem here.

If a single-core job (let's assume it cannot easily be parallelized)
consumes 400-500 GB of RAM, leaving only a little left over for
others to use, what's the issue? Any jobs launched will be limited by
how much RAM is available (assuming it is a consumable), and any job
that cannot run in whatever amount of RAM is left is either run on
another node, or queued up until a node with sufficient resources is
available. Forcing the user to request, say, 50 cores for a 400GB job,
even though it is single-threaded, would have the same end result -
400GB is in use (and 50 cores are also "in use" even though 49 are
idle), and other jobs either run somewhere else, or queue up.

Ian

On Mon, Jun 30, 2014 at 12:01 PM, Michael Stauffer <mgsta...@gmail.com> wrote:
>> Date: Mon, 30 Jun 2014 11:53:12 +0200
>> From: Txema Heredia <txema.llis...@gmail.com>
>> To: Derrick Lin <klin...@gmail.com>, SGE Mailing List <users@gridengine.org>
>> Subject: Re: [gridengine users] Enforce users to use specific amount of memory/slot
>>
>>
>> Hi Derrick,
>>
>> You could either set h_vmem as a consumable (consumable=yes) attribute
>> and set a default value of 8GB for it. This way, whenever a job doesn't
>> request any amount of h_vmem, it will automatically request 8GB per
>> slot. This will affect all types of jobs.
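>>
>> As a rough sketch, that setup could look something like this (the host
>> name and its 512G complex_values entry are just illustrative):
>>
>> # qconf -mc : make h_vmem requestable and consumable, with an 8G per-slot default
>> #name     shortcut   type     relop  requestable  consumable  default  urgency
>> h_vmem    h_vmem     MEMORY   <=     YES          YES         8G       0
>>
>> # qconf -me node01 : tell the scheduler how much memory this host can hand out
>> complex_values        h_vmem=512G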
>>
>> You could also define a JSV script that checks the username and forces
>> -l h_vmem=8G for that user's jobs (e.g.
>> jsv_sub_add_param('l_hard','h_vmem','8G') ). This will affect all jobs
>> for that user, but could become a pain to manage.
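>>
>> A minimal sketch of such a JSV, using the bash helpers shipped in
>> $SGE_ROOT/util/resources/jsv/jsv_include.sh (the username "bad_user" is
>> just a placeholder):
>>
>> #!/bin/sh
>>
>> jsv_on_start()
>> {
>>    return
>> }
>>
>> jsv_on_verify()
>> {
>>    user=`jsv_get_param USER`
>>    if [ "$user" = "bad_user" ]; then
>>       # only add the default if the job did not request h_vmem itself
>>       cur=`jsv_sub_get_param l_hard h_vmem`
>>       if [ "$cur" = "" ]; then
>>          jsv_sub_add_param l_hard h_vmem 8G
>>          jsv_correct "added default h_vmem=8G"
>>          return
>>       fi
>>    fi
>>    jsv_accept "job accepted"
>>    return
>> }
>>
>> . ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
>> jsv_main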
>>
>> Or, you could set a different policy and allow all users to request the
>> amount of memory they really need, trying to make the best use of each
>> node. What is the point of forcing a user to reserve 63 additional cores
>> when they only need 1 core and 500GB of memory? You could fit one such
>> job on that node, plus, say, two 30-core / 6GB-memory jobs.
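>>
>> For instance (assuming h_vmem is a per-slot consumable and "smp" is a
>> placeholder parallel environment name):
>>
>> # single-core job that genuinely needs ~500GB:
>> qsub -l h_vmem=500G big_memory_job.sh
>>
>> # 30-slot job needing about 6GB in total (~200M per slot, multiplied by the slot count):
>> qsub -pe smp 30 -l h_vmem=200M threaded_job.sh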
>>
>> Txema
>>
>>
>>
>> On 30/06/14 08:55, Derrick Lin wrote:
>>
>> > Hi guys,
>> >
>> > A typical node on our cluster has 64 cores and 512GB of memory, so it's
>> > about 8GB/core. Occasionally we have jobs that use only 1 core but
>> > 400-500GB of memory, which annoys a lot of users. So I am looking for a
>> > way to force jobs to stay strictly below an 8GB/core ratio, or else be
>> > killed.
>> >
>> > For example, the above job should have to ask for 64 cores in order to
>> > use 500GB of memory (we have per-user quotas for slots).
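>> >
>> > (e.g. something like "qsub -pe smp 64 -l h_vmem=8G job.sh", where h_vmem
>> > is a per-slot consumable and "smp" is a placeholder PE name, so that the
>> > 64 slots x 8G cover the 500GB.)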
>> >
>> > I have been playing around with h_vmem, setting it to consumable and
>> > configuring an RQS:
>> >
>> > {
>> >         name    max_user_vmem
>> >         enabled true
>> >         description     "Each user cannot utilize more than 8GB/slot"
>> >         limit   users {bad_user} to h_vmem=8g
>> > }
>> >
>> > but it seems to be setting a total vmem that bad_user can use per job.
>> >
>> > I would love to set it on users rather than on queues or hosts, because
>> > we have applications that use the same set of nodes and those apps
>> > should be unlimited.
>> >
>> > Thanks
>> > Derrick
>
>
> I've been dealing with this too. I'm using h_vmem to kill processes that go
> above the limit, with s_vmem set slightly lower by default to give
> well-behaved processes a chance to exit gracefully first.
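>
> (As a sketch, such defaults can live in $SGE_ROOT/$SGE_CELL/common/sge_request,
> which takes qsub-style options; the values below are purely illustrative:
>
>   -l s_vmem=31G
>   -l h_vmem=32G
>
> so the soft limit trips slightly before the hard one.)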
>
> The issue is that these use virtual memory, which is (more or less always)
> greater than resident memory, i.e. the actual RAM usage. And with JVM-based
> apps like Matlab, the amount of virtual memory reserved/used is huge compared
> to resident, by 10x give or take, which makes this really impractical.
> So far I've just set the default h_vmem and s_vmem values high enough to
> accommodate JVM apps, and increased the per-host consumable accordingly.
> We don't get fine-grained memory control, but it definitely reins in
> out-of-control users/procs that would otherwise gobble up enough RAM to
> slow down the entire node.
>
> We may switch to UVE just for this reason, to get memory limits based on
> resident memory, if it seems worthwhile in the end.
>
> -M
>
>



-- 
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering ikaufman AT ucsd DOT edu
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
