>
> Message: 4
> Date: Mon, 30 Jun 2014 11:53:12 +0200
> From: Txema Heredia <txema.llis...@gmail.com>
> To: Derrick Lin <klin...@gmail.com>, SGE Mailing List
>         <users@gridengine.org>
> Subject: Re: [gridengine users] Enforce users to use specific amount
>         of memory/slot
> Message-ID: <53b13388.5060...@gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>
> Hi Derrick,
>
> You could either set h_vmem as a consumable (consumable=yes) attribute
> and set a default value of 8GB for it. This way, whenever a job doesn't
> request any amount of h_vmem, it will automatically request 8GB per
> slot. This will affect all types of jobs.
>
> You could also define a JSV script that checks the username, and forces
> a -l h_vmem=8G for his/her jobs (
> jsv_sub_add_param('l_hard','h_vmem','8G') ). This will affect all jobs
> for that user, but could turn into a pain to manage.
>
> Or, you could adopt a different policy and let all users request the
> amount of memory they really need, so that jobs pack the node as well as
> possible. What is the point of forcing a user to reserve 63 additional
> cores when they only need 1 core and 500GB of memory? That node could
> hold one such job plus, say, two 30-core-6GB-memory jobs.
>
> Txema
>
>
>
> On 30/06/14 08:55, Derrick Lin wrote:
> > Hi guys,
> >
> > A typical node on our cluster has 64 cores and 512GB of memory, so
> > about 8GB/core. Occasionally, we have jobs that use only 1 core but
> > 400-500GB of memory, which annoys lots of users. So I am looking for
> > a way to force jobs to stay strictly below an 8GB/core ratio or be
> > killed.
> >
> > For example, the above job should ask for 64 cores in order to use
> > 500GB of memory (we have user quota for slots).
> >
> > I have been experimenting with h_vmem, setting it as a consumable and
> > configuring an RQS:
> >
> > {
> >         name    max_user_vmem
> >         enabled true
> >         description     "Each user can utilize no more than 8GB/slot"
> >         limit   users {bad_user} to h_vmem=8g
> > }
> >
> > but it seems to be setting a total vmem bad_user can use per job.
> >
> > I would love to set it on users instead of queue or hosts because we
> > have applications that utilize the same set of nodes and app should be
> > unlimited.
> >
> > Thanks
> > Derrick
>
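
The 8GB/slot arithmetic in the quoted messages can be sketched in a few lines
of Python. This is only an illustration, not anything Grid Engine runs: the
function names are made up, and it assumes "30-core-6GB-memory" means 6GB in
total per job.

```python
import math

NODE_CORES = 64      # from the thread: a typical node has 64 cores
NODE_MEM_GB = 512    # and 512GB of memory
GB_PER_SLOT = NODE_MEM_GB / NODE_CORES  # 8GB per core


def slots_needed(mem_gb, gb_per_slot=GB_PER_SLOT):
    """Smallest slot count whose combined per-slot allowance covers mem_gb."""
    return max(1, math.ceil(mem_gb / gb_per_slot))


def fits_on_node(jobs, cores=NODE_CORES, mem_gb=NODE_MEM_GB):
    """jobs is a list of (cores, total_mem_gb); True if they all fit together."""
    return (sum(c for c, _ in jobs) <= cores
            and sum(m for _, m in jobs) <= mem_gb)


# Strict 8GB/slot policy: a 500GB job has to reserve this many slots.
print(slots_needed(500))

# Txema's alternative: the 1-core/500GB job plus two 30-core jobs
# (assumption: 6GB total per job) on a single node.
print(fits_on_node([(1, 500), (30, 6), (30, 6)]))
```

Under the strict policy the 500GB job occupies nearly the whole node by
itself, while under the request-what-you-need policy the same node can also
hold the two 30-core jobs.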

I've been dealing with this too. I'm using h_vmem to kill processes that go
above the limit, with s_vmem set slightly lower by default to give
well-behaved processes a chance to exit gracefully first.

The issue is that these limits apply to virtual memory, which is (more or
less always) greater than resident memory, i.e. the actual RAM usage. And
with Java-based apps like Matlab, the amount of virtual memory reserved/used
is HUGE compared to resident, by 10x give or take. So it is really
impractical. So far, however, I've just set the default h_vmem and s_vmem
values high enough to accommodate JVM apps, and increased the per-host
consumable accordingly. We don't get fine-grained memory control, but it
definitely reins in out-of-control users/procs that otherwise might gobble
up enough RAM to slow down the entire node.
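
To get a feel for the virtual/resident gap of a particular process, one
option on Linux is to compare the VmSize and VmRSS fields of
/proc/<pid>/status. The Python below is a rough sketch of that; the function
name is made up, and the sample text is invented to mimic the roughly 10x
ratio mentioned above rather than being measured output.

```python
def parse_vm_fields(status_text):
    """Return (virtual_kb, resident_kb) from the text of /proc/<pid>/status.

    Either value is None if the corresponding field is missing.
    """
    fields = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # The kernel reports these values in kB, e.g. "VmRSS:  819200 kB".
            fields[key] = int(value.split()[0])
    return fields.get("VmSize"), fields.get("VmRSS")


# Invented sample, shaped like a status file for a JVM-style process.
SAMPLE = "Name:\tjava\nVmSize:\t 8388608 kB\nVmRSS:\t  819200 kB\n"

virtual_kb, resident_kb = parse_vm_fields(SAMPLE)
print(f"virtual/resident ratio: {virtual_kb / resident_kb:.1f}")
```

For a live process, pass the contents of /proc/<pid>/status instead of
SAMPLE. A limit like h_vmem is compared against the virtual figure, which is
why the defaults have to be set so high for JVM apps.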

We may switch to UGE just for this reason, to get memory limits based on
resident memory, if it seems worth it in the end.

-M
