Hi guys,
A typical node on our cluster has 64 cores and 512GB of memory, so about
8GB/core. Occasionally we get jobs that use only 1 core but 400-500GB of
memory, which annoys a lot of users. So I am looking for a way to force
jobs to stay at or below the 8GB/core ratio, or else be killed.
For example, the job above should have to request 64 cores in order to use
500GB of memory (we have a user quota for slots).
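To make the policy I am after concrete, here is a minimal sketch of the arithmetic in Python. It is only an illustration of the ratio, not anything Grid Engine runs; the names `required_slots` and `within_ratio` are made up for this example, and the node figures are the ones quoted above.

```python
import math

# Node figures quoted in this post (assumed uniform across the cluster).
NODE_CORES = 64
NODE_MEM_GB = 512
GB_PER_SLOT = NODE_MEM_GB / NODE_CORES  # 8.0 GB per core/slot


def required_slots(job_mem_gb: float) -> int:
    """Smallest slot count that keeps a job at or below GB_PER_SLOT per slot."""
    return max(1, math.ceil(job_mem_gb / GB_PER_SLOT))


def within_ratio(slots: int, job_mem_gb: float) -> bool:
    """True if the job's total memory fits within slots * GB_PER_SLOT."""
    return job_mem_gb <= slots * GB_PER_SLOT


if __name__ == "__main__":
    print(required_slots(500))    # slots a 500GB job would have to request
    print(within_ratio(1, 500))   # the problem job: 1 slot, 500GB
    print(within_ratio(64, 500))  # the same job asking for the whole node
```

Strictly, a 500GB job works out to 63 slots under this rule, which in practice means it takes the whole node.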
I have been playing around with h_vmem: I set it to consumable and
configured this RQS:
{
name max_user_vmem
enabled true
description "Each user can utilize no more than 8GB/slot"
limit users {bad_user} to h_vmem=8g
}
but it seems to set the total vmem that bad_user can use per job.
I would prefer to set this on users rather than on queues or hosts, because
we have applications that use the same set of nodes, and those applications
should remain unlimited.
Thanks
Derrick