> I'm trying to limit memory usage ....

We use UGE, and I have addressed high memory usage/reservation as follows:

   - we have different types of queues: a matrix of memory limits and time
   limits (the elapsed time limit is always 2x the CPU limit on our system);
   you do this via the queue definitions
   - we add a consumable resource I called mem_res to reserve memory via
   the global complex, with a 1GB default reservation (qconf -sc/-Mc)
   - we initialize the amount of reservable memory for each host to the
   host's available memory via the host's complex_values (qconf -se/-Me) -
   I have scripts to automate this
   - users who need more than 1GB of memory reserve it and the
   scheduler keeps track of this
   - we use an RQS to limit how much total memory a single user can
   reserve (in specific queues, the same way we limit the number of slots
   a single user can grab) - see the sketch after this list
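
A minimal sketch of these pieces in qconf terms - the host name, memory
values and RQS name below are made-up examples, not our actual settings:

% qconf -mc                       # edit the complex list: add the mem_res line
                                  # shown in the qconf -sc output further down

% qconf -me node65                # edit that exec host's config and set, e.g.:
  complex_values        mem_res=512G

% qconf -arqs mem_res_per_user    # add a resource quota set, e.g.:
  {
     name         mem_res_per_user
     description  "total reserved memory per user in the hi-mem queue"
     enabled      TRUE
     limit        users {*} queues sThM.q to mem_res=1500G
  }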

Now, nothing prevents users from reserving more or less memory than they
will actually use - so I have tools to monitor this and warn them. The only
thing UGE enforces is the queue's limits (memory, CPU and elapsed time);
there is no dynamic enforcement of the reserved memory itself (I wish there
was).
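
As an illustration of what that monitoring amounts to (a sketch, not our
actual tool; <jobid> is a placeholder):

% qstat -j <jobid> | grep 'hard resource_list'  # what the job requested (mres=..., h_vmem=...)
% qacct -j <jobid> | grep maxvmem               # peak memory it actually used, once finished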

You need a separate consumable: mem_free is the instantaneous free memory
on each host/node, so requesting it only asks the scheduler to start a job
on a host/node that has that much memory free right now - a job already
running on that node might use more later. Nobody has a crystal ball to
predict how much memory a job will eventually consume - our biologists
struggle all the time to get this guesstimate right. Some of their codes
come with estimators, but not all, and they are not always accurate.
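
To make the difference concrete - a hypothetical 20G job, big.job is just a
placeholder:

% qsub -l mem_free=20G big.job   # only checks that 20G happens to be free at start time
% qsub -l mres=20G big.job       # books 20G of the mem_res consumable for the job's lifetime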

Here are some UGE config snapshots. We use JOB, not YES, as the consumable
type for mem_res because our high-memory queues are for serial or mthread
PEs, not MPI, and a typical hi-mem job's memory needs do not scale with the
number of slots. If you're curious, check our cluster Wiki
<https://confluence.si.edu/display/HPC/High+Performance+Computing> and its
status page <https://www.cfa.harvard.edu/~sylvain/hydra/> to see it in
action and how it looks from the users' side.

BTW, sThC stands for short-time hi-cpu (serial, MPI, mthread and hybrid PEs)
and sThM stands for short-time hi-mem (serial and mthread PEs only) - we
have a matrix of queues: short, medium, long and unlimited time limits,
crossed with hi-cpu, hi-mem and very-hi-mem (some of our nodes have 2TB of
memory).

% qconf -sq sThC.q                 % qconf -sq sThM.q
pe_list               mpich,orte,\ pe_list               mthread
                      mthread,h2,h4,h8,h12,h16,\
                      h20,h24,h28,h32
[...]
complex_values        NONE         complex_values        use_himem=TRUE
[...]
s_rt                  14:00:00     s_rt                  14:00:00
h_rt                  14:15:00     h_rt                  14:15:00
[...]
s_cpu                 7:00:00      s_cpu                 7:00:00
h_cpu                 7:15:00      h_cpu                 7:15:00
[...]
s_data                6G           s_data                450G
h_data                6G           h_data                450G
[...]
s_rss                 6G           s_rss                 450G
h_rss                 6G           h_rss                 450G
s_vmem                6G           s_vmem                450G
h_vmem                6G           h_vmem                450G

% qconf -sc | egrep 'mem_res|himem'
mem_res               mres        MEMORY      <=    YES         JOB       1G       0       YES    0.000000
use_himem             himem       BOOL        ==    FORCED      NO        FALSE    0       NO     0.000000

and users then specify -l mres=10G,h_data=10G,h_vmem=10G,himem for a job
that needs 10G of memory set aside and will run in a hi-mem queue.
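
A full (hypothetical) submission against the queues shown above - heavy.job
is a placeholder, and the 4-slot mthread PE matches the sThM.q snapshot:

% qsub -pe mthread 4 -l mres=10G,h_data=10G,h_vmem=10G,himem heavy.job

Since mem_res is a JOB consumable, the 10G is booked once per job, not
multiplied by the 4 slots, and the FORCED use_himem complex keeps jobs that
do not request himem out of the hi-mem queues.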

Hope this helps.

    Stay sane, safe and healthy, 6+ft away! Cheers,
      Sylvain