Hi Carl,

There's a somewhat hidden interaction between GE and the kernel in this
case. If you have sufficient paging space, then the kernel will be happy to
page out suspended jobs, and I think will even prioritize paging out
sleeping tasks. UGE also has an m_mem_free_soft setting in cgroup_params
that you can set, which will allow tasks using mfree (cgroup memory
enforcement) to exceed their request as long as the node has sufficient
memory to accommodate it. This also depends on kernel support; when the OS
runs into memory pressure, it will force the containers exceeding their
soft memory limit back down to their request, which will page the tasks out
if space exists, or trigger the OOM killer if it does not.
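
If you want to see which job containers are currently over their soft limit
on a node, a rough sketch along these lines might help. It assumes the
cgroup v1 memory controller, whose memory.usage_in_bytes and
memory.soft_limit_in_bytes files report current usage and the soft limit.
The directory that holds each job's cgroup depends on your configuration,
so it is just a parameter here, and the function names are made up for
illustration.

```python
from pathlib import Path


def read_bytes_value(cgroup_dir, name):
    """Read a single integer from a cgroup v1 memory control file."""
    return int((Path(cgroup_dir) / name).read_text().strip())


def soft_limit_excess(cgroup_dir):
    """Return how many bytes the cgroup is above its soft limit (0 if under).

    cgroup_dir is the job's memory cgroup directory; where that lives on
    a given node is an assumption the caller has to supply.
    """
    usage = read_bytes_value(cgroup_dir, "memory.usage_in_bytes")
    soft = read_bytes_value(cgroup_dir, "memory.soft_limit_in_bytes")
    return max(0, usage - soft)
```

A nonzero result means the container is a candidate for being pushed back
toward its request once the node comes under memory pressure.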

As for restarting jobs, I'm not sure how the non-Univa derivatives do this,
but UGE 8.3 has a number of tunables for ensuring that preempted jobs get
more priority and their original resources so that you don't have low
priority jobs languishing forever.

On Mon, Nov 02, 2015 at 09:27:17AM -0800, Carl G. Riches wrote:
> 
> This issue is not perfectly clear to me.  If there is sufficient local 
> disk space, will a suspended job on a node be swapped to local disk?  Will 
> it restart (un-suspend) on that node when sufficient resources are free?

-- 
-- Skylar Thompson ([email protected])
-- Genome Sciences Department, System Administrator
-- Foege Building S046, (206)-685-7354
-- University of Washington School of Medicine
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
