On Wed, Jun 13, 2012 at 12:44 AM, Joseph A. Farran <[email protected]> wrote:
> Hi Rayson.
>
> So for us newbies with OGE, is there a (hopefully easy) way of automatically
> adding cgroups to OGE parallel environment so that it's all nice and
> transparently integrated into OGE?

OGS/GE 2011.11 U1 will be the first release with cgroups. We mainly use it for process grouping, and this part is transparent to the user. Later releases, like GE 2011.11 update 2, will include features that the user can use to tune the cgroups integration behavior.

In the current implementation (U1), when the freezer controller is available, it is used for safe signaling. If the cpuacct controller is available, it is used for CPU cycle accounting. And if the memory controller is available, memory limits & memory usage are handled by this controller as well. As detection is done without user intervention, it *should* be transparent enough! :-)

The most important part is grouping processes to jobs, which is the main function of the PDC in Grid Engine. Ron Chen & I implemented almost half of the platform-specific PDC code, so we know the ugly bits in the PDC! (AIX, HP-UX, OSX, and other BSD-like systems - FreeBSD, NetBSD, OpenBSD - whose PDCs are mostly based on the OSX implementation. We even wrote a PDC implementation for Linux that does not require running the execd as root. This one was contributed to the original dev list, but Sun was not too interested in it, and thus we only deployed it on a few systems... long story.)

We believe the original PDC is something that really needs an update, esp. now that the Linux kernel has cgroups, which were developed for this purpose... So we can finally remove hacks used in Grid Engine, such as adding a GID to a job or needing to know the "ENABLE_ADDGRP_KILL" flag for proper job cleanup.

Note that the "ENABLE_ADDGRP_KILL" parameter was added by Sun a long time ago, as (again!) Andy told us that it is not always safe to kill all processes that have the supplementary GID added by Grid Engine. (Note that we have worked with Andy for a long time, and further there was no reason for Sun to want to screw up its own products... but the result is that in some cases processes are left behind and not properly cleaned up by Grid Engine.)

Lastly, in case you didn't know, we have a blog entry for the Grid Engine cgroups integration:

http://blogs.scalablelogic.com/2012/05/grid-engine-cgroups-integration.html

Rayson

>
> Joseph
>
> On 6/12/2012 5:19 PM, Rayson Ho wrote:
>
> On Tue, Jun 12, 2012 at 8:10 PM, Joseph Farran <[email protected]> wrote:
>
> If you guys are that paranoid about PE suspension, how about adding an
> on/off flag for this since the code is already there and let the admin pick?
>
> Hi Joseph,
> I just want to understand the background a bit more, that's all...
> Esp. now we have cgroups that can handle suspension much safer than
> the old code (SIGSTOP).
> Rayson
>
> Joseph
>
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
