We have similar computing policies (3rd strike and out) in place starting with the Fall 2011 semester, but would love to know the technique for killing any/all user processes that are not children of sge_execd. It gives me something to learn about and use later, if the need arises.
But I do agree about your other findings: even the most extensive manuals/user guides we have written have mostly been in vain. We are starting to employ a polite version of an RTFM policy as well, at least for those groups to whom the documentation and demonstration were given.

Thanks for your time :)

Best,
g

--
Gowtham
Advanced IT Research Support
Michigan Technological University
(906) 487/3593

On Fri, 19 Aug 2011, Chris Dagdigian wrote:

| I think I learned this trick from Reuti:
|
| - Any legit job running under Grid Engine will be a child process of an
|   sge_execd daemon.
|
| A nice little trick is a cronjob that does a "kill -9" on any user process
| that is not a child of sge_execd -- that will quickly send a message to the
| people bypassing the resource scheduling layer.
|
| That said, however, I've been in this position in a number of environments and
| I can tell you that you will NEVER win the battle with users trying to game
| the system. The motivated user will always have more time and more incentive
| than an overworked cluster administrator.
|
| While simple technical measures like that "kill -9" trick or Reuti's more
| sensible suggestion of blocking interactive SSH access to nodes outside of SGE
| should be pursued I'd suggest that you don't spend much more time than that
| developing technical countermeasures.
|
| The real way this gets solved in a multi-user cluster environment is by
| treating acceptable cluster usage as a human resources policy. You'll never
| win a technical battle with a motivated power user.
|
| Acceptable cluster use should be governed by a published policy and when the
| policy is avoided or gamed then the response should involve mentors, managers
| or the HR department, not technology or scripts.
|
| In a corporate setting this comes down to:
|
| 1. First time you bypass SGE the admins send you a warning
|
| 2. Second time you get caught your manager gets notified
|
| 3. Third time? Account is disabled and you are reported to the HR department
|    for violating company policy repeatedly
|
| Sorry for being long winded but most long-time cluster admins might share my
| opinion that cluster use policies can't be treated as a technical war between
| admins and users -- it's far easier and better to treat this as a workplace
| behavior thing.
|
| -Chris
|
|
| Reuti wrote:
| > Hi,
| >
| > On 19.08.2011 at 18:30, Gowtham wrote:
| >
| > > In some of the computing clusters across our campus, we have noticed many
| > > users running their jobs outside of the SGE queuing system. While we have
| > > plans to continue tutoring them about the benefits of using a queuing
| > > system, not everyone seems to be getting the message - as such, these
| > > violating-users' jobs are hampering those who have been
| > > using SGE.
| > >
| > > On all our Rocks based clusters, we do keep the list of
| > > cluster's users in a flat text file, one user per line.
| > >
| > > Is there a way by which I (as root) can kill all those
| > > jobs submitted outside of SGE on compute nodes by these
| > > normal users?
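
The cron-job idea Chris describes (kill any user process that is not a child of sge_execd) could be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions, not a script anyone in this thread posted: the function names, the command-line usage, and the `--kill` switch are all made up for illustration, and it assumes a Linux `ps` that accepts `-eo pid=,ppid=,user=,comm=`. It reads the flat text file of cluster users (one per line), walks each process's parent chain, and by default only prints the strays. One known gap: a process whose parent has exited is re-parented to init, so a job that detaches from its shepherd would look like a stray, and `ps` may abbreviate long user names.

```python
import os
import signal
import subprocess
import sys


def parse_ps(output):
    """Parse `ps -eo pid=,ppid=,user=,comm=` text into {pid: (ppid, user, comm)}."""
    table = {}
    for line in output.splitlines():
        # comm is the last column and may contain spaces, so split at most 3 times.
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue
        pid, ppid, user, comm = parts
        table[int(pid)] = (int(ppid), user, comm.strip())
    return table


def under_execd(pid, table, daemon="sge_execd"):
    """Return True if pid, or any ancestor of pid, is named like the daemon."""
    seen = set()
    while pid in table and pid not in seen:
        seen.add(pid)  # guard against loops in a stale snapshot
        ppid, _user, comm = table[pid]
        if comm == daemon:
            return True
        pid = ppid
    return False


def find_strays(table, users):
    """PIDs owned by one of `users` that have no sge_execd ancestor."""
    return sorted(
        pid
        for pid, (_ppid, user, _comm) in table.items()
        if user in users and not under_execd(pid, table)
    )


def main(argv):
    # Hypothetical usage: python kill_strays.py USERS_FILE [--kill]
    with open(argv[1]) as fh:
        users = {line.strip() for line in fh if line.strip()}
    out = subprocess.run(
        ["ps", "-eo", "pid=,ppid=,user=,comm="],
        check=True, capture_output=True, text=True,
    ).stdout
    table = parse_ps(out)
    for pid in find_strays(table, users):
        print(pid, *table[pid])  # pid, ppid, user, command name
        if "--kill" in argv[2:]:
            try:
                os.kill(pid, signal.SIGKILL)  # the "kill -9" from the thread
            except ProcessLookupError:
                pass  # process exited between the snapshot and the kill


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Running it without `--kill` first and reading the printed list is the safer way to check that nothing legitimate (for example, an admin's own shell, if the admin is in the users file) would be caught before letting cron run it with the kill enabled.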