We have similar computing policies (three strikes and you're out) in
place starting with the Fall 2011 semester, but I would love to know
the technique for killing any/all user processes that are not
children of sge_execd. It gives me something to learn about and
use later, if the need arises.
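
A minimal sketch of that idea, not a tested tool: it assumes a Linux
compute node where "ps" accepts the POSIX "-o name=" form, and a flat
text file with one user name per line. All function names here are
hypothetical. It treats a process as legitimate if sge_execd appears
anywhere in its ancestry, since jobs normally run under an sge_shepherd
that sge_execd starts.

```python
import os
import signal
import subprocess


def load_users(path):
    """Read one user name per line from a flat text file (path is an assumption)."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def read_process_table():
    """Return {pid: (ppid, user, command)} parsed from `ps` output.

    Note: some ps versions shorten or replace long user names, so such
    users may simply not match the list (the script then leaves them alone).
    """
    out = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "user=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    ).stdout
    table = {}
    for line in out.splitlines():
        fields = line.split(None, 3)
        if len(fields) == 4:
            pid, ppid, user, comm = fields
            table[int(pid)] = (int(ppid), user, comm.strip())
    return table


def has_ancestor(pid, table, name="sge_execd"):
    """True if pid itself or any of its ancestors runs a command called `name`."""
    seen = set()
    while pid in table and pid not in seen:
        seen.add(pid)  # guard against loops in a stale snapshot
        ppid, _user, comm = table[pid]
        if comm == name:
            return True
        pid = ppid
    return False


def find_stray_pids(table, users):
    """PIDs owned by listed users that do not descend from sge_execd."""
    return sorted(
        pid for pid, (_ppid, user, _comm) in table.items()
        if user in users and not has_ancestor(pid, table)
    )


def kill_strays(pids, dry_run=True):
    """Send SIGKILL to each pid; with dry_run=True only report what would happen."""
    for pid in pids:
        if dry_run:
            print(f"would kill {pid}")
            continue
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # process already exited
```

From a root cronjob on a compute node this would be called roughly as
kill_strays(find_stray_pids(read_process_table(), load_users(path)),
dry_run=False), after a few dry runs. Two caveats: a job that detaches
and gets re-parented to init would look like a stray, and on a login
node this would also kill ordinary interactive sessions.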

But I do agree with your other findings - even the most
extensive manuals/user guides we have written have mostly
been in vain. We are starting to employ a polite version of
an RTFM policy as well - at least for those groups to whom the
documentation & demonstration were given.

Thanks for your time :)

Best,
g

--
Gowtham
Advanced IT Research Support
Michigan Technological University

(906) 487/3593


On Fri, 19 Aug 2011, Chris Dagdigian wrote:

| I think I learned this trick from Reuti:
| 
|  - Any legit job running under Grid Engine will be a child process of an
| sge_execd daemon.
| 
| A nice little trick is a cronjob that does a "kill -9" on any user process
| that is not a child of sge_execd -- that will quickly send a message to the
| people bypassing the resource scheduling layer.
| 
| That said, however, I've been in this position in a number of environments and
| I can tell you that you will NEVER win the battle with users trying to game
| the system. The motivated user will always have more time and more incentive
| than an overworked cluster administrator.
| 
| While simple technical measures like that "kill -9" trick or Reuti's more
| sensible suggestion of blocking interactive SSH access to nodes outside of SGE
| should be pursued, I'd suggest that you don't spend much more time than that
| developing technical countermeasures.
| 
| The real way this gets solved in a multi-user cluster environment is by
| treating acceptable cluster usage as a human resources policy. You'll never
| win a technical battle with a motivated power user.
| 
| Acceptable cluster use should be governed by a published policy and when the
| policy is avoided or gamed then the response should involve mentors, managers
| or the HR department, not technology or scripts.
| 
| In a corporate setting this comes down to:
| 
| 1. First time you bypass SGE the admins send you a warning
| 
| 2. Second time you get caught your manager gets notified
| 
| 3. Third time? Account is disabled and you are reported to the HR department
| for violating company policy repeatedly
| 
| Sorry for being long winded but most long-time cluster admins might share my
| opinion that cluster use policies can't be treated as a technical war between
| admins and users -- it's far easier and better to treat this as a workplace
| behavior thing.
| 
| -Chris
| 
| Reuti wrote:
| > Hi,
| > 
| > On 19.08.2011 at 18:30, Gowtham wrote:
| > 
| > > In some of the computing clusters across our campus, we have noticed many
| > > users running their jobs outside of the SGE queuing system. While we have
| > > plans to continue tutoring them about the benefits of using a queuing
| > > system, not everyone seems to be getting the message. As a result, the
| > > jobs of these users are hampering those who have been
| > > using SGE.
| > > 
| > > On all our Rocks based clusters, we keep the list of each
| > > cluster's users in a flat text file, one user per line.
| > > 
| > > Is there a way by which I (as root) can kill all those
| > > jobs submitted outside of SGE on compute nodes by these
| > > normal users?
| 
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
