Hi,

On 13.01.2012 at 01:03, Michael Coffman wrote:

> We are in the process of developing a gridwatcher utility that is launched
> in the background from the prolog script.   The intent is to have a
> process monitor various aspects of the job and store or report on them.

this is of course an interesting goal. What are you missing right now?


> It currently determines the pid of the shepherd process then watches all
> the children processes.

I think it's easier to use the additional group ID, which SGE attaches to all 
child processes of the job, whether they escape the process tree or not. It is 
recorded in $SGE_JOB_SPOOL_DIR in the file "addgrpid".
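
A rough sketch of what the watcher could do with it (untested; it assumes the 
prolog environment provides SGE_JOB_SPOOL_DIR, and the echo is just a 
placeholder for your monitoring logic):

addgrpid=$(cat "$SGE_JOB_SPOOL_DIR/addgrpid")

# Every process of the job carries this id as a supplementary group, even
# after detaching from the shepherd's process tree.
for status in /proc/[0-9]*/status; do
    grep "^Groups:" "$status" 2>/dev/null | grep -qw "$addgrpid" || continue
    pid=${status#/proc/}
    pid=${pid%/status}
    echo "job process: $pid"
done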


> Initially it will be watching memory usage and if a job begins using more
> physical memory than requested, the user will be notified.  That's where
> my question comes from.

What about setting a soft limit (s_vmem) in addition to h_vmem and preparing 
the job script to handle the signal and send an email? How will they request 
memory - by virtual_free?
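
A rough job script sketch of that idea (the limits, the mail command and the 
payload name are placeholders; according to queue_conf(5) exceeding s_vmem 
delivers SIGXCPU, which can be caught, while exceeding h_vmem kills the job 
with SIGKILL):

#!/bin/sh
#$ -l h_vmem=4G,s_vmem=3900M

# Warn the user when the soft limit is hit; the hard limit still kills the
# job if it keeps growing. The application itself must ignore or handle
# SIGXCPU if it should survive the warning.
trap 'echo "job $JOB_ID is close to its memory limit" |
      mail -s "memory warning: job $JOB_ID" "$USER"' XCPU

# Run the payload in the background and wait for it, so the trap can fire
# while it is still running.
./my_application &
child=$!
while kill -0 "$child" 2>/dev/null; do
    wait "$child"
done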


> Is there any way in the prolog to get access to the hard_request options
> besides using qstat?
> 
> What I'm currently doing:
> 
>  cmd = "bash -c '. #{@sge_root}/default/common/settings.sh && qstat -xml -j #{@number}'"
> 
> I have thought of possibly setting an environment variable via a jsv script
> that can be queried by the prolog script.  Is this a good idea?  How much 
> impact
> on submission time does jsv_send_env() add?

You can use either a JSV or a `qsub` wrapper for it.
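
For the JSV route a minimal bash sketch could look like this (untested; the 
function names are the ones from $SGE_ROOT/util/resources/jsv/jsv_include.sh, 
and GRIDWATCHER_H_VMEM is a made-up variable name which your prolog would then 
find in the job's environment):

#!/bin/sh

jsv_on_start()
{
   # Needed to read or modify the job environment in jsv_on_verify(); this
   # call is exactly where the extra submission time comes from, as the
   # complete environment is shipped to the JSV.
   jsv_send_env
   return
}

jsv_on_verify()
{
   h_vmem=$(jsv_sub_get_param l_hard h_vmem)
   if [ -n "$h_vmem" ]; then
      jsv_add_env GRIDWATCHER_H_VMEM "$h_vmem"
      jsv_correct "h_vmem copied into the job environment"
   else
      jsv_accept "no h_vmem requested"
   fi
   return
}

. "${SGE_ROOT}/util/resources/jsv/jsv_include.sh"
jsv_main

It can be registered client-side with `qsub -jsv <path>` (or in sge_request) or 
server-side via jsv_url in the cluster configuration.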


> Any one else doing anything like this have any suggestions?
> 
> 
> The end goal is to have a utility that users can also interact with to
> monitor their jobs.  By either setting environment variables or grid
> complexes

Complexes are handled internally by SGE only. There is no command that lets a 
non-admin user change them.


> to affect the behavior of what is being watched and how they
> are notified.

AFAIK you can't change the content of an already inherited environment variable 
from the outside, as the process got its own copy of the value. Also, 
/proc/12345/environ is read-only. And your "observation daemon" will run on all 
nodes - one per job, started from the prolog, if I get you right?

But a nice solution could be the use of the job context. It can be set by the 
user on the command line, and your job can read it with a command similar to 
the one you are already using. If the exechosts are submit hosts, the job can 
also change it with `qalter`, just as the user would on the command line. We 
use the job context only for documentation purposes, to record the issued 
command and append it to the email which is sent after the job finishes.

http://gridengine.org/pipermail/users/2011-September/001629.html

$ qstat -j 12345
...
context:                    COMMAND=subturbo -v 631 -g -m 3500 -p 8 -t infinity 
-s aoforce,OUTPUT=/home/foobar/carbene/gecl4_2carb228/trans_tzvp_3.out

It's only one long line, and I split it later into individual entries. In your 
case you have to watch out for commas, as they are already used to separate the 
entries.
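
For your watcher the round trip could look like this (WATCH_MEM and NOTIFY are 
made-up keys, 12345 a placeholder job id):

$ qsub -ac WATCH_MEM=4G,NOTIFY=email myjob.sh   # user sets the context at submission
$ qalter -ac NOTIFY=none 12345                  # user (or the job itself) adjusts it
$ qstat -j 12345 | sed -n 's/^context: *//p'    # the watcher reads the current settings
WATCH_MEM=4G,NOTIFY=none

`-ac` adds or overwrites entries, `-dc` removes one and `-sc` replaces the whole 
context.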

-- Reuti


> Thanks.
> 
> -- 
> -MichaelC


_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
