On 13.01.2012, at 19:40, Michael Coffman wrote:

>>> <snip>
>>> It currently determines the pid of the shepherd process then watches all
>>> the children processes.
>> 
>> I think it's easier to use the additional group ID, which is attached to all 
>> child processes by SGE, whether they jump out of the process tree or not. It 
>> is recorded in $SGE_JOB_SPOOL_DIR in the file "addgrpid".
>> 
> 
> Had not thought of this.  Sounds like a good idea.  At first glance I am
> not seeing how to list the jobs via ps that are identified by the gid in
> the addgrpid file.  I tried ps -G`cat addgrpid` -o vsz,rss,arg but it
> returns nothing.  I'll have to dig into this a bit more.

Yes, `ps -G` selects by the real group ID, while SGE attaches the additional 
group ID as a supplementary group - so it's most likely only visible in /proc:

$ qrsh
Running inside SGE
Job 3696
$ id
uid=1000(reuti) gid=100(users) 
groups=10(wheel),16(dialout),33(video),100(users),20007
$ grep -l -r "^Groups.* 20007" /proc/*/status 2>/dev/null | \
    sed -n "s|/proc/\([0-9]*\)/status|\1|p"
13306
13628
13629


>>> Initially it will be watching memory usage and if a job begins using more
>>> physical memory than requested, the user will be notified.  That's where
>>> my question comes from.
>> 
>> What about setting a soft limit for h_vmem and preparing the job script to 
>> handle the signal to send an email? How will they request memory - by 
>> virtual_free?
> 
> Memory is requested via a consumable complex that we define as the
> amount of physical memory.  The way most of the jobs are run currently,
> we could not do this.  Job scripts typically call a commercial vendor's
> binary, so there is nothing listening for the signals.

Ok. Depending on the application and whether it resets the traps, you can try 
using a subshell to ignore the signal for the application, since the signal is 
sent to the complete process group:

#!/bin/bash
# Trap USR1 (the soft h_vmem limit signal) in the script, but shield
# the application from it by ignoring the signal in a subshell.
trap 'echo USR1' USR1
(trap '' USR1; exec your_binary) &
PID=$!
wait $PID
RET=$?
# wait returns 128 + signal number (USR1 = 10 on Linux, hence 138)
# whenever it is interrupted by a trapped signal, so call it again
# until the child really exits.
while [ $RET -eq 138 ]; do wait $PID; RET=$?; done


('' = two single quotation marks, i.e. an empty handler, so the subshell 
ignores the signal.)
After the first signal, `wait` must be called again.
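If the goal is a user notification rather than an `echo`, the trap handler could send the mail itself. A hypothetical sketch - the function names are mine and `mailx` availability on the exec host is an assumption, while $JOB_ID and $SGE_O_LOGNAME are set by SGE in the job environment:

```shell
#!/bin/bash
# Hypothetical warning mail on USR1 (the soft h_vmem limit signal).
build_warning() {
    # $1 = job ID, $2 = host; kept separate so the message is testable.
    echo "Job $1 is about to hit its soft memory limit on $2"
}

notify_user() {
    # Assumes mailx is installed on the exec host.
    build_warning "$JOB_ID" "$(hostname)" \
        | mailx -s "SGE job $JOB_ID memory warning" "$SGE_O_LOGNAME"
}

trap notify_user USR1
```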


>>> Is there any way in the prolog to get access to the hard_request options
>>> besides using qstat?
>>> 
>>> What I'm currently doing:
>>> 
>>>  cmd = "bash -c '. #{@sge_root}/default/common/settings.sh && qstat -xml -j #{@number}'"
>>> 
>>> I have thought of possibly setting an environment variable via a JSV script
>>> that can be queried by the prolog script.  Is this a good idea?  How much
>>> impact on submission time does jsv_send_env() add?
>> 
>> You can use either a JSV or a `qsub` wrapper for it.
>> 
>> 
>>> Any one else doing anything like this have any suggestions?
>>> 
>>> 
>>> The end goal is to have a utility that users can also interact with to
>>> monitor their jobs.  By either setting environment variables or grid
>>> complexes
>> 
>> Complexes are only handled internally by SGE. There is no user command to 
>> change them for a non-admin.
> 
> My thoughts on the complex were that there would be a complex flag that
> would indicate that the user wanted to monitor memory, or CPU, etc.  Not
> that it would be changeable by the user - just an indicator for the JSV
> script.

Ok.

-- Reuti


>>> to affect the behavior of what is being watched and how they
>>> are notified.
>> 
>> AFAIK you can't change the content of an already inherited variable, as the 
>> process got a copy of the value. Also, /proc/12345/environ is read-only. 
>> And your "observation daemon" will run on all nodes - one per job, started 
>> from the prolog, if I get you right?
> 
> Correct.
> 
>> 
>> But a nice solution could be the usage of the job context. This can be set 
>> by the user on the command line, and your job can access this by issuing a 
>> similar command to the one you use already. If the exechosts are submit 
>> hosts, the job can also change it by using `qalter`, like the user has to 
>> on the command line. We use the job context only for documentation 
>> purposes, to record the issued command and append it to the email which is 
>> sent after the job.
>> 
>> http://gridengine.org/pipermail/users/2011-September/001629.html
>> 
>> $ qstat -j 12345
>> ...
>> context:                    COMMAND=subturbo -v 631 -g -m 3500 -p 8 -t infinity -s aoforce,OUTPUT=/home/foobar/carbene/gecl4_2carb228/trans_tzvp_3.out
>> 
>> It's only one long line, and I split it later on into individual entries. 
>> In your case you have to watch out for commas, as they are already used to 
>> separate entries.
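The splitting described in the quote above can be sketched as follows; the function name is mine, and it simply assumes - as noted - that the values themselves contain no commas:

```shell
#!/bin/bash
# Sketch: split the one-line "context:" value of `qstat -j` into
# individual KEY=value entries. Entries are comma-separated, so this
# breaks if a value itself contains a comma (the caveat noted above).
split_context() {
    echo "$1" | tr ',' '\n'
}

# Sample line from the qstat output quoted above:
split_context 'COMMAND=subturbo -v 631 -g -m 3500 -p 8 -t infinity -s aoforce,OUTPUT=/home/foobar/carbene/gecl4_2carb228/trans_tzvp_3.out'
```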
> 
> The context sounds very interesting.  Not something we have really
> played around with.
> 
> Again.  Thanks for the input.
> 
> 
>> 
>> -- Reuti
>> 
>> 
>>> Thanks.
>>> 
>>> --
>>> -MichaelC
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>> 
> 
> 
> 
> -- 
> -MichaelC
> 

