On 13.01.2012, at 19:40, Michael Coffman wrote:
>>> <snip>
>>> It currently determines the pid of the shepherd process then watches all
>>> the children processes.
>>
>> I think it's easier to use the additional group ID, which is attached to all
>> kids by SGE, whether they jump out of the process tree or not. This one is
>> recorded in $SGE_JOB_SPOOL_DIR in the file "addgrpid".
>>
>
> I had not thought of this. Sounds like a good idea. At first glance I am not
> seeing how to list the jobs via ps that are identified by the gid in the
> addgrpid file. I tried ps -G`cat addgrpid` -o vsz,rss,arg but it returns
> nothing. I'll have to dig into this a bit more.
Yes, it's most likely only visible in /proc:
$ qrsh
Running inside SGE
Job 3696
$ id
uid=1000(reuti) gid=100(users)
groups=10(wheel),16(dialout),33(video),100(users),20007
$ grep -l -r "^Groups.* 20007" /proc/*/status 2>/dev/null | sed -n "s|/proc/\([0-9]*\)/status|\1|p"
13306
13628
13629
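
Based on that, a rough sketch for your monitor could sum the resident memory of
all processes carrying the job's additional group ID (untested; it assumes
Linux's /proc layout and the addgrpid file mentioned above):

#!/bin/sh
# Untested sketch: total VmRSS of all processes that carry the job's
# additional group ID, taken from $SGE_JOB_SPOOL_DIR/addgrpid.
ADDGRPID=$(cat "$SGE_JOB_SPOOL_DIR/addgrpid")
TOTAL=0
for STATUS in $(grep -l "^Groups:.*\b${ADDGRPID}\b" /proc/[0-9]*/status 2>/dev/null); do
   # VmRSS is reported in kB in /proc/<pid>/status
   RSS=$(awk '/^VmRSS:/ {print $2}' "$STATUS")
   TOTAL=$((TOTAL + ${RSS:-0}))
done
echo "Total RSS of the job: $TOTAL kB"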
>>> Initially it will be watching memory usage and if a job begins using more
>>> physical memory than requested, the user will be notified. That's where
>>> my question comes from.
>>
>> What about setting a soft limit for h_vmem and preparing the job script to
>> handle the signal to send an email? How will they request memory - by
>> virtual_free?
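>>
>> A sketch of such a submission (values made up; the exact signal the soft
>> limit delivers is documented in queue_conf(5)):
>>
>> $ qsub -l h_vmem=4G,s_vmem=3.8G job.sh
>>
>> The job then receives a catchable signal once s_vmem is exceeded, before
>> the hard limit aborts it.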
>
> Memory is requested via a consumable complex that we define as the amount of
> physical memory. The way most of the jobs are run currently, we could not do
> this: job scripts typically call a commercial vendor's binary, so there is
> nothing listening for the signals.
Ok. Depending on the application and whether it resets the traps, you can try
to use a subshell to ignore the signal for the application, since the signal
is sent to the complete process group:
#!/bin/bash
# Re-trap SIGUSR1 in the job script itself, e.g. to send the email.
trap 'echo USR1' USR1
# '' = two single quotation marks: the subshell ignores SIGUSR1, and the
# exec'd binary inherits this ignored disposition.
(trap '' USR1; exec your_binary) &
PID=$!
wait $PID
RET=$?
# 138 = 128 + 10 (SIGUSR1 on Linux): `wait` was interrupted by the trapped
# signal, so it must be called again until the binary really exits.
while [ $RET -eq 138 ]; do wait $PID; RET=$?; done

After the first signal, `wait` must be called again, hence the loop at the end.
>>> Is there any way in the prolog to get access to the hard_request options
>>> besides using qstat?
>>>
>>> What I'm currently doing:
>>>
>>> cmd = "bash -c '. #{@sge_root}/default/common/settings.sh && qstat
>>> -xml -j #{@number}'"
>>>
>>> I have thought of possibly setting an environment variable via a JSV script
>>> that can be queried by the prolog script. Is this a good idea? How much
>>> impact on submission time does jsv_send_env() add?
>>
>> You can use either a JSV or a `qsub` wrapper for it.
>>
>>
>>> Any one else doing anything like this have any suggestions?
>>>
>>>
>>> The end goal is to have a utility that users can also interact with to
>>> monitor their jobs. By either setting environment variables or grid
>>> complexes
>>
>> Complexes are only handled internally by SGE. There is no user command to
>> change them for a non-admin.
>
> My thoughts on the complex were that there would be a complex flag that would
> indicate that the user wanted to monitor memory, or CPU, etc. Not that it
> would be changeable by the user, just an indicator for the JSV script.
Ok.
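
A minimal client-side JSV along these lines could look like the following
(untested sketch: the boolean complex "monitor" and the environment variable
name are made up; the structure follows the example jsv.sh shipped with SGE):

#!/bin/sh
# Untested sketch: if the (made-up) complex "monitor" is requested hard,
# copy the hard h_vmem request into the job's environment so the prolog
# can read it without calling qstat.

jsv_on_start()
{
   # Required before the environment can be modified - this is exactly the
   # call whose submission-time cost was asked about above.
   jsv_send_env
   return
}

jsv_on_verify()
{
   if [ "`jsv_sub_is_param l_hard monitor`" = "true" ]; then
      jsv_add_env MONITOR_H_VMEM "`jsv_sub_get_param l_hard h_vmem`"
      jsv_correct "Job was modified"
   else
      jsv_accept "Job is accepted"
   fi
   return
}

# protocol handling between client/master and this script
. ${SGE_ROOT}/util/resources/jsv/jsv_include.sh
jsv_main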
-- Reuti
>>> to affect the behavior of what is being watched and how they
>>> are notified.
>>
>> AFAIK you can't change the content of an already inherited variable, as the
>> process got a copy of the value. Also, /proc/12345/environ is read-only.
>> And your "observation daemon" will run on all nodes - one for each job from
>> the prolog, if I get you right?
>
> Correct.
>
>>
>> But a nice solution could be the usage of the job context. This can be set
>> by the user on the command line, and your job can access it by issuing a
>> similar command to the one you already use. If the exechosts are submit
>> hosts, the job can also change it by using `qalter`, like the user has to
>> do on the command line. We use the job context only for documentation
>> purposes, to record the issued command and append it to the email which is
>> sent after the job.
>>
>> http://gridengine.org/pipermail/users/2011-September/001629.html
>>
>> $ qstat -j 12345
>> ...
>> context: COMMAND=subturbo -v 631 -g -m 3500 -p 8 -t
>> infinity -s
>> aoforce,OUTPUT=/home/foobar/carbene/gecl4_2carb228/trans_tzvp_3.out
>>
>> It's only one long line, and I split it later on into individual entries.
>> In your case you have to watch out for commas, as they are already used to
>> separate entries.
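>>
>> For example (job id and variable names made up):
>>
>> $ qsub -ac MONITOR=mem job.sh
>> $ qalter -ac MONITOR=cpu 12345
>> $ qstat -j 12345 | grep context:
>>
>> `-ac` adds entries to the context, `-sc` replaces it, `-dc` removes entries.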
>
> The context sounds very interesting. Not something we have really
> played around with.
>
> Again. Thanks for the input.
>
>
>>
>> -- Reuti
>>
>>
>>> Thanks.
>>>
>>> --
>>> -MichaelC
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users