> On 13.05.2015 at 13:36, <[email protected]> 
> <[email protected]> wrote:
> 
> Hi Reuti,
> 
> In qconf -sconf we have the configuration as follows
> execd_params                 enable_windomacc=true
> 
> Can you please confirm whether we can add it as below, or should it be 
> defined in a different way?
> 
> execd_params                 enable_windomacc=true ENABLE_ADDGRP_KILL=TRUE

It's correct. - Reuti
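
Whether the parameter is active can be checked from the output of `qconf -sconf`. The following is only a minimal sketch, not an SGE tool: the function name `has_execd_param` is hypothetical, and it assumes that the `execd_params` entries sit on a single line, are separated by whitespace or commas, and are matched case-insensitively.

```shell
# has_execd_param NAME: read `qconf -sconf` style text on stdin and succeed
# if the execd_params line sets NAME to true (assumption: case-insensitive).
has_execd_param() {
    awk -v want="$1" '
        $1 == "execd_params" {
            # Fields 2..NF hold the parameters; also split each on commas.
            for (i = 2; i <= NF; i++) {
                n = split($i, parts, ",")
                for (j = 1; j <= n; j++) {
                    if (toupper(parts[j]) == (toupper(want) "=TRUE")) found = 1
                }
            }
        }
        END { exit found ? 0 : 1 }
    '
}

# Usage on a live cluster (requires the SGE client tools):
#   qconf -sconf | has_execd_param ENABLE_ADDGRP_KILL && echo "enabled"
```

The function only reads text, so it can be tried on a saved copy of the configuration before touching the cluster.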


> 
> Regards,
> Sudha
> 
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Wednesday, May 13, 2015 4:17 PM
> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
> Cc: [email protected]
> Subject: Re: [gridengine users] grid jobs not visible with qstat output
> 
> 
>> On 13.05.2015 at 12:35, <[email protected]> 
>> <[email protected]> wrote:
>> 
>> Hi Reuti,
>> 
>> I did some testing again and now the process is killed after deleting the 
>> job using qdel job_id.  Please find the test results.
>> 
>> After starting the job, the process started on the execution host
>> 
>> qstat -j 8150628
>> =================================================
>> job_number:                 8150628
>> exec_file:                  job_scripts/8150628
>> submission_time:            Wed May 13 13:00:08 2015
>> owner:                      spenmets
>> uid:                        78566
>> group:                      newgrp1
>> gid:                        1018
>> 
>> =================================================
>> [spenmets@node2 homes/users/spenmets]$ps -au spenmets
>> PID TTY          TIME CMD
>> 10837 pts/12   00:00:00 qrsh_starter
>> 10911 pts/12   00:00:00 xterm
> 
> As long as the process stays attached to the `qrsh_starter`, it will be 
> killed too, as SGE kills the complete process group. The problem arises 
> when a process jumps out of the process tree and must be detected by its 
> additional group ID. In that case "execd_params ENABLE_ADDGRP_KILL=TRUE" 
> must also be set in SGE's configuration to allow this facility to step in.
> 
> -- Reuti
> 
> 
>> =================================================
>> 
>> [spenmets@node2 proc/10837]$cat status
>> Name:   qrsh_starter
>> Gid:    1018    1018    1018    1018
>> Utrace: 0
>> FDSize: 64
>> Groups: 1000 1018 1025 1030 27000 27001 27007 27010 27014 27017 27025
>> ================================================
>> 
>> gridnode @ /xxxxx/xxxxx/xxxxx : qdel 8150628
>> registered the job 8150628 for deletion
>> gridnode @ /xxxxx/xxxxx/xxxxx : qstat -j 8150628
>> Following jobs do not exist:
>> 8150628
>> 
>> ===============================================
>> 
>> [spenmets@node2 homes/users/spenmets]$ps 10837
>> PID TTY      STAT   TIME COMMAND
>> [spenmets@node2 homes/users/spenmets]$cd /proc/10837
>> -bash: cd: /proc/10837: No such file or directory
>> 
>> Does this mean it is not an issue with the tight integration of SSH into SGE?
>> 
>> Regards,
>> Sudha
>> 
>> -----Original Message-----
>> From: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>> Sent: Wednesday, May 13, 2015 1:15 PM
>> To: 'Reuti'
>> Cc: [email protected]
>> Subject: RE: [gridengine users] grid jobs not visible with qstat
>> output
>> 
>> Hi Reuti,
>> 
>> The value in /opt/sge/default/spool/active_jobs/8143543.1/addgrpid is
>> not present in /proc/.
>> 
>> But the child processes of the job are available in /proc/.
>> 
>> Can you please suggest a solution?
>> 
>> Regards,
>> Sudha
>> 
>> -----Original Message-----
>> From: Reuti [mailto:[email protected]]
>> Sent: Tuesday, May 12, 2015 8:53 PM
>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>> Cc: [email protected]; [email protected]
>> Subject: Re: [gridengine users] grid jobs not visible with qstat
>> output
>> 
>> 
>>> On 12.05.2015 at 17:03, <[email protected]> 
>>> <[email protected]> wrote:
>>> 
>>> Hi Reuti,
>>> 
>>> The link you suggested
>>> (https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html)
>>> states the following:
>>> 
>>> "To  have a tight integration of SSH into SGE, the started sshd needs an 
>>> additional group ID to be attached."
>>> 
>>> Checked the configuration from our side and the addgrpid is generated
>>> 
>>> /opt/sge/default/spool/active_jobs/8143543.1 : ls addgrpid
>> 
>> Yes, but it is not attached to all processes. Processes running in a tight 
>> integration need it attached, as can be seen in /proc:
>> 
>> reuti@node:/proc/24989> cat status
>> ...
>> Groups: 20082 24000 25000
>> 
>> And the 20082 is the additional one.
>> 
>> -- Reuti
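
The check described above can be scripted. This is a minimal sketch assuming a Linux `/proc` layout; the function names `has_group` and `pids_with_group` are hypothetical, and the spool path in the usage comment is simply the one quoted in this thread.

```shell
# has_group STATUS_FILE GID: succeed if the "Groups:" line of a
# /proc/<pid>/status file lists GID among the supplementary groups.
has_group() {
    awk -v gid="$2" '
        $1 == "Groups:" {
            for (i = 2; i <= NF; i++) if ($i == gid) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1" 2>/dev/null
}

# pids_with_group GID [PROC_ROOT]: print the PID of every process whose
# status file carries GID; PROC_ROOT defaults to /proc.
pids_with_group() {
    root="${2:-/proc}"
    for status in "$root"/[0-9]*/status; do
        # Processes can exit while we iterate, so skip unreadable entries.
        [ -r "$status" ] || continue
        if has_group "$status" "$1"; then
            pid="${status%/status}"
            echo "${pid##*/}"
        fi
    done
}

# Usage on an execution host (path as quoted in this thread):
#   gid=$(cat /opt/sge/default/spool/active_jobs/8143543.1/addgrpid)
#   pids_with_group "$gid"
```

If the listing is empty while the job's processes are still running, the additional group ID was not attached to them.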
>> 
>> 
>>> 
>>> Regards,
>>> Sudha
>>> 
>>> -----Original Message-----
>>> From: Reuti [mailto:[email protected]]
>>> Sent: Monday, May 11, 2015 2:08 AM
>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>> Cc: [email protected]; [email protected]
>>> Subject: Re: [gridengine users] grid jobs not visible with qstat
>>> output
>>> 
>>> 
>>> On 10.05.2015 at 19:30, <[email protected]> 
>>> <[email protected]> wrote:
>>> 
>>>> Hi Reuti,
>>>> 
>>>> The startup mechanism is as below
>>>> 
>>>> qlogin_daemon                /usr/sbin/sshd -i
>>>> qlogin_command               /gridapl1/HWEE_ge6/new/qssh
>>> 
>>> Then it's most likely that the `ssh` is not tightly integrated into SGE. 
>>> Please have a look at:
>>> 
>>> https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html
>>> 
>>> section "SSH TIGHT INTEGRATION".
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> Regards,
>>>> Sudha
>>>> 
>>>> -----Original Message-----
>>>> From: Reuti [mailto:[email protected]]
>>>> Sent: Friday, May 08, 2015 10:50 PM
>>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>>> Cc: [email protected]; [email protected]
>>>> Subject: Re: [gridengine users] grid jobs not visible with qstat
>>>> output
>>>> 
>>>> 
>>>>> On 08.05.2015 at 16:57, [email protected] wrote:
>>>>> 
>>>>> Hi Zhang,
>>>>> 
>>>>> Please find the o/p
>>>>> 
>>>>> 32682 61457200 27020 karppa 32682 /applic36/grid/HWEE_ge6/utilbin/lx24-amd64/qrsh_starter /gridapl1/HWEE_ge6/default/spo
>>>>> 32734 61457200 27020 karppa 32734  \_ /bin/ksh ./run_it_file.vcs
>>>>> 33043 61457200 27020 karppa 32734      \_ /bin/ksh ./vcs.start.dh.no_gui
>>>>> 33059 61457200 27020 karppa 32734          \_ ./vcs/tb_bin/hdl_top_rtldhsim/simv -licqueue -cm line+cond+fsm+branch+tgl+
>>>>> 38048 61457200 27020 karppa 32734              \_ [target.bin] <defunct>
>>>>> 5049 61457200 27020 karppa 5049 /applic36/grid/HWEE_ge6/utilbin/lx24-amd64/qrsh_starter /gridapl1/HWEE_ge6/default/spoo
>>>>> 5101 61457200 27020 karppa 5101  \_ /bin/ksh ./run_it_file.vcs
>>>>> 5408 61457200 27020 karppa 5101      \_ /bin/ksh ./vcs.start.dh.no_gui
>>>>> 5424 61457200 27020 karppa 5101          \_ ./vcs/tb_bin/hdl_top_rtldhsim/simv -licqueue -cm line+cond+fsm+branch+tgl+a
>>>>> 9089 61457200 27020 karppa 5101              \_ [target.bin] <defunct>
>>>> 
>>>> The problem seems to be that the `qrsh_starter` is no longer bound to the 
>>>> "sge_shepherd". Was this after the job ended? What does it look like while SGE 
>>>> still knows about the job? What is the startup mechanism:
>>>> 
>>>> $ qconf -sconf
>>>> ...
>>>> qlogin_command               builtin
>>>> qlogin_daemon                builtin
>>>> rlogin_command               builtin
>>>> rlogin_daemon                builtin
>>>> rsh_command                  builtin
>>>> rsh_daemon                   builtin
>>>> 
>>>> -- Reuti
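
One way to spot `qrsh_starter` processes that have lost their `sge_shepherd` parent is to look at their parent PID. This is a rough sketch only: it assumes that an orphaned process is reparented to PID 1 (on systems with a subreaper it may be a different PID), and the function name `orphaned_starters` is hypothetical.

```shell
# orphaned_starters: read "PID PPID COMMAND" lines on stdin and print the
# PID of every qrsh_starter whose parent is PID 1 (assumed to mean orphaned).
orphaned_starters() {
    awk '$2 == 1 && $3 == "qrsh_starter" { print $1 }'
}

# Usage on an execution host:
#   ps -e -o pid= -o ppid= -o comm= | orphaned_starters
```

A `qrsh_starter` that still belongs to a running job would normally show a shepherd process as its parent instead, so it is not listed.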
>>>> 
>>>> 
>>>>> Regards,
>>>>> Sudha
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Feng Zhang [mailto:[email protected]]
>>>>> Sent: Friday, May 08, 2015 7:35 PM
>>>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>>>> Subject: Re: [gridengine users] grid jobs not visible with qstat
>>>>> output
>>>>> 
>>>>> Sudha,
>>>>> 
>>>>> Can you run "ps -e f -o pid,ppid,command", which can show more details?
>>>>> 
>>>>> On Fri, May 8, 2015 at 4:09 AM,  <[email protected]> wrote:
>>>>>> Hi Reuti,
>>>>>> 
>>>>>> The processes are not bound to sge_shepherd anymore.
>>>>>> 
>>>>>> Below are the qrsh_starter processes running still
>>>>>> 
>>>>>> 5049 ?        00:00:00 qrsh_starter
>>>>>> 5101 ?        00:00:00 run_it_file.vcs
>>>>>> 5408 ?        00:00:00 vcs.start.dh.no
>>>>>> 5424 ?        8-20:57:02 simv
>>>>>> 9089 ?        00:00:00 target.bin <defunct>
>>>>>> 16868 ?        00:00:00 sshd
>>>>>> 16913 pts/9    00:00:00 bash
>>>>>> 17371 pts/9    00:00:00 ps
>>>>>> 32682 ?        00:00:00 qrsh_starter
>>>>>> 32734 ?        00:00:00 run_it_file.vcs
>>>>>> 33043 ?        00:00:00 vcs.start.dh.no
>>>>>> 33059 ?        8-21:19:03 simv
>>>>>> 38048 ?        00:00:00 target.bin <defunct>
>>>>>> 
>>>>>> Regards,
>>>>>> Sudha
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Reuti [mailto:[email protected]]
>>>>>> Sent: Thursday, May 07, 2015 9:52 PM
>>>>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>>>>> Cc: [email protected]; [email protected]
>>>>>> Subject: Re: [gridengine users] grid jobs not visible with qstat
>>>>>> output
>>>>>> 
>>>>>> Are the processes still bound to the sge_shepherd, or did they jump out 
>>>>>> of the process tree? By what method were they started by qrsh_starter: 
>>>>>> "builtin" or by defining `ssh`?
>>>>>> 
>>>>>> -- Reuti
>>>>>> 
>>>>>> 
>>>>>>> On 07.05.2015 at 18:00, <[email protected]> 
>>>>>>> <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> No, the slots are not being used anymore.
>>>>>>> 
>>>>>>> According to qstat, I do not seem to have any jobs on the host. However, 
>>>>>>> my processes (launched by qrsh_starter) are still running on that 
>>>>>>> specific host and are altogether consuming 200% of CPU as well as licenses. 
>>>>>>> The problem is that the processes have been running there for over a 
>>>>>>> week without my being aware of them. I had thought that the processes 
>>>>>>> were killed when the job was killed with qdel.
>>>>>>> 
>>>>>>> What could be the reason for this?
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Sudha
>>>>>>> 
>>>>>>> From: Srirangam Addepalli [mailto:[email protected]]
>>>>>>> Sent: Wednesday, May 06, 2015 7:52 PM
>>>>>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>>>>>> Subject: Re: [gridengine users] grid jobs not visible with qstat
>>>>>>> output
>>>>>>> 
>>>>>>> That would be strange. Do the slots on the host show as being used?
>>>>>>> 
>>>>>>> qhost -j -h hostname should list the jobs that Grid Engine is aware of, 
>>>>>>> unless qrsh somehow spawned a process that is not bound by sge_execd. 
>>>>>>> On the client/execution host, what info do you have in the active_jobs and 
>>>>>>> jobs directories? It is more likely that the qrsh session was 
>>>>>>> terminated but left resident processes behind.
>>>>>>> 
>>>>>>> Rangam
>>>>>>> 
>>>>>>> On Wed, May 6, 2015 at 9:05 AM, <[email protected]> wrote:
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I noticed that two grid jobs have been running for over a week on a 
>>>>>>> machine without my being aware of them. Both jobs were launched 
>>>>>>> with qrsh, but they are not visible with qstat, so for one reason or 
>>>>>>> another they are no longer included in the grid's bookkeeping. This 
>>>>>>> causes grid resources to be wasted on ghost jobs; for example, 
>>>>>>> both of my jobs seem to consume 100% CPU on the host.
>>>>>>> 
>>>>>>> Can anyone please explain this?
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Sudha
>>>>>>> 
>>>>>>> The information contained in this electronic message and any
>>>>>>> attachments to this message are intended for the exclusive use of
>>>>>>> the addressee(s) and may contain proprietary, confidential or
>>>>>>> privileged information. If you are not the intended recipient,
>>>>>>> you should not disseminate, distribute or copy this e-mail.
>>>>>>> Please notify the sender immediately and destroy all copies of
>>>>>>> this message and any attachments. WARNING: Computer viruses can
>>>>>>> be transmitted via email. The recipient should check this email
>>>>>>> and any attachments for the presence of viruses. The company
>>>>>>> accepts no liability for any damage caused by any virus
>>>>>>> transmitted by this email. www.wipro.com
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> [email protected]
>>>>>>> https://gridengine.org/mailman/listinfo/users
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best,
>>>>> 
>>>>> Feng
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 


_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
