> On 12.05.2015 at 17:03, <[email protected]> 
> <[email protected]> wrote:
> 
> Hi Reuti,
> 
> The link you suggested 
> (https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html ) 
> states the following:
> 
> "To  have a tight integration of SSH into SGE, the started sshd needs an 
> additional group ID to be attached."
> 
> We checked the configuration on our side, and the addgrpid file is generated:
> 
> /opt/sge/default/spool/active_jobs/8143543.1 : ls
> addgrpid

Yes, but it is not attached to all processes. Processes running under a tight 
integration need it attached, which shows up in /proc like this:

reuti@node:/proc/24989> cat status
...
Groups: 20082 24000 25000

Here 20082 is the additional group ID.
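
The check above can also be scripted. What follows is only a minimal sketch, 
assuming a Linux /proc layout; the function names (parse_groups, 
has_additional_gid, pids_with_gid) are made up for illustration and are not 
part of SGE. It reads the "Groups:" line of a status file and reports which 
processes carry a given additional group ID.

```python
from pathlib import Path


def parse_groups(status_text):
    """Return the supplementary group IDs from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return [int(gid) for gid in line.split()[1:]]
    return []


def has_additional_gid(status_text, add_gid):
    """True if add_gid appears in the process's Groups line."""
    return add_gid in parse_groups(status_text)


def pids_with_gid(add_gid, proc_root="/proc"):
    """Scan proc_root for processes that carry add_gid (Linux only)."""
    found = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            text = (entry / "status").read_text(errors="replace")
        except OSError:
            continue  # process exited meanwhile or is not readable
        if has_additional_gid(text, add_gid):
            found.append(int(entry.name))
    return sorted(found)
```

Calling pids_with_gid(20082) on the execution host would list the PIDs that 
still carry that group ID. The number to look for is presumably the one stored 
in the job's addgrpid file; that the file holds just the numeric ID is an 
assumption here and should be verified on the actual installation.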

-- Reuti


> 
> Regards,
> Sudha
> 
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Monday, May 11, 2015 2:08 AM
> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
> Cc: [email protected]; [email protected]
> Subject: Re: [gridengine users] grid jobs not visible with qstat output
> 
> 
> On 10.05.2015 at 19:30, <[email protected]> 
> <[email protected]> wrote:
> 
>> Hi Reuti,
>> 
>> The startup mechanism is configured as follows:
>> 
>> qlogin_daemon                /usr/sbin/sshd -i
>> qlogin_command               /gridapl1/HWEE_ge6/new/qssh
> 
> Then most likely `ssh` is not tightly integrated into SGE. 
> Please have a look at:
> 
> https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html
> 
> section "SSH TIGHT INTEGRATION".
> 
> -- Reuti
> 
> 
>> Regards,
>> Sudha
>> 
>> -----Original Message-----
>> From: Reuti [mailto:[email protected]]
>> Sent: Friday, May 08, 2015 10:50 PM
>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>> Cc: [email protected]; [email protected]
>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>> 
>> 
>>> On 08.05.2015 at 16:57, [email protected] wrote:
>>> 
>>> Hi Zhang,
>>> 
>>> Please find the output below:
>>> 
>>> 32682 61457200 27020 karppa 32682 /applic36/grid/HWEE_ge6/utilbin/lx24-amd64/qrsh_starter /gridapl1/HWEE_ge6/default/spo
>>> 32734 61457200 27020 karppa 32734  \_ /bin/ksh ./run_it_file.vcs
>>> 33043 61457200 27020 karppa 32734      \_ /bin/ksh ./vcs.start.dh.no_gui
>>> 33059 61457200 27020 karppa 32734          \_ ./vcs/tb_bin/hdl_top_rtldhsim/simv -licqueue -cm line+cond+fsm+branch+tgl+
>>> 38048 61457200 27020 karppa 32734              \_ [target.bin] <defunct>
>>> 5049 61457200 27020 karppa 5049 /applic36/grid/HWEE_ge6/utilbin/lx24-amd64/qrsh_starter /gridapl1/HWEE_ge6/default/spoo
>>> 5101 61457200 27020 karppa 5101  \_ /bin/ksh ./run_it_file.vcs
>>> 5408 61457200 27020 karppa 5101      \_ /bin/ksh ./vcs.start.dh.no_gui
>>> 5424 61457200 27020 karppa 5101          \_ ./vcs/tb_bin/hdl_top_rtldhsim/simv -licqueue -cm line+cond+fsm+branch+tgl+a
>>> 9089 61457200 27020 karppa 5101              \_ [target.bin] <defunct>
>> 
>> The problem seems to be that the `qrsh_starter` is no longer bound to the 
>> "sge_shepherd". Was this output taken after the job? What does it look like 
>> while SGE still knows about the job? And what is the startup mechanism:
>> 
>> $ qconf -sconf
>> ...
>> qlogin_command               builtin
>> qlogin_daemon                builtin
>> rlogin_command               builtin
>> rlogin_daemon                builtin
>> rsh_command                  builtin
>> rsh_daemon                   builtin
>> 
>> -- Reuti
>> 
>> 
>>> Regards,
>>> Sudha
>>> 
>>> -----Original Message-----
>>> From: Feng Zhang [mailto:[email protected]]
>>> Sent: Friday, May 08, 2015 7:35 PM
>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>>> 
>>> Sudha,
>>> 
>>> Can you run "ps -e f -o pid,ppid,command", which can show more details?
>>> 
>>> On Fri, May 8, 2015 at 4:09 AM,  <[email protected]> wrote:
>>>> Hi Reuti,
>>>> 
>>>> The processes are no longer bound to sge_shepherd.
>>>> 
>>>> Below are the qrsh_starter processes that are still running:
>>>> 
>>>> 5049 ?        00:00:00 qrsh_starter
>>>> 5101 ?        00:00:00 run_it_file.vcs
>>>> 5408 ?        00:00:00 vcs.start.dh.no
>>>> 5424 ?        8-20:57:02 simv
>>>> 9089 ?        00:00:00 target.bin <defunct>
>>>> 16868 ?        00:00:00 sshd
>>>> 16913 pts/9    00:00:00 bash
>>>> 17371 pts/9    00:00:00 ps
>>>> 32682 ?        00:00:00 qrsh_starter
>>>> 32734 ?        00:00:00 run_it_file.vcs
>>>> 33043 ?        00:00:00 vcs.start.dh.no
>>>> 33059 ?        8-21:19:03 simv
>>>> 38048 ?        00:00:00 target.bin <defunct>
>>>> 
>>>> Regards,
>>>> Sudha
>>>> 
>>>> -----Original Message-----
>>>> From: Reuti [mailto:[email protected]]
>>>> Sent: Thursday, May 07, 2015 9:52 PM
>>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>>> Cc: [email protected]; [email protected]
>>>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>>>> 
>>>> Are the processes still bound to the sge_shepherd, or did they jump out of 
>>>> the process tree? By what method were they started by qrsh_starter: 
>>>> "builtin" or by defining `ssh`?
>>>> 
>>>> -- Reuti
>>>> 
>>>> 
>>>>> On 07.05.2015 at 18:00, <[email protected]> 
>>>>> <[email protected]> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> No, the slots are not being used anymore.
>>>>> 
>>>>> According to qstat, I do not have any jobs on the host. However, 
>>>>> my processes are still running on that specific host (launched by 
>>>>> qrsh_starter), and altogether they consume 200% of CPU plus licenses. The 
>>>>> problem is that the processes have been running there for over a week 
>>>>> without my being aware of them. I had assumed that the processes were 
>>>>> killed when the job was killed with qdel.
>>>>> 
>>>>> What could be the reason for this?
>>>>> 
>>>>> Regards,
>>>>> Sudha
>>>>> 
>>>>> From: Srirangam Addepalli [mailto:[email protected]]
>>>>> Sent: Wednesday, May 06, 2015 7:52 PM
>>>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>>>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>>>>> 
>>>>> That would be strange. Do the slots on the host show as being used?
>>>>> 
>>>>> "qhost -j -h hostname" should list the jobs that Grid Engine is aware of, 
>>>>> unless qrsh somehow spawned a process that is not bound to sge_execd. 
>>>>> On the client/execution host, what info do you have in the active_jobs and 
>>>>> jobs directories? It is more likely that the qrsh session terminated 
>>>>> but left resident processes behind.
>>>>> 
>>>>> Rangam
>>>>> 
>>>>> On Wed, May 6, 2015 at 9:05 AM, <[email protected]> wrote:
>>>>> Hi,
>>>>> 
>>>>> I noticed that two of my grid jobs have been running for over a week on a 
>>>>> machine without my being aware of them. Both jobs were launched with 
>>>>> qrsh, but they are not visible with qstat, so for one reason or another 
>>>>> they are no longer included in the grid bookkeeping. As a result, 
>>>>> grid resources are wasted on ghost jobs; for example, both of my 
>>>>> jobs seem to consume 100% CPU on the host.
>>>>> 
>>>>> Can anyone please explain this?
>>>>> 
>>>>> Regards,
>>>>> Sudha
>>>>> 
>>>>> The information contained in this electronic message and any attachments 
>>>>> to this message are intended for the exclusive use of the addressee(s) 
>>>>> and may contain proprietary, confidential or privileged information. If 
>>>>> you are not the intended recipient, you should not disseminate, 
>>>>> distribute or copy this e-mail. Please notify the sender immediately and 
>>>>> destroy all copies of this message and any attachments. WARNING: Computer 
>>>>> viruses can be transmitted via email. The recipient should check this 
>>>>> email and any attachments for the presence of viruses. The company 
>>>>> accepts no liability for any damage caused by any virus transmitted by 
>>>>> this email. www.wipro.com
>>>>> 
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> [email protected]
>>>>> https://gridengine.org/mailman/listinfo/users
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best,
>>> 
>>> Feng
>>> 
>> 
> 

