On 10.05.2015 at 19:30, <[email protected]> <[email protected]> wrote:

> Hi Reuti,
> 
> The startup mechanism is as follows:
> 
> qlogin_daemon                /usr/sbin/sshd -i
> qlogin_command               /gridapl1/HWEE_ge6/new/qssh

Then most likely `ssh` is not tightly integrated into SGE. Please have a
look at:

https://arc.liv.ac.uk/SGE/htmlman/htmlman5/remote_startup.html

section "SSH TIGHT INTEGRATION".
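One way to spot the leftover processes discussed in this thread is to check, for every qrsh_starter process on the execution host, whether an sge_shepherd process appears among its ancestors. The Python sketch below is not part of SGE. It assumes the process list was captured with something like `ps -e -o pid=,ppid=,comm=` (one "pid ppid name" triple per line) and that shepherd processes have a name beginning with "sge_shepherd". The helper names parse_ps and orphaned_starters are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# One process: (pid, ppid, command name). Assumed to come from
# `ps -e -o pid=,ppid=,comm=` run on the execution host.
Proc = Tuple[int, int, str]


def parse_ps(text: str) -> List[Proc]:
    """Parse 'pid ppid name' lines into tuples, skipping header/malformed lines."""
    procs: List[Proc] = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # the name itself may contain spaces
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            procs.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    return procs


def orphaned_starters(procs: List[Proc]) -> List[int]:
    """Return PIDs of qrsh_starter processes with no sge_shepherd ancestor."""
    parent: Dict[int, int] = {pid: ppid for pid, ppid, _ in procs}
    name: Dict[int, str] = {pid: comm for pid, _, comm in procs}
    orphans: List[int] = []
    for pid, _, comm in procs:
        if comm != "qrsh_starter":
            continue
        bound = False
        seen: Set[int] = set()  # guards against cycles in malformed input
        cur = parent.get(pid)
        # Walk up the parent chain until a shepherd is found or the chain ends.
        while cur is not None and cur not in seen:
            seen.add(cur)
            if name.get(cur, "").startswith("sge_shepherd"):
                bound = True
                break
            cur = parent.get(cur)
        if not bound:
            orphans.append(pid)
    return orphans
```

Feeding the captured `ps` output through parse_ps and then orphaned_starters yields the PIDs of starters that are no longer under a shepherd, i.e. candidates for processes that escaped SGE's bookkeeping and may need to be cleaned up by hand after checking them.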

-- Reuti


> Regards,
> Sudha
> 
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Friday, May 08, 2015 10:50 PM
> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
> Cc: [email protected]; [email protected]
> Subject: Re: [gridengine users] grid jobs not visible with qstat output
> 
> 
>> On 08.05.2015 at 16:57, [email protected] wrote:
>> 
>> Hi Zhang,
>> 
>> Please find the output below:
>> 
>> 32682 61457200 27020 karppa 32682 /applic36/grid/HWEE_ge6/utilbin/lx24-amd64/qrsh_starter /gridapl1/HWEE_ge6/default/spo
>> 32734 61457200 27020 karppa 32734  \_ /bin/ksh ./run_it_file.vcs
>> 33043 61457200 27020 karppa 32734      \_ /bin/ksh ./vcs.start.dh.no_gui
>> 33059 61457200 27020 karppa 32734          \_ ./vcs/tb_bin/hdl_top_rtldhsim/simv -licqueue -cm line+cond+fsm+branch+tgl+
>> 38048 61457200 27020 karppa 32734              \_ [target.bin] <defunct>
>> 5049 61457200 27020 karppa 5049 /applic36/grid/HWEE_ge6/utilbin/lx24-amd64/qrsh_starter /gridapl1/HWEE_ge6/default/spoo
>> 5101 61457200 27020 karppa 5101  \_ /bin/ksh ./run_it_file.vcs
>> 5408 61457200 27020 karppa 5101      \_ /bin/ksh ./vcs.start.dh.no_gui
>> 5424 61457200 27020 karppa 5101          \_ ./vcs/tb_bin/hdl_top_rtldhsim/simv -licqueue -cm line+cond+fsm+branch+tgl+a
>> 9089 61457200 27020 karppa 5101              \_ [target.bin] <defunct>
> 
> The problem seems to be that the `qrsh_starter` is no longer bound to the
> "sge_shepherd". Was this taken after the job ended? What does it look like
> while SGE still knows about the job? What is the startup mechanism:
> 
> $ qconf -sconf
> ...
> qlogin_command               builtin
> qlogin_daemon                builtin
> rlogin_command               builtin
> rlogin_daemon                builtin
> rsh_command                  builtin
> rsh_daemon                   builtin
> 
> -- Reuti
> 
> 
>> Regards,
>> Sudha
>> 
>> -----Original Message-----
>> From: Feng Zhang [mailto:[email protected]]
>> Sent: Friday, May 08, 2015 7:35 PM
>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>> 
>> Sudha,
>> 
>> Can you run "ps -e f -o pid,ppid,command", which can show more details?
>> 
>> On Fri, May 8, 2015 at 4:09 AM,  <[email protected]> wrote:
>>> Hi Reuti,
>>> 
>>> The processes are not bound to sge_shepherd anymore.
>>> 
>>> Below are the qrsh_starter processes that are still running:
>>> 
>>> 5049 ?        00:00:00 qrsh_starter
>>> 5101 ?        00:00:00 run_it_file.vcs
>>> 5408 ?        00:00:00 vcs.start.dh.no
>>> 5424 ?        8-20:57:02 simv
>>> 9089 ?        00:00:00 target.bin <defunct>
>>> 16868 ?        00:00:00 sshd
>>> 16913 pts/9    00:00:00 bash
>>> 17371 pts/9    00:00:00 ps
>>> 32682 ?        00:00:00 qrsh_starter
>>> 32734 ?        00:00:00 run_it_file.vcs
>>> 33043 ?        00:00:00 vcs.start.dh.no
>>> 33059 ?        8-21:19:03 simv
>>> 38048 ?        00:00:00 target.bin <defunct>
>>> 
>>> Regards,
>>> Sudha
>>> 
>>> -----Original Message-----
>>> From: Reuti [mailto:[email protected]]
>>> Sent: Thursday, May 07, 2015 9:52 PM
>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>> Cc: [email protected]; [email protected]
>>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>>> 
>>> Are the processes still bound to the sge_shepherd, or did they jump out of
>>> the process tree? By which method were they started via qrsh_starter:
>>> "builtin" or a configured `ssh`?
>>> 
>>> -- Reuti
>>> 
>>> 
>>>> On 07.05.2015 at 18:00, <[email protected]> <[email protected]> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> No, the slots are not being used anymore.
>>>> 
>>>> According to qstat, I don't have any jobs on that host. However, my
>>>> processes (launched by qrsh_starter) are still running on that specific
>>>> host, together consuming 200% CPU and holding licenses. The problem is
>>>> that these processes have been running there for over a week without my
>>>> being aware of them. I had assumed that the processes were killed when
>>>> the job was deleted with qdel.
>>>> 
>>>> What could be the reason for this?
>>>> 
>>>> Regards,
>>>> Sudha
>>>> 
>>>> From: Srirangam Addepalli [mailto:[email protected]]
>>>> Sent: Wednesday, May 06, 2015 7:52 PM
>>>> To: Sudha Padmini Penmetsa (WT01 - Global Media & Telecom)
>>>> Subject: Re: [gridengine users] grid jobs not visible with qstat output
>>>> 
>>>> That would be strange. Do the slots on the host show as being used?
>>>> 
>>>> qhost -j -h hostname should list the jobs that Grid Engine is aware of,
>>>> unless qrsh somehow spawned a process that is not bound by sge_execd. On
>>>> the client/execution host, what info do you have in the active_jobs and
>>>> jobs directories? It is more likely that the qrsh session terminated but
>>>> left resident processes behind.
>>>> 
>>>> Rangam
>>>> 
>>>> On Wed, May 6, 2015 at 9:05 AM, <[email protected]> wrote:
>>>> Hi,
>>>> 
>>>> I noticed that two of my grid jobs have been running for over a week on a
>>>> machine without my being aware of it. Both jobs were launched with qrsh,
>>>> but they are not visible with qstat, so for one reason or another they
>>>> are no longer included in the grid bookkeeping. This means grid resources
>>>> are wasted on ghost jobs; for example, each of my jobs seems to consume
>>>> 100% CPU on the host.
>>>> 
>>>> Can anyone please explain this?
>>>> 
>>>> Regards,
>>>> Sudha
>>>> 
>>>> The information contained in this electronic message and any attachments 
>>>> to this message are intended for the exclusive use of the addressee(s) and 
>>>> may contain proprietary, confidential or privileged information. If you 
>>>> are not the intended recipient, you should not disseminate, distribute or 
>>>> copy this e-mail. Please notify the sender immediately and destroy all 
>>>> copies of this message and any attachments. WARNING: Computer viruses can 
>>>> be transmitted via email. The recipient should check this email and any 
>>>> attachments for the presence of viruses. The company accepts no liability 
>>>> for any damage caused by any virus transmitted by this email. www.wipro.com
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> [email protected]
>>>> https://gridengine.org/mailman/listinfo/users
>>>> 
>> 
>> 
>> 
>> --
>> Best,
>> 
>> Feng
> 