This only happens for qrsh without a command. qsub is not affected, since it
always has a command. I am still baffled by this and don't know how to
troubleshoot it.

Thanks
John



-----Original Message-----
From: Reuti [mailto:[email protected]]
Sent: Saturday, May 20, 2017 12:53
To: John_Tai
Cc: [email protected]
Subject: Re: [gridengine users] sge_shepherd using 100% CPU

Hi,

> Am 16.05.2017 um 03:44 schrieb John_Tai <[email protected]>:
>
>>> And the opened shell is idling?
>
> No, it's working normally.
>
>>> How do you log in by this method – the default "builtin" method or anything 
>>> self defined?
>
> Default, I didn't define anything. Don't even know how to.
>
>>> Any global or queue prolog in place, which is supposed to run under the sge 
>>> account?
>
> There are no prolog/epilog defined.
>
>>> From the root account you can `strace -p 22443` and check what is going on 
>>> therein.
>
> It keeps looping through these messages. Is it normal?

No.
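
In the trace quoted below, every poll() call has a 1000 ms timeout yet returns
at once with POLLHUP on fd 6, and wait4() with WNOHANG reports no exited child,
so the loop never sleeps. That would account for the 100% CPU. A minimal Python
sketch of the effect, assuming fd 6 behaves like a pipe whose other end has
been closed (the trace does not say what fd 6 really is; the pipe and all names
here are illustrative and not taken from sge_shepherd):

```python
import os
import select
import time

# Hypothetical stand-in for the shepherd's fd 6: the read end of a pipe
# whose write end has already been closed.
read_fd, write_fd = os.pipe()
os.close(write_fd)

poller = select.poll()
poller.register(read_fd, select.POLLIN)

iterations = 0
hup_seen = True
start = time.monotonic()
while iterations < 1000:
    # Same 1000 ms timeout as in the trace. The hang-up condition is
    # reported immediately, so the timeout never gets a chance to elapse.
    events = poller.poll(1000)
    hup_seen = hup_seen and any(flags & select.POLLHUP for _, flags in events)
    iterations += 1
elapsed = time.monotonic() - start

os.close(read_fd)
print(f"{iterations} poll() calls in {elapsed:.3f} s, POLLHUP every time: {hup_seen}")
```

If the descriptor were still open at the far end, each call would block for up
to a second and the process would stay idle, as the shepherd does in the
`qrsh ... xclock` case.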

And this happens for all logins by `qrsh` on all nodes, but not for
conventional `qsub`ed jobs?

-- Reuti


> alarm(0)                                = 0
> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
> alarm(0)                                = 0
> alarm(0)                                = 0
> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
> alarm(0)                                = 0
> alarm(0)                                = 0
> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
> alarm(0)                                = 0
> alarm(0)                                = 0
> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
> alarm(0)                                = 0
> alarm(0)                                = 0
> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>
>
>
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Monday, May 15, 2017 5:46
> To: John_Tai
> Cc: [email protected]
> Subject: Re: [gridengine users] sge_shepherd using 100% CPU
>
> Hi,
>
>> Am 15.05.2017 um 05:28 schrieb John_Tai <[email protected]>:
>>
>> I recently found a weird problem with qrsh.
>>
>> If I just use it to login to an exec host, the sge_shepherd uses 100% of CPU.
>>
>> # qrsh -q lc.q@ibm105
>> # top
>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 22443 sge       25   0 20396 1604 1272 R 99.5  0.0   0:08.80 sge_shepherd
>> 19927 sge       16   0  114m 3096 1836 S  0.0  0.0   0:00.26 sge_execd
>
> And the opened shell is idling?
>
> How do you log in by this method – the default "builtin" method or anything 
> self defined?
>
> In my clusters I can't observe this behavior.
>
> Even if there would be something running in any of the shell's profile: it 
> should show up for the opened shell but not for the sge_shepherd which runs 
> under the sge admin account.
>
> Any global or queue prolog in place, which is supposed to run under the sge 
> account?
>
> ==
>
> From the root account you can `strace -p 22443` and check what is going on 
> therein.
>
> -- Reuti
>
>
>> But if I submit an actual command with qrsh this doesn’t happen.
>>
>> # qrsh -q lc.q@ibm105 xclock
>> # top
>>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 19927 sge       16   0  114m 3100 1836 S  0.0  0.0   0:00.38 sge_execd
>> 22671 sge       18   0 20392 1584 1256 S  0.0  0.0   0:00.00 sge_shepherd
>>
>> Not sure why that is. How do I troubleshoot this?
>>
>> Thanks
>> Johnt
>> This email (including its attachments, if any) may be confidential and 
>> proprietary information of SMIC, and intended only for the use of the named 
>> recipient(s) above. Any unauthorized use or disclosure of this email is 
>> strictly prohibited. If you are not the intended recipient(s), please notify 
>> the sender immediately and delete this email from your computer.
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> https://gridengine.org/mailman/listinfo/users
>
