> On 22.05.2017 at 03:02, John_Tai <[email protected]> wrote:
> 
> This only happens for qrsh without a command. Qsub is not affected, since it 
> does have a command. I am still baffled by this and don't know how to 
> troubleshoot it.

Does it also happen if the command submitted with `qrsh` is something like 
`sleep 120`?

-- Reuti


> Thanks
> John
> 
> 
> 
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Saturday, May 20, 2017 12:53
> To: John_Tai
> Cc: [email protected]
> Subject: Re: [gridengine users] sge_shepherd using 100% CPU
> 
> Hi,
> 
>> On 16.05.2017 at 03:44, John_Tai <[email protected]> wrote:
>> 
>>>> And the opened shell is idling?
>> 
>> No, it's working normally.
>> 
>>>> How do you log in by this method – the default "builtin" method or 
>>>> anything self defined?
>> 
>> Default; I didn't define anything. I don't even know how to.
>> 
>>>> Any global or queue prolog in place, which is supposed to run under the 
>>>> sge account?
>> 
>> No prolog or epilog is defined.
>> 
>>>> From the root account you can `strace -p 22443` and check what is going on 
>>>> therein.
>> 
>> It keeps looping through these calls. Is that normal?
> 
> No.
> 
> And this happens for all logins by `qrsh` on all nodes, but not for 
> conventional `qsub`ed jobs?
> 
> -- Reuti
> 
> 
>> alarm(0)                                = 0
>> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
>> alarm(0)                                = 0
>> alarm(0)                                = 0
>> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
>> alarm(0)                                = 0
>> alarm(0)                                = 0
>> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
>> alarm(0)                                = 0
>> alarm(0)                                = 0
>> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
>> alarm(0)                                = 0
>> alarm(0)                                = 0
>> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>> 
>> 
>> 
>> -----Original Message-----
>> From: Reuti [mailto:[email protected]]
>> Sent: Monday, May 15, 2017 5:46
>> To: John_Tai
>> Cc: [email protected]
>> Subject: Re: [gridengine users] sge_shepherd using 100% CPU
>> 
>> Hi,
>> 
>>> On 15.05.2017 at 05:28, John_Tai <[email protected]> wrote:
>>> 
>>> I recently found a weird problem with qrsh.
>>> 
>>> If I just use it to log in to an exec host, sge_shepherd uses 100% CPU.
>>> 
>>> # qrsh -q lc.q@ibm105
>>> # top
>>> PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>> 22443 sge       25   0 20396 1604 1272 R 99.5  0.0   0:08.80 sge_shepherd
>>> 19927 sge       16   0  114m 3096 1836 S  0.0  0.0   0:00.26 sge_execd
>> 
>> And the opened shell is idling?
>> 
>> How do you log in by this method – the default "builtin" method or anything 
>> self defined?
>> 
>> In my clusters I can't observe this behavior.
>> 
>> Even if something were started from one of the shell's profile files, it 
>> should show up under the opened shell, not under sge_shepherd, which runs 
>> under the sge admin account.
>> 
>> Any global or queue prolog in place, which is supposed to run under the sge 
>> account?
>> 
>> ==
>> 
>> From the root account you can `strace -p 22443` and check what is going on 
>> therein.
>> 
>> -- Reuti
>> 
>> 
>>> But if I submit an actual command with qrsh, this doesn’t happen.
>>> 
>>> # qrsh -q lc.q@ibm105 xclock
>>> # top
>>> PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>> 19927 sge       16   0  114m 3100 1836 S  0.0  0.0   0:00.38 sge_execd
>>> 22671 sge       18   0 20392 1584 1256 S  0.0  0.0   0:00.00 sge_shepherd
>>> 
>>> Not sure why that is. How do I troubleshoot this?
>>> 
>>> Thanks
>>> John
>>> This email (including its attachments, if any) may be confidential and 
>>> proprietary information of SMIC, and intended only for the use of the named 
>>> recipient(s) above. Any unauthorized use or disclosure of this email is 
>>> strictly prohibited. If you are not the intended recipient(s), please 
>>> notify the sender immediately and delete this email from your computer.
>>> 
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>> 
> 


