It doesn't happen with sleep. It only happens when you use qrsh (and qlogin) to log in remotely to an exec host.
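
A note on the strace output quoted further down: every poll() call on fd 6 returns immediately with revents=POLLHUP, even though a 1000 ms timeout was requested. poll() always reports POLLHUP, whether or not it was asked for, so a loop that keeps polling a hung-up descriptor without closing or dropping it never sleeps. The Python sketch below reproduces only that effect on an ordinary pipe. It is not the sge_shepherd code, the function name count_poll_wakeups is made up for illustration, and whether this is really what happens inside the shepherd is an assumption.

```python
import os
import select
import time


def count_poll_wakeups(duration_s=0.2, timeout_ms=1000):
    """Count how often poll() returns within duration_s on a hung-up pipe.

    Hypothetical helper for illustration only; not part of Grid Engine.
    """
    read_fd, write_fd = os.pipe()
    os.close(write_fd)  # the peer is gone, so the read end reports POLLHUP

    poller = select.poll()
    poller.register(read_fd, select.POLLIN)  # same event mask as in the trace

    wakeups = 0
    saw_hup = False
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            # An event is already pending, so poll() returns at once and
            # the 1000 ms timeout never gets a chance to elapse.
            for _fd, revents in poller.poll(timeout_ms):
                if revents & select.POLLHUP:
                    saw_hup = True
            wakeups += 1
    finally:
        os.close(read_fd)
    return wakeups, saw_hup


if __name__ == "__main__":
    wakeups, saw_hup = count_poll_wakeups()
    print(f"poll() returned {wakeups} times, POLLHUP seen: {saw_hup}")
```

If poll() were really waiting for its timeout, the count would stay at 1 for a 0.2 s run. A much larger count is the same busy loop that shows up as 100% CPU in top.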

-----Original Message-----
From: Reuti [mailto:[email protected]]
Sent: Tuesday, May 23, 2017 12:56
To: John_Tai
Cc: [email protected]
Subject: Re: [gridengine users] sge_shepherd using 100% CPU

> On 22.05.2017 at 03:02, John_Tai <[email protected]> wrote:
>
> This only happens for qrsh without a command. Qsub is not affected since it
> does have a command. I am still baffled by this, don't know how to
> troubleshoot this.

Does it also happen, if the submitted command with `qrsh` is something like a `sleep 120`?

-- Reuti

> Thanks
> John
>
>
>
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Saturday, May 20, 2017 12:53
> To: John_Tai
> Cc: [email protected]
> Subject: Re: [gridengine users] sge_shepherd using 100% CPU
>
> Hi,
>
>> On 16.05.2017 at 03:44, John_Tai <[email protected]> wrote:
>>
>>>> And the opened shell is idling?
>>
>> No, it's working normally.
>>
>>>> How do you log in by this method – the default "builtin" method or
>>>> anything self defined?
>>
>> Default, I didn't define anything. Don't even know how to.
>>
>>>> Any global or queue prolog in place, which is supposed to run under the
>>>> sge account?
>>
>> There are no prolog/epilog defined.
>>
>>>> From the root account you can `strace -p 22443` and check what is going on
>>>> therein.
>>
>> It keeps looping through these messages. Is it normal?
>
> No.
>
> And this happens for all logins by `qrsh` on all nodes, but not for
> conventional `qsub`ed jobs?
>
> -- Reuti
>
>
>> alarm(0) = 0
>> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
>> alarm(0) = 0
>> alarm(0) = 0
>> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
>> alarm(0) = 0
>> alarm(0) = 0
>> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
>> alarm(0) = 0
>> alarm(0) = 0
>> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
>> alarm(0) = 0
>> alarm(0) = 0
>> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>>
>>
>>
>> -----Original Message-----
>> From: Reuti [mailto:[email protected]]
>> Sent: Monday, May 15, 2017 5:46
>> To: John_Tai
>> Cc: [email protected]
>> Subject: Re: [gridengine users] sge_shepherd using 100% CPU
>>
>> Hi,
>>
>>> On 15.05.2017 at 05:28, John_Tai <[email protected]> wrote:
>>>
>>> I recently found a weird problem with qrsh.
>>>
>>> If I just use it to login to an exec host, the sge_shepherd uses 100% of
>>> CPU.
>>>
>>> # qrsh -q lc.q@ibm105
>>> # top
>>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
>>> 22443 sge   25   0 20396 1604 1272 R 99.5  0.0 0:08.80 sge_shepherd
>>> 19927 sge   16   0  114m 3096 1836 S  0.0  0.0 0:00.26 sge_execd
>>
>> And the opened shell is idling?
>>
>> How do you log in by this method – the default "builtin" method or anything
>> self defined?
>>
>> In my clusters I can't observe this behavior.
>>
>> Even if there would be something running in any of the shell's profile: it
>> should show up for the opened shell but not for the sge_shepherd which runs
>> under the sge admin account.
>>
>> Any global or queue prolog in place, which is supposed to run under the sge
>> account?
>>
>> ==
>>
>> From the root account you can `strace -p 22443` and check what is going on
>> therein.
>>
>> -- Reuti
>>
>>
>>> But if I submit an actual command with qrsh this doesn’t happen.
>>>
>>> # qrsh -q lc.q@ibm105 xclock
>>> # top
>>>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
>>> 19927 sge   16   0  114m 3100 1836 S  0.0  0.0 0:00.38 sge_execd
>>> 22671 sge   18   0 20392 1584 1256 S  0.0  0.0 0:00.00 sge_shepherd
>>>
>>> Not sure why that is. How do I troubleshoot this?
>>>
>>> Thanks
>>> Johnt
>>> This email (including its attachments, if any) may be confidential and
>>> proprietary information of SMIC, and intended only for the use of the named
>>> recipient(s) above. Any unauthorized use or disclosure of this email is
>>> strictly prohibited. If you are not the intended recipient(s), please
>>> notify the sender immediately and delete this email from your computer.
>>>
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
