> On 22.05.2017 at 03:02, John_Tai <[email protected]> wrote:
>
> This only happens for qrsh without a command. qsub is not affected since it
> does have a command. I am still baffled by this and don't know how to
> troubleshoot it.

Does it also happen if the command submitted with `qrsh` is something like
`sleep 120`?

-- Reuti

> Thanks
> John
>
>
>
> -----Original Message-----
> From: Reuti [mailto:[email protected]]
> Sent: Saturday, May 20, 2017 12:53
> To: John_Tai
> Cc: [email protected]
> Subject: Re: [gridengine users] sge_shepherd using 100% CPU
>
> Hi,
>
>> On 16.05.2017 at 03:44, John_Tai <[email protected]> wrote:
>>
>>>> And the opened shell is idling?
>>
>> No, it's working normally.
>>
>>>> How do you log in by this method – the default "builtin" method or
>>>> anything self defined?
>>
>> Default, I didn't define anything. Don't even know how to.
>>
>>>> Any global or queue prolog in place, which is supposed to run under the
>>>> sge account?
>>
>> There are no prolog/epilog defined.
>>
>>>> From the root account you can `strace -p 22443` and check what is going on
>>>> therein.
>>
>> It keeps looping through these messages. Is it normal?
>
> No.
>
> And this happens for all logins by `qrsh` on all nodes, but not for
> conventional `qsub`ed jobs?
>
> -- Reuti
>
>
>> alarm(0) = 0
>> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
>> alarm(0) = 0
>> alarm(0) = 0
>> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
>> alarm(0) = 0
>> alarm(0) = 0
>> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
>> alarm(0) = 0
>> alarm(0) = 0
>> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>> wait4(-1, 0x7fffa369d95c, WNOHANG, 0x7fffa369e300) = 0
>> alarm(0) = 0
>> alarm(0) = 0
>> poll([{fd=6, events=POLLIN}, {fd=-1}], 2, 1000) = 1 ([{fd=6, revents=POLLHUP}])
>>
>>
>>
>> -----Original Message-----
>> From: Reuti [mailto:[email protected]]
>> Sent: Monday, May 15, 2017 5:46
>> To: John_Tai
>> Cc: [email protected]
>> Subject: Re: [gridengine users] sge_shepherd using 100% CPU
>>
>> Hi,
>>
>>> On 15.05.2017 at 05:28, John_Tai <[email protected]> wrote:
>>>
>>> I recently found a weird problem with qrsh.
>>>
>>> If I just use it to log in to an exec host, the sge_shepherd uses 100% of
>>> CPU.
>>>
>>> # qrsh -q lc.q@ibm105
>>> # top
>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>> 22443 sge       25   0 20396 1604 1272 R 99.5  0.0   0:08.80 sge_shepherd
>>> 19927 sge       16   0  114m 3096 1836 S  0.0  0.0   0:00.26 sge_execd
>>
>> And the opened shell is idling?
>>
>> How do you log in by this method – the default "builtin" method or anything
>> self defined?
>>
>> In my clusters I can't observe this behavior.
>>
>> Even if there would be something running in any of the shell's profile: it
>> should show up for the opened shell but not for the sge_shepherd which runs
>> under the sge admin account.
>>
>> Any global or queue prolog in place, which is supposed to run under the sge
>> account?
>>
>> ==
>>
>> From the root account you can `strace -p 22443` and check what is going on
>> therein.
>>
>> -- Reuti
>>
>>
>>> But if I submit an actual command with qrsh this doesn’t happen.
>>>
>>> # qrsh -q lc.q@ibm105 xclock
>>> # top
>>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>> 19927 sge       16   0  114m 3100 1836 S  0.0  0.0   0:00.38 sge_execd
>>> 22671 sge       18   0 20392 1584 1256 S  0.0  0.0   0:00.00 sge_shepherd
>>>
>>> Not sure why that is. How do I troubleshoot this?
>>>
>>> Thanks
>>> Johnt
>>> This email (including its attachments, if any) may be confidential and
>>> proprietary information of SMIC, and intended only for the use of the named
>>> recipient(s) above. Any unauthorized use or disclosure of this email is
>>> strictly prohibited. If you are not the intended recipient(s), please
>>> notify the sender immediately and delete this email from your computer.
>>>
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
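
A note on the strace output quoted in this thread: every poll() call there has a 1000 ms timeout, yet it returns at once with revents=POLLHUP on fd 6, and wait4() with WNOHANG finds no exited child. A loop that keeps polling a hung-up descriptor like this never sleeps, which would be consistent with the 99.5% CPU shown by top. The sketch below is not sge_shepherd code. It only reproduces that poll() behavior in isolation, assuming Linux and using a plain pipe as a stand-in for fd 6; the function name `poll_closed_pipe` is made up for this illustration.

```python
import os
import select
import time


def poll_closed_pipe(timeout_ms=1000):
    """Poll the read end of a pipe whose write end is already closed.

    Returns (events, elapsed_seconds). The pipe is only a stand-in for
    the hung-up fd 6 in the trace; it is not what sge_shepherd uses.
    """
    read_fd, write_fd = os.pipe()
    os.close(write_fd)  # the peer goes away, so the read end is hung up

    poller = select.poll()
    poller.register(read_fd, select.POLLIN)  # same event mask as in the trace

    start = time.monotonic()
    events = poller.poll(timeout_ms)  # timeout is ignored once POLLHUP is pending
    elapsed = time.monotonic() - start

    os.close(read_fd)
    return events, elapsed


if __name__ == "__main__":
    events, elapsed = poll_closed_pipe()
    for fd, revents in events:
        print("fd", fd, "POLLHUP set:", bool(revents & select.POLLHUP))
    print("poll() returned after %.3f s of a 1 s timeout" % elapsed)
```

POLLHUP is reported even though only POLLIN was requested, so the usual way out of such a loop is to stop polling the descriptor (close it or unregister it) once the hang-up is seen. Whether that is what goes wrong in sge_shepherd here is not established in this thread.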
