Hi Nils:

> On 27.02.2018 at 10:11, Nils Giordano <[email protected]> wrote:
>
> Dear Reuti,
>
> Thank you for your answer. I should have specified that I am not an
> admin on this cluster so I do not have access to the `qconf` command,
There is no restriction on using `qconf -sconf`, since it only shows the values. Usually the binary is in the same directory as `qsub`, in something like $SGE_ROOT/bin/lx-amd64

> however you were right: `ssh` is definitely used to access nodes
> (probably on purpose since we have access to several GUI apps). Your
> answer made me check my ~/.ssh/ directory, and I found dozens of
> *.socket files in there.
>
> After removing these files, qmake and qrsh perform flawlessly (shepherd
> exit code 0).

Great.

> I still do not know what caused this problem and at which
> point these files were created, but I will know what to look for should
> this problem reappear.

I have never seen any socket files created in my ~/.ssh. Maybe it's specific to your other graphical apps.

-- Reuti

> Thank you very much for your help.
>
> Sincerely,
> − Nils
>
> On 26/02/2018 17:00, Reuti wrote:
>> Hi,
>>
>> it looks like the connection to nodes is set to `ssh`. Does your output of:
>>
>> $ qconf -sconf
>> #global:
>> qlogin_command
>> qlogin_daemon
>> rlogin_command
>> rlogin_daemon
>> rsh_command
>> rsh_daemon
>>
>> reflect this? Do you need `ssh` to access nodes by SGE for X11 forwarding?
>>
>> -- Reuti
>>
>>
>>> On 26.02.2018 at 15:29, Nils Giordano <[email protected]> wrote:
>>>
>>> Dear all,
>>>
>>> I am trying to run a simple Makefile with qmake (SGE 8.1.9) but it fails
>>> every time after the first round of commands with the following error:
>>> ------------------------------------------
>>> $ qmake -cwd -v PATH -pe make 1 -verbose --
>>> [...]
>>> reading exit code from shepherd ... timeout (60 s) expired while waiting
>>> on socket fd 4
>>> error: error reading returncode of remote command
>>> cleaning up after abnormal exit of /usr/bin/ssh -o LogLevel=ERROR
>>> ------------------------------------------
>>> ------------------------------------------
>>> $ cat Makefile
>>> all: a.y b.y c.y d.y e.y f.y
>>>
>>> %.y: %.x
>>> 	touch $@; sleep 3
>>> ------------------------------------------
>>>
>>> Overall, only a.y is created. If I use N slots (with -pe make 1-N), only
>>> N files are created. It seems to me that qmake gets stuck because it
>>> fails to close opened connections. Note that I have the same problem
>>> when I do not use the -pe option, or when I try to run qmake -inherit in
>>> a qsub script. Apart from that, qsub and qlogin work fine.
>>>
>>> I think I narrowed the problem down to qrsh, as I get a
>>> similar error with this command:
>>> ------------------------------------------
>>> $ qrsh -cwd -v PATH -verbose hostname
>>> Your job 121477 ("hostname") has been submitted
>>> waiting for interactive job to be scheduled ...
>>> Your interactive job 121477 has been successfully scheduled.
>>> Establishing /usr/bin/ssh -o LogLevel=ERROR session to host XXX.prive ...
>>> XXX.prive
>>> /usr/bin/ssh -o LogLevel=ERROR exited with exit code 0
>>> reading exit code from shepherd ... timeout (60 s) expired while waiting
>>> on socket fd 4
>>> error: error reading returncode of remote command
>>> cleaning up after abnormal exit of /usr/bin/ssh -o LogLevel=ERROR
>>> ------------------------------------------
>>>
>>> Any idea what might cause this problem? You can find below the complete
>>> output.
>>>
>>> Sincerely,
>>> −Nils
>>>
>>> ------------------------------------------
>>> Complete output:
>>> $ qstat -help
>>> SGE 8.1.9
>>> $ qmake -cwd -verbose -v PATH -pe make 1 --
>>> dynamic task allocation mode
>>> sge_argv[0] = qmake
>>> sge_argv[1] = -cwd
>>> sge_argv[2] = -verbose
>>> sge_argv[3] = -v
>>> sge_argv[4] = PATH
>>> sge_argv[5] = -pe
>>> sge_argv[6] = make
>>> sge_argv[7] = 1
>>> gmake_argv[0] = qmake
>>> determine qmake startmode
>>> setting default options: -l arch=lx-amd64
>>> creating scheduled qmake
>>> argv[ 0] = qrsh
>>> argv[ 1] = -noshell
>>> argv[ 2] = -cwd
>>> argv[ 3] = -verbose
>>> argv[ 4] = -v
>>> argv[ 5] = PATH
>>> argv[ 6] = -pe
>>> argv[ 7] = make
>>> argv[ 8] = 1
>>> argv[ 9] = -l
>>> argv[ 10] = arch=lx-amd64
>>> argv[ 11] = qmake
>>> argv[ 12] = -inherit
>>> argv[ 13] = -verbose
>>> argv[ 14] = -cwd
>>> argv[ 15] = -v
>>> argv[ 16] = PATH
>>> argv[ 17] = -l
>>> argv[ 18] = arch=lx-amd64
>>> argv[ 19] = --
>>> Your job 121548 ("qmake") has been submitted
>>> waiting for interactive job to be scheduled ...
>>> Your interactive job 121548 has been successfully scheduled.
>>> Establishing /usr/bin/ssh -o LogLevel=ERROR session to host
>>> gknzwd2.XXX.prive ...
>>> sge_argv[0] = qmake
>>> sge_argv[1] = -inherit
>>> sge_argv[2] = -verbose
>>> sge_argv[3] = -cwd
>>> sge_argv[4] = -v
>>> sge_argv[5] = PATH
>>> sge_argv[6] = -l
>>> sge_argv[7] = arch=lx-amd64
>>> gmake_argv[0] = qmake
>>> determine qmake startmode
>>> inserting -j option from NSLOTS environment: -j 1
>>> sge hostfile =
>>> /opt/sge/BiRD_v2/spool/gknzwd2/active_jobs/121548.1/pe_hostfile
>>> qmake hostfile = /tmp/121548.1.max-24h.q/qmake_hostfile
>>> qmake lockfile = /tmp/121548.1.max-24h.q/qmake_lockfile
>>> creating qmake hostfile
>>> number of slots for qmake execution is 1
>>> enabling next task to be executed as Grid Engine parallel task
>>> touch a.y; sleep 3
>>> export the following environment variables:
>>> SGE_RSH_COMMAND,BASH_FUNC_module(),MAKEFLAGS,MFLAGS,MAKELEVEL
>>> obtained lock to qmake lockfile
>>> clearing lock to hostfile
>>> next host for qmake job is gknzwd2.XXX.prive
>>> gknzwd2.XXX.prive
>>> gmake requesting status of dead child processes
>>> gmake requesting status of dead child processes
>>> waiting for child failed: timeout
>>> starting job:
>>> args[ 0] = qrsh
>>> args[ 1] = -noshell
>>> args[ 2] = -verbose
>>> args[ 3] = -inherit
>>> args[ 4] = -cwd
>>> args[ 5] = -v
>>> args[ 6] = SGE_RSH_COMMAND,BASH_FUNC_module(),MAKEFLAGS,MFLAGS,MAKELEVEL
>>> args[ 7] = -v
>>> args[ 8] = PATH
>>> args[ 9] = gknzwd2.XXX.prive
>>> args[ 10] = /bin/sh
>>> args[ 11] = -c
>>> args[ 12] = touch a.y; sleep 3
>>> Starting server daemon at host "gknzwd2.XXX.prive"
>>> Server daemon successfully started with task id "1.gknzwd2"
>>> Establishing /usr/bin/ssh -o LogLevel=ERROR session to host
>>> gknzwd2.XXX.prive ...
>>> /usr/bin/ssh -o LogLevel=ERROR exited with exit code 0
>>> reading exit code from shepherd ... timeout (60 s) expired while waiting
>>> on socket fd 4
>>> error: error reading returncode of remote command
>>> obtained lock to qmake lockfile
>>> unlock_hostentry 0
>>> clearing lock to hostfile
>>> qmake: *** [a.y] Error 255
>>> cleanup of remote mechanism
>>> /usr/bin/ssh -o LogLevel=ERROR exited with exit code 0
>>> reading exit code from shepherd ... timeout (60 s) expired while waiting
>>> on socket fd 4
>>> error: error reading returncode of remote command
>>> cleaning up after abnormal exit of /usr/bin/ssh -o LogLevel=ERROR
>>>
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>>>
>
> --
> Nils Giordano, PhD
> Postdoc in Computational Biology team (ComBi)
> Laboratoire des Sciences du Numérique de Nantes (LS2N), UMR 6004
> https://www.normalesup.org/~giordano/
>
>
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
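
For anyone who hits the same shepherd timeout, the two checks discussed in this thread can be sketched as small shell helpers. This is only a sketch under stated assumptions: the function names `show_remote_startup` and `list_ssh_sockets` are hypothetical, the `*.socket` pattern is simply what Nils reported finding in ~/.ssh, and the thread does not establish what created those files or why removing them helped.

```shell
#!/bin/sh

# Print the remote-startup entries of the global SGE configuration.
# `qconf -sconf` only displays the configuration, so it should not
# require admin rights (as noted in the thread).
show_remote_startup() {
    qconf -sconf | grep -E '^[[:space:]]*(qlogin|rlogin|rsh)_(command|daemon)'
}

# List leftover *.socket files directly inside an SSH directory.
# Hypothetical helper: defaults to ~/.ssh when no directory is given,
# and only lists files, it never deletes anything.
list_ssh_sockets() {
    dir="${1:-$HOME/.ssh}"
    find "$dir" -maxdepth 1 -name '*.socket' 2>/dev/null
}
```

A cautious way to use this is to run `list_ssh_sockets` first, review the list, remove stale files by hand only when no SSH session of yours is still open, and then retry `qrsh -cwd -v PATH -verbose hostname` to see whether the shepherd exit code is read without the timeout.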
