Hi Nils:

> On 27.02.2018 at 10:11, Nils Giordano <[email protected]> wrote:
> 
> Dear Reuti,
> 
> Thank you for your answer. I should have specified that I am not an
> admin on this cluster so I do not have access to the `qconf` command,

Using `qconf -sconf` just to display the values is not restricted to admins.
The binary is usually in the same directory as `qsub`, something like
$SGE_ROOT/bin/lx-amd64
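
A minimal sketch of how the relevant lines could be pulled out of that output, assuming `qconf` is on your PATH. The function name `show_remote_startup` is made up for illustration; the parameter names are the six remote-startup entries from my earlier mail:

```shell
# Keep only the remote-startup settings from a configuration dump on stdin.
# show_remote_startup is a hypothetical helper name, not part of SGE.
show_remote_startup() {
    grep -E '^(qlogin|rlogin|rsh)_(command|daemon)[[:space:]]'
}

# Only query the cluster if qconf is actually available in this shell.
if command -v qconf >/dev/null 2>&1; then
    qconf -sconf | show_remote_startup || true
fi
```

If the `*_command` entries point to /usr/bin/ssh, the nodes are reached via `ssh`.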


> however you were right: `ssh` is definitely used to access nodes
> (probably on purpose since we have access to several GUI apps). Your
> answer made me check my ~/.ssh/ directory, and I found dozens of
> *.socket files in there.
> 
> After removing these files, qmake and qrsh perform flawlessly (shepherd
> exit code 0).

Great.


> I still do not know what caused this problem or at which
> point these files were created, but I will know what to look for should
> this problem reappear.

I have never seen any socket files created in my ~/.ssh. Maybe they are
specific to your other graphical apps.
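
Should it happen again, something like the following could be used to spot such leftovers. This is only a sketch: the helper name `list_ssh_sockets` is hypothetical, and it assumes a `find` that supports `-maxdepth` (GNU and BSD do). One possible origin, which is a guess and not confirmed in this thread, is OpenSSH connection sharing, where `ControlMaster`/`ControlPath` settings in ~/.ssh/config make `ssh` create a control socket.

```shell
# List socket-like leftovers directly inside an ssh directory (default: ~/.ssh).
# list_ssh_sockets is a hypothetical helper name used for illustration only.
list_ssh_sockets() {
    dir="${1:-$HOME/.ssh}"
    # -type s matches Unix domain sockets; -name also catches plain files
    # that are merely named *.socket.
    find "$dir" -maxdepth 1 \( -type s -o -name '*.socket' \) -print 2>/dev/null
}

# Only look at the default location if it exists.
if [ -d "$HOME/.ssh" ]; then
    list_ssh_sockets || true
fi
```

The function only lists the files; check that no `ssh` session still uses them before removing anything.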

-- Reuti


> Thank you very much for your help.
> 
> Sincerely,
> − Nils
> 
> On 26/02/2018 17:00, Reuti wrote:
>> Hi,
>> 
>> it looks like the connection to nodes is set to `ssh`. Does your output of:
>> 
>> $ qconf -sconf
>> #global:
>> qlogin_command
>> qlogin_daemon
>> rlogin_command
>> rlogin_daemon
>> rsh_command
>> rsh_daemon
>> 
>> reflect this? Do you need `ssh` to access nodes by SGE for X11 forwarding?
>> 
>> -- Reuti
>> 
>> 
>>> On 26.02.2018 at 15:29, Nils Giordano <[email protected]> wrote:
>>> 
>>> Dear all,
>>> 
>>> I am trying to run a simple Makefile with qmake (SGE 8.1.9), but it fails
>>> every time after the first round of commands with the following error:
>>> ------------------------------------------
>>> $ qmake -cwd -v PATH -pe make 1 -verbose --
>>> [...]
>>> reading exit code from shepherd ... timeout (60 s) expired while waiting
>>> on socket fd 4
>>> error: error reading returncode of remote command
>>> cleaning up after abnormal exit of /usr/bin/ssh -o LogLevel=ERROR
>>> ------------------------------------------
>>> ------------------------------------------
>>> $ cat Makefile
>>> all: a.y b.y c.y d.y e.y f.y
>>> 
>>> %.y: %.x
>>>    touch $@; sleep 3
>>> ------------------------------------------
>>> 
>>> Overall, only a.y is created. If I use N slots (with -pe make 1-N), only
>>> N files are created. It seems to me that qmake gets stuck because it
>>> fails to close opened connections. Note that I have the same problem
>>> when I do not use the -pe option, or when I try to run qmake -inherit in
>>> a qsub script. Apart from that, qsub and qlogin work fine.
>>> 
>>> I think I narrowed the problem to be related to qrsh, as I have a
>>> similar error with this command:
>>> ------------------------------------------
>>> $ qrsh -cwd -v PATH -verbose hostname
>>> Your job 121477 ("hostname") has been submitted
>>> waiting for interactive job to be scheduled ...
>>> Your interactive job 121477 has been successfully scheduled.
>>> Establishing /usr/bin/ssh -o LogLevel=ERROR session to host XXX.prive ...
>>> XXX.prive
>>> /usr/bin/ssh -o LogLevel=ERROR exited with exit code 0
>>> reading exit code from shepherd ... timeout (60 s) expired while waiting
>>> on socket fd 4
>>> error: error reading returncode of remote command
>>> cleaning up after abnormal exit of /usr/bin/ssh -o LogLevel=ERROR
>>> ------------------------------------------
>>> 
>>> Any idea what might cause this problem? You can find below the complete
>>> output.
>>> 
>>> Sincerely,
>>> −Nils
>>> 
>>> ------------------------------------------
>>> Complete output:
>>> $ qstat -help
>>> SGE 8.1.9
>>> $ qmake -cwd -verbose -v PATH -pe make 1 --
>>> dynamic task allocation mode
>>> sge_argv[0] = qmake
>>> sge_argv[1] = -cwd
>>> sge_argv[2] = -verbose
>>> sge_argv[3] = -v
>>> sge_argv[4] = PATH
>>> sge_argv[5] = -pe
>>> sge_argv[6] = make
>>> sge_argv[7] = 1
>>> gmake_argv[0]  = qmake
>>> determine qmake startmode
>>> setting default options: -l arch=lx-amd64
>>> creating scheduled qmake
>>> argv[  0] = qrsh
>>> argv[  1] = -noshell
>>> argv[  2] = -cwd
>>> argv[  3] = -verbose
>>> argv[  4] = -v
>>> argv[  5] = PATH
>>> argv[  6] = -pe
>>> argv[  7] = make
>>> argv[  8] = 1
>>> argv[  9] = -l
>>> argv[ 10] = arch=lx-amd64
>>> argv[ 11] = qmake
>>> argv[ 12] = -inherit
>>> argv[ 13] = -verbose
>>> argv[ 14] = -cwd
>>> argv[ 15] = -v
>>> argv[ 16] = PATH
>>> argv[ 17] = -l
>>> argv[ 18] = arch=lx-amd64
>>> argv[ 19] = --
>>> Your job 121548 ("qmake") has been submitted
>>> waiting for interactive job to be scheduled ...
>>> Your interactive job 121548 has been successfully scheduled.
>>> Establishing /usr/bin/ssh -o LogLevel=ERROR session to host
>>> gknzwd2.XXX.prive ...
>>> sge_argv[0] = qmake
>>> sge_argv[1] = -inherit
>>> sge_argv[2] = -verbose
>>> sge_argv[3] = -cwd
>>> sge_argv[4] = -v
>>> sge_argv[5] = PATH
>>> sge_argv[6] = -l
>>> sge_argv[7] = arch=lx-amd64
>>> gmake_argv[0]  = qmake
>>> determine qmake startmode
>>> inserting -j option from NSLOTS environment: -j 1
>>> sge hostfile =
>>> /opt/sge/BiRD_v2/spool/gknzwd2/active_jobs/121548.1/pe_hostfile
>>> qmake  hostfile = /tmp/121548.1.max-24h.q/qmake_hostfile
>>> qmake  lockfile = /tmp/121548.1.max-24h.q/qmake_lockfile
>>> creating qmake hostfile
>>> number of slots for qmake execution is 1
>>> enabling next task to be executed as Grid Engine parallel task
>>> touch a.y; sleep 3
>>> export the following environment variables:
>>> SGE_RSH_COMMAND,BASH_FUNC_module(),MAKEFLAGS,MFLAGS,MAKELEVEL
>>> obtained lock to qmake lockfile
>>> clearing lock to hostfile
>>> next host for qmake job is gknzwd2.XXX.prive
>>> gknzwd2.XXX.prive
>>> gmake requesting status of dead child processes
>>> gmake requesting status of dead child processes
>>> waiting for child failed: timeout
>>> starting job:
>>> args[  0] = qrsh
>>> args[  1] = -noshell
>>> args[  2] = -verbose
>>> args[  3] = -inherit
>>> args[  4] = -cwd
>>> args[  5] = -v
>>> args[  6] = SGE_RSH_COMMAND,BASH_FUNC_module(),MAKEFLAGS,MFLAGS,MAKELEVEL
>>> args[  7] = -v
>>> args[  8] = PATH
>>> args[  9] = gknzwd2.XXX.prive
>>> args[ 10] = /bin/sh
>>> args[ 11] = -c
>>> args[ 12] = touch a.y; sleep 3
>>> Starting server daemon at host "gknzwd2.XXX.prive"
>>> Server daemon successfully started with task id "1.gknzwd2"
>>> Establishing /usr/bin/ssh -o LogLevel=ERROR session to host
>>> gknzwd2.XXX.prive ...
>>> /usr/bin/ssh -o LogLevel=ERROR exited with exit code 0
>>> reading exit code from shepherd ... timeout (60 s) expired while waiting
>>> on socket fd 4
>>> error: error reading returncode of remote command
>>> obtained lock to qmake lockfile
>>> unlock_hostentry 0
>>> clearing lock to hostfile
>>> qmake: *** [a.y] Error 255
>>> cleanup of remote mechanism
>>> /usr/bin/ssh -o LogLevel=ERROR exited with exit code 0
>>> reading exit code from shepherd ... timeout (60 s) expired while waiting
>>> on socket fd 4
>>> error: error reading returncode of remote command
>>> cleaning up after abnormal exit of /usr/bin/ssh -o LogLevel=ERROR
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> users mailing list
>>> [email protected]
>>> https://gridengine.org/mailman/listinfo/users
>>> 
> 
> -- 
> Nils Giordano, PhD
> Postdoc in Computational Biology team (ComBi)
> Laboratoire des Sciences du Numérique de Nantes (LS2N), UMR 6004
> https://www.normalesup.org/~giordano/
> 
> 
> 


