Dear all,

I am trying to run a simple Makefile with qmake (SGE 8.1.9), but it fails
every time after the first round of commands with the following error:
------------------------------------------
$ qmake -cwd -v PATH -pe make 1 -verbose --
[...]
reading exit code from shepherd ... timeout (60 s) expired while waiting on socket fd 4
error: error reading returncode of remote command
cleaning up after abnormal exit of /usr/bin/ssh -o LogLevel=ERROR
------------------------------------------
------------------------------------------
$ cat Makefile
all: a.y b.y c.y d.y e.y f.y

%.y: %.x
    touch $@; sleep 3
------------------------------------------

Overall, only a.y is created. If I use N slots (with -pe make 1-N), only
N files are created. It looks to me as if qmake gets stuck because it
fails to close open connections. Note that I have the same problem
when I do not use the -pe option, and when I run qmake -inherit in
a qsub script. Apart from that, qsub and qlogin work fine.
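
In case it helps anyone reproduce this: the pattern rule in the Makefile
needs the .x prerequisites to exist before qmake is started. A minimal
setup sketch, assuming they are plain empty files (the names are taken
from the Makefile above; the empty content is an assumption):

```shell
# Create the six prerequisites that the "%.y: %.x" rule expects.
# Assumption: empty files are enough, since the recipe only touches $@.
for name in a b c d e f; do
    touch "$name.x"
done

# List them to confirm the working directory is ready for qmake.
ls a.x b.x c.x d.x e.x f.x
```

Running the qmake command shown above in the same directory should then
try to build a.y through f.y.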

I think I have narrowed the problem down to qrsh, as I get a
similar error with this command:
------------------------------------------
$ qrsh -cwd -v PATH -verbose hostname
Your job 121477 ("hostname") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 121477 has been successfully scheduled.
Establishing /usr/bin/ssh -o LogLevel=ERROR session to host XXX.prive ...
XXX.prive
/usr/bin/ssh -o LogLevel=ERROR exited with exit code 0
reading exit code from shepherd ... timeout (60 s) expired while waiting on socket fd 4
error: error reading returncode of remote command
cleaning up after abnormal exit of /usr/bin/ssh -o LogLevel=ERROR
------------------------------------------

Any idea what might cause this problem? The complete output is
below.

Sincerely,
−Nils

------------------------------------------
Complete output:
$ qstat -help
SGE 8.1.9
$ qmake -cwd -verbose -v PATH -pe make 1 --
dynamic task allocation mode
sge_argv[0] = qmake
sge_argv[1] = -cwd
sge_argv[2] = -verbose
sge_argv[3] = -v
sge_argv[4] = PATH
sge_argv[5] = -pe
sge_argv[6] = make
sge_argv[7] = 1
gmake_argv[0]  = qmake
determine qmake startmode
setting default options: -l arch=lx-amd64
creating scheduled qmake
argv[  0] = qrsh
argv[  1] = -noshell
argv[  2] = -cwd
argv[  3] = -verbose
argv[  4] = -v
argv[  5] = PATH
argv[  6] = -pe
argv[  7] = make
argv[  8] = 1
argv[  9] = -l
argv[ 10] = arch=lx-amd64
argv[ 11] = qmake
argv[ 12] = -inherit
argv[ 13] = -verbose
argv[ 14] = -cwd
argv[ 15] = -v
argv[ 16] = PATH
argv[ 17] = -l
argv[ 18] = arch=lx-amd64
argv[ 19] = --
Your job 121548 ("qmake") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 121548 has been successfully scheduled.
Establishing /usr/bin/ssh -o LogLevel=ERROR session to host gknzwd2.XXX.prive ...
sge_argv[0] = qmake
sge_argv[1] = -inherit
sge_argv[2] = -verbose
sge_argv[3] = -cwd
sge_argv[4] = -v
sge_argv[5] = PATH
sge_argv[6] = -l
sge_argv[7] = arch=lx-amd64
gmake_argv[0]  = qmake
determine qmake startmode
inserting -j option from NSLOTS environment: -j 1
sge hostfile = /opt/sge/BiRD_v2/spool/gknzwd2/active_jobs/121548.1/pe_hostfile
qmake  hostfile = /tmp/121548.1.max-24h.q/qmake_hostfile
qmake  lockfile = /tmp/121548.1.max-24h.q/qmake_lockfile
creating qmake hostfile
number of slots for qmake execution is 1
enabling next task to be executed as Grid Engine parallel task
touch a.y; sleep 3
export the following environment variables:
SGE_RSH_COMMAND,BASH_FUNC_module(),MAKEFLAGS,MFLAGS,MAKELEVEL
obtained lock to qmake lockfile
clearing lock to hostfile
next host for qmake job is gknzwd2.XXX.prive
gknzwd2.XXX.prive
gmake requesting status of dead child processes
gmake requesting status of dead child processes
waiting for child failed: timeout
starting job:
args[  0] = qrsh
args[  1] = -noshell
args[  2] = -verbose
args[  3] = -inherit
args[  4] = -cwd
args[  5] = -v
args[  6] = SGE_RSH_COMMAND,BASH_FUNC_module(),MAKEFLAGS,MFLAGS,MAKELEVEL
args[  7] = -v
args[  8] = PATH
args[  9] = gknzwd2.XXX.prive
args[ 10] = /bin/sh
args[ 11] = -c
args[ 12] = touch a.y; sleep 3
Starting server daemon at host "gknzwd2.XXX.prive"
Server daemon successfully started with task id "1.gknzwd2"
Establishing /usr/bin/ssh -o LogLevel=ERROR session to host gknzwd2.XXX.prive ...
/usr/bin/ssh -o LogLevel=ERROR exited with exit code 0
reading exit code from shepherd ... timeout (60 s) expired while waiting on socket fd 4
error: error reading returncode of remote command
obtained lock to qmake lockfile
unlock_hostentry 0
clearing lock to hostfile
qmake: *** [a.y] Error 255
cleanup of remote mechanism
/usr/bin/ssh -o LogLevel=ERROR exited with exit code 0
reading exit code from shepherd ... timeout (60 s) expired while waiting on socket fd 4
error: error reading returncode of remote command
cleaning up after abnormal exit of /usr/bin/ssh -o LogLevel=ERROR


