I got it working again: there was already an execd process running that
needed to be killed, and then I restarted the services.
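
For reference, this is roughly how I cleared the stale daemon (a sketch, the
actual pid will of course differ):

# find the leftover execution daemon
ps -ef | grep sge_execd
# check what is still bound to the execd port (6445 here)
netstat -tlnp | grep 6445
# kill it and restart the service
kill <pid>
/etc/init.d/gridengine-exec restart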

I'm trying to run a script now:


#!/bin/bash
#$ -cwd
#$ -N SA
#$ -S /bin/sh
#$ -t 1-4200:1

/var/software/packages/Mathematica/7.0/Executables/math -run \
  "teller=$SGE_TASK_ID;<< ModelCaCO31.m"

but it gives the following output:

stdin: is not a tty
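
I suspect the message itself may be harmless: as far as I can tell it usually
comes from an interactive-only command (mesg on Debian, for instance) in
/etc/profile or ~/.profile being executed by the non-interactive job shell.
Assuming that is the source here, a guard like this around the offending
command would silence it:

# only run terminal-only commands when stdin really is a terminal
if [ -t 0 ]; then
  mesg n
fi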

and this is the output of my qstat -f:

queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
main.q@camilla                 BIP   0/1/1          0.70     lx26-amd64
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 1
---------------------------------------------------------------------------------
main.q@node0                   BIP   0/24/24        27.71    lx26-amd64
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 2
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 3
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 4
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 5
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 6
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 7
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 8
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 9
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 10
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 11
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 12
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 13
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 14
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 15
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 16
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 17
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 18
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 19
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 20
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 21
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 22
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 23
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 24
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 25

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     35 0.50000 SA         root         qw    11/14/2012 09:57:38     1 26-4200:1
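
If I read this right, only 25 tasks run at once simply because main.q@camilla
offers 1 slot and main.q@node0 offers 24, so the remaining tasks have to
wait. The slot counts can be checked with (a sketch):

qconf -sq main.q | grep slots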


root@camilla:/nfs/share/sge#  qstat -explain c -j 35
==============================================================
job_number:                 35
exec_file:                  job_scripts/35
submission_time:            Wed Nov 14 09:57:38 2012
owner:                      root
uid:                        0
group:                      root
gid:                        0
sge_o_home:                 /root
sge_o_log_name:             root
sge_o_path:                 /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
sge_o_shell:                /bin/bash
sge_o_workdir:              /nfs/share/sge
sge_o_host:                 camilla
account:                    sge
cwd:                        /nfs/share/sge
mail_list:                  root@camilla
notify:                     FALSE
job_name:                   SA
jobshare:                   0
shell_list:                 NONE:/bin/sh
env_list:
script_file:                HistDisCaCO31.sh
job-array tasks:            1-4200:1
usage    1:                 cpu=00:05:20, mem=105.16135 GBs, io=0.01537, vmem=1.110G, maxvmem=1.110G
usage    2:                 cpu=00:04:17, mem=179.44371 GBs, io=0.01395, vmem=3.643G, maxvmem=3.643G
usage    3:                 cpu=00:04:37, mem=191.69532 GBs, io=0.01394, vmem=3.657G, maxvmem=3.657G
usage    4:                 cpu=00:04:34, mem=188.12645 GBs, io=0.01394, vmem=3.655G, maxvmem=3.655G
usage    5:                 cpu=00:04:16, mem=180.18292 GBs, io=0.01394, vmem=3.636G, maxvmem=3.636G
usage    6:                 cpu=00:04:22, mem=183.47616 GBs, io=0.01394, vmem=3.644G, maxvmem=3.644G
usage    7:                 cpu=00:04:15, mem=179.89624 GBs, io=0.01400, vmem=3.640G, maxvmem=3.640G
usage    8:                 cpu=00:04:55, mem=207.28643 GBs, io=0.01394, vmem=3.669G, maxvmem=3.669G
usage    9:                 cpu=00:04:27, mem=184.86707 GBs, io=0.01394, vmem=3.653G, maxvmem=3.653G
usage   10:                 cpu=00:04:14, mem=179.09446 GBs, io=0.01394, vmem=3.635G, maxvmem=3.635G
usage   11:                 cpu=00:04:47, mem=195.80372 GBs, io=0.01400, vmem=3.668G, maxvmem=3.668G
usage   12:                 cpu=00:04:49, mem=203.43895 GBs, io=0.01394, vmem=3.665G, maxvmem=3.665G
usage   13:                 cpu=00:04:45, mem=196.67175 GBs, io=0.01394, vmem=3.663G, maxvmem=3.663G
usage   14:                 cpu=00:04:24, mem=185.68047 GBs, io=0.01400, vmem=3.648G, maxvmem=3.648G
usage   15:                 cpu=00:04:40, mem=195.96253 GBs, io=0.01394, vmem=3.656G, maxvmem=3.656G
usage   16:                 cpu=00:04:11, mem=179.84016 GBs, io=0.01394, vmem=3.633G, maxvmem=3.633G
usage   17:                 cpu=00:04:43, mem=196.21689 GBs, io=0.01394, vmem=3.662G, maxvmem=3.662G
usage   18:                 cpu=00:04:37, mem=197.39875 GBs, io=0.01394, vmem=3.653G, maxvmem=3.653G
usage   19:                 cpu=00:04:35, mem=191.55982 GBs, io=0.01394, vmem=3.653G, maxvmem=3.653G
usage   20:                 cpu=00:04:26, mem=191.62928 GBs, io=0.01394, vmem=3.643G, maxvmem=3.643G
usage   21:                 cpu=00:04:42, mem=197.87398 GBs, io=0.01394, vmem=3.660G, maxvmem=3.660G
usage   22:                 cpu=00:04:36, mem=193.43107 GBs, io=0.01394, vmem=3.652G, maxvmem=3.652G
usage   23:                 cpu=00:04:32, mem=193.12103 GBs, io=0.01394, vmem=3.652G, maxvmem=3.652G
usage   24:                 cpu=00:04:25, mem=186.56485 GBs, io=0.01400, vmem=3.644G, maxvmem=3.644G
usage   25:                 cpu=00:04:51, mem=201.81706 GBs, io=0.01400, vmem=3.669G, maxvmem=3.669G
scheduling info:            queue instance "main.q@camilla" dropped because it is full
                            queue instance "main.q@node0" dropped because it is full
                            All queues dropped because of overload or full
                            not all array task may be started due to 'max_aj_instances'
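
If I understand the docs, that last line refers to the max_aj_instances limit
on concurrently running tasks of a single array job (2000 by default, if I
remember correctly), although with only 25 slots in total that should not be
the bottleneck here. It can be inspected and raised in the global
configuration (a sketch, needs SGE admin rights):

# show the current limit
qconf -sconf | grep max_aj_instances
# open the global configuration in an editor and adjust, e.g.
#   max_aj_instances   5000
qconf -mconf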

Do you guys know how this can be solved?



2012/11/13 Reuti <[email protected]>

> On 13.11.2012 at 13:42, jan roels wrote:
>
> > Hi,
> >
> > I followed the following tutorial:
> >
> >
> > http://verahill.blogspot.be/2012/06/setting-up-sun-grid-engine-with-three.html
> > on how to install SGE. It all went fine on my master node, but on my exec
> > node I have some troubles.
> >
> > First it gave the following error:
> >
> > 11/13/2012 13:44:43|  main|node0|E|communication error for "node0/execd/1" running on port 6445: "can't bind socket"
>
> Is there already something running on this port - any older version of the
> execd?
>
>
> > 11/13/2012 13:44:44|  main|node0|E|commlib error: can't bind socket (no additional information available)
> > 11/13/2012 13:45:12|  main|node0|C|abort qmaster registration due to communication errors
> > 11/13/2012 13:45:14|  main|node0|W|daemonize error: child exited before sending daemonize state
> >
> > but then I killed the process and restarted the gridengine-execd, but then
> > I get the following:
> >
> > /etc/init.d/gridengine-exec restart
> > * Restarting Sun Grid Engine Execution Daemon sge_execd
> > error: can't resolve host name
> > error: can't get configuration from qmaster -- backgrounding
> >
> > What can I do to fix this?
>
> Any firewall on the machines? Ports 6444 and 6445 need to be excluded.
>
> -- Reuti
>