Hi,
Do you guys know what this error could be:
error reason 2: 11/22/2012 11:12:25 [0:31220]:
execvlp(/var/spool/gridengine/execd/node0/job_scripts/69, "/var/spool
error reason 3: 11/22/2012 11:12:25 [0:31221]:
execvlp(/var/spool/gridengine/execd/node0/job_scripts/69, "/var/spool
this goes on for as long as it's running... and my job state went to:
69 0.50000 SA root Eqw 11/22/2012 09:12:05 1 1-500:1
69 0.00000 SA root qw 11/22/2012 09:12:05 1 501-4200:1
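For reference, a sketch of how the Eqw state can be inspected and, once the cause is fixed, cleared - `qstat -j` and `qmod -cj` are standard SGE commands; job id 69 is taken from the listing above:

```shell
# Show the per-task error reasons SGE recorded for job 69
qstat -j 69 | grep -i "error reason"

# After fixing the underlying cause, clear the error state so the
# Eqw tasks become eligible for scheduling again
qmod -cj 69
```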
This is the script I was running:
#!/bin/bash
#$ -cwd
#$ -N SA
#$ -t 1-4200:1
/var/software/packages/Mathematica/7.0/Executables/math -run "teller=$SGE_TASK_ID;<< ModelCaCO31.m"
I hope somebody can help fix the problem.
Kind regards,
2012/11/14 Reuti <[email protected]>
> Am 14.11.2012 um 10:08 schrieb jan roels:
>
> > I got it working again; there was already an execd process running that
> needed to be killed, after which I restarted the services.
> >
> > I'm trying to run a script now:
> >
> >
> > #!/bin/bash
> > #$ -cwd
> > #$ -N SA
> > #$ -S /bin/sh
> > #$ -t 1-4200:
>
> Don't run scripts as root. If something goes wrong it might trash your
> machine(s).
>
>
> > /var/software/packages/Mathematica/7.0/Executables/math -run "teller=$SGE_TASK_ID;<< ModelCaCO31.m"
> >
> > but it gives the following output:
> >
> > stdin: is not a tty
>
> It's just a warning - unless someone complains I would suggest ignoring it.
>
>
> > and this is the output of my qstat -f:
> >
> > queuename qtype resv/used/tot. load_avg arch states
> > ---------------------------------------------------------------------------------
> > [email protected] BIP 0/1/1 0.70 lx26-amd64
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 1
> > ---------------------------------------------------------------------------------
> > main.q@node0 BIP 0/24/24 27.71 lx26-amd64
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 2
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 3
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 4
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 5
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 6
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 7
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 8
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 9
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 10
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 11
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 12
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 13
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 14
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 15
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 16
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 17
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 18
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 19
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 20
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 21
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 22
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 23
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 24
> > 35 0.50000 SA root r 11/14/2012 09:57:47 1 25
> >
> > ############################################################################
> > - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> > ############################################################################
> > 35 0.50000 SA root qw 11/14/2012 09:57:38 1 26-4200:1
> >
> >
> > root@camilla:/nfs/share/sge# qstat -explain c -j 35
> > ==============================================================
> > job_number: 35
> > exec_file: job_scripts/35
> > submission_time: Wed Nov 14 09:57:38 2012
> > owner: root
> > uid: 0
> > group: root
> > gid: 0
> > sge_o_home: /root
> > sge_o_log_name: root
> > sge_o_path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
> > sge_o_shell: /bin/bash
> > sge_o_workdir: /nfs/share/sge
> > sge_o_host: camilla
> > account: sge
> > cwd: /nfs/share/sge
> > mail_list: root@camilla
> > notify: FALSE
> > job_name: SA
> > jobshare: 0
> > shell_list: NONE:/bin/sh
> > env_list:
> > script_file: HistDisCaCO31.sh
> > job-array tasks: 1-4200:1
> > usage 1: cpu=00:05:20, mem=105.16135 GBs, io=0.01537, vmem=1.110G, maxvmem=1.110G
> > usage 2: cpu=00:04:17, mem=179.44371 GBs, io=0.01395, vmem=3.643G, maxvmem=3.643G
> > usage 3: cpu=00:04:37, mem=191.69532 GBs, io=0.01394, vmem=3.657G, maxvmem=3.657G
> > usage 4: cpu=00:04:34, mem=188.12645 GBs, io=0.01394, vmem=3.655G, maxvmem=3.655G
> > usage 5: cpu=00:04:16, mem=180.18292 GBs, io=0.01394, vmem=3.636G, maxvmem=3.636G
> > usage 6: cpu=00:04:22, mem=183.47616 GBs, io=0.01394, vmem=3.644G, maxvmem=3.644G
> > usage 7: cpu=00:04:15, mem=179.89624 GBs, io=0.01400, vmem=3.640G, maxvmem=3.640G
> > usage 8: cpu=00:04:55, mem=207.28643 GBs, io=0.01394, vmem=3.669G, maxvmem=3.669G
> > usage 9: cpu=00:04:27, mem=184.86707 GBs, io=0.01394, vmem=3.653G, maxvmem=3.653G
> > usage 10: cpu=00:04:14, mem=179.09446 GBs, io=0.01394, vmem=3.635G, maxvmem=3.635G
> > usage 11: cpu=00:04:47, mem=195.80372 GBs, io=0.01400, vmem=3.668G, maxvmem=3.668G
> > usage 12: cpu=00:04:49, mem=203.43895 GBs, io=0.01394, vmem=3.665G, maxvmem=3.665G
> > usage 13: cpu=00:04:45, mem=196.67175 GBs, io=0.01394, vmem=3.663G, maxvmem=3.663G
> > usage 14: cpu=00:04:24, mem=185.68047 GBs, io=0.01400, vmem=3.648G, maxvmem=3.648G
> > usage 15: cpu=00:04:40, mem=195.96253 GBs, io=0.01394, vmem=3.656G, maxvmem=3.656G
> > usage 16: cpu=00:04:11, mem=179.84016 GBs, io=0.01394, vmem=3.633G, maxvmem=3.633G
> > usage 17: cpu=00:04:43, mem=196.21689 GBs, io=0.01394, vmem=3.662G, maxvmem=3.662G
> > usage 18: cpu=00:04:37, mem=197.39875 GBs, io=0.01394, vmem=3.653G, maxvmem=3.653G
> > usage 19: cpu=00:04:35, mem=191.55982 GBs, io=0.01394, vmem=3.653G, maxvmem=3.653G
> > usage 20: cpu=00:04:26, mem=191.62928 GBs, io=0.01394, vmem=3.643G, maxvmem=3.643G
> > usage 21: cpu=00:04:42, mem=197.87398 GBs, io=0.01394, vmem=3.660G, maxvmem=3.660G
> > usage 22: cpu=00:04:36, mem=193.43107 GBs, io=0.01394, vmem=3.652G, maxvmem=3.652G
> > usage 23: cpu=00:04:32, mem=193.12103 GBs, io=0.01394, vmem=3.652G, maxvmem=3.652G
> > usage 24: cpu=00:04:25, mem=186.56485 GBs, io=0.01400, vmem=3.644G, maxvmem=3.644G
> > usage 25: cpu=00:04:51, mem=201.81706 GBs, io=0.01400, vmem=3.669G, maxvmem=3.669G
> > scheduling info: queue instance "main.q@camilla" dropped because it is full
> > queue instance "main.q@node0" dropped because it is full
> > All queues dropped because of overload or full
> > not all array task may be started due to 'max_aj_instances'
>
> The machine is just full.
>
> -- Reuti
>
>
> > Do you know how this can be solved?
> >
> >
> >
> > 2012/11/13 Reuti <[email protected]>
> > Am 13.11.2012 um 13:42 schrieb jan roels:
> >
> > > Hi,
> > >
> > > I followed the following tutorial:
> > >
> > >
> > > http://verahill.blogspot.be/2012/06/setting-up-sun-grid-engine-with-three.html
> > > on how to install SGE. It all went fine on my master node, but on my exec node I have some trouble.
> > >
> > > First it gave the following error:
> > >
> > > 11/13/2012 13:44:43| main|node0|E|communication error for "node0/execd/1" running on port 6445: "can't bind socket"
> >
> > Is there already something running on this port - any older version of the execd?
> >
> >
> > > 11/13/2012 13:44:44| main|node0|E|commlib error: can't bind socket (no additional information available)
> > > 11/13/2012 13:45:12| main|node0|C|abort qmaster registration due to communication errors
> > > 11/13/2012 13:45:14| main|node0|W|daemonize error: child exited before sending daemonize state
> > >
> > > but then I killed the process and restarted gridengine-execd, and now I get the following:
> > >
> > > /etc/init.d/gridengine-exec restart
> > > * Restarting Sun Grid Engine Execution Daemon sge_execd
> > > error: can't resolve host name
> > > error: can't get configuration from qmaster -- backgrounding
> > >
> > > What can I do to fix this?
> >
> > Any firewall on the machines? Ports 6444 and 6445 need to be open.
> >
> > -- Reuti
> >
> >
> >
>
>
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users