On 14.11.2012 at 10:08, jan roels wrote:

> I got it working again, there was already a process of execd running that
> needed to be killed and then restart the services.
>
> I'm trying to run a script now:
>
> #!/bin/bash
> #$-cwd
> #$-N SA
> #$-S /bin/sh
> #$-t 1-4200:

Don't run scripts as root. If something goes wrong, it might trash your machine(s).

> /var/software/packages/Mathematica/7.0/Executables/math -run
> "teller=$SGE_TASK_ID;<< ModelCaCO31.m"
>
> but it gives the following output:
>
> stdin: is not a tty

It's just a warning. Unless someone complains, I would suggest ignoring it.

> and this is the output of my qstat -f:
>
> queuename                      qtype resv/used/tot. load_avg arch          states
> ---------------------------------------------------------------------------------
> main.q@camilla                 BIP   0/1/1          0.70     lx26-amd64
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 1
> ---------------------------------------------------------------------------------
> main.q@node0                   BIP   0/24/24        27.71    lx26-amd64
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 2
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 3
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 4
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 5
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 6
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 7
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 8
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 9
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 10
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 11
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 12
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 13
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 14
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 15
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 16
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 17
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 18
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 19
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 20
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 21
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 22
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 23
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 24
>      35 0.50000 SA         root         r     11/14/2012 09:57:47     1 25
>
> ############################################################################
>  - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
> ############################################################################
>      35 0.50000 SA         root         qw    11/14/2012 09:57:38     1 26-4200:1
>
>
> root@camilla:/nfs/share/sge# qstat -explain c -j 35
> ==============================================================
> job_number:                 35
> exec_file:                  job_scripts/35
> submission_time:            Wed Nov 14 09:57:38 2012
> owner:                      root
> uid:                        0
> group:                      root
> gid:                        0
> sge_o_home:                 /root
> sge_o_log_name:             root
> sge_o_path:                 /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
> sge_o_shell:                /bin/bash
> sge_o_workdir:              /nfs/share/sge
> sge_o_host:                 camilla
> account:                    sge
> cwd:                        /nfs/share/sge
> mail_list:                  root@camilla
> notify:                     FALSE
> job_name:                   SA
> jobshare:                   0
> shell_list:                 NONE:/bin/sh
> env_list:
> script_file:                HistDisCaCO31.sh
> job-array tasks:            1-4200:1
> usage    1:                 cpu=00:05:20, mem=105.16135 GBs, io=0.01537, vmem=1.110G, maxvmem=1.110G
> usage    2:                 cpu=00:04:17, mem=179.44371 GBs, io=0.01395, vmem=3.643G, maxvmem=3.643G
> usage    3:                 cpu=00:04:37, mem=191.69532 GBs, io=0.01394, vmem=3.657G, maxvmem=3.657G
> usage    4:                 cpu=00:04:34, mem=188.12645 GBs, io=0.01394, vmem=3.655G, maxvmem=3.655G
> usage    5:                 cpu=00:04:16, mem=180.18292 GBs, io=0.01394, vmem=3.636G, maxvmem=3.636G
> usage    6:                 cpu=00:04:22, mem=183.47616 GBs, io=0.01394, vmem=3.644G, maxvmem=3.644G
> usage    7:                 cpu=00:04:15, mem=179.89624 GBs, io=0.01400, vmem=3.640G, maxvmem=3.640G
> usage    8:                 cpu=00:04:55, mem=207.28643 GBs, io=0.01394, vmem=3.669G, maxvmem=3.669G
> usage    9:                 cpu=00:04:27, mem=184.86707 GBs, io=0.01394, vmem=3.653G, maxvmem=3.653G
> usage   10:                 cpu=00:04:14, mem=179.09446 GBs, io=0.01394, vmem=3.635G, maxvmem=3.635G
> usage   11:                 cpu=00:04:47, mem=195.80372 GBs, io=0.01400, vmem=3.668G, maxvmem=3.668G
> usage   12:                 cpu=00:04:49, mem=203.43895 GBs, io=0.01394, vmem=3.665G, maxvmem=3.665G
> usage   13:                 cpu=00:04:45, mem=196.67175 GBs, io=0.01394, vmem=3.663G, maxvmem=3.663G
> usage   14:                 cpu=00:04:24, mem=185.68047 GBs, io=0.01400, vmem=3.648G, maxvmem=3.648G
> usage   15:                 cpu=00:04:40, mem=195.96253 GBs, io=0.01394, vmem=3.656G, maxvmem=3.656G
> usage   16:                 cpu=00:04:11, mem=179.84016 GBs, io=0.01394, vmem=3.633G, maxvmem=3.633G
> usage   17:                 cpu=00:04:43, mem=196.21689 GBs, io=0.01394, vmem=3.662G, maxvmem=3.662G
> usage   18:                 cpu=00:04:37, mem=197.39875 GBs, io=0.01394, vmem=3.653G, maxvmem=3.653G
> usage   19:                 cpu=00:04:35, mem=191.55982 GBs, io=0.01394, vmem=3.653G, maxvmem=3.653G
> usage   20:                 cpu=00:04:26, mem=191.62928 GBs, io=0.01394, vmem=3.643G, maxvmem=3.643G
> usage   21:                 cpu=00:04:42, mem=197.87398 GBs, io=0.01394, vmem=3.660G, maxvmem=3.660G
> usage   22:                 cpu=00:04:36, mem=193.43107 GBs, io=0.01394, vmem=3.652G, maxvmem=3.652G
> usage   23:                 cpu=00:04:32, mem=193.12103 GBs, io=0.01394, vmem=3.652G, maxvmem=3.652G
> usage   24:                 cpu=00:04:25, mem=186.56485 GBs, io=0.01400, vmem=3.644G, maxvmem=3.644G
> usage   25:                 cpu=00:04:51, mem=201.81706 GBs, io=0.01400, vmem=3.669G, maxvmem=3.669G
> scheduling info:            queue instance "main.q@camilla" dropped because it is full
>                             queue instance "main.q@node0" dropped because it is full
>                             All queues dropped because of overload or full
>                             not all array task may be started due to 'max_aj_instances'

The machine is just full. All 25 slots (1 on main.q@camilla, 24 on main.q@node0) are in use by tasks 1-25, so tasks 26-4200 stay in state qw until running tasks finish.

-- Reuti

> You guys know how this can be solved?
>
>
>
> 2012/11/13 Reuti <[email protected]>
> On 13.11.2012 at 13:42, jan roels wrote:
>
> > Hi,
> >
> > I followed the following tutorial:
> >
> > http://verahill.blogspot.be/2012/06/setting-up-sun-grid-engine-with-three.html
> > on how to install the SGE. It all went fine on my masternode but on my
> > exec node I have some troubles.
> >
> > First it gave the following error:
> >
> > 11/13/2012 13:44:43| main|node0|E|communication error for "node0/execd/1" running on port 6445: "can't bind socket"
>
> Is there already something running on this port - any older version of the
> execd?
> >
> > 11/13/2012 13:44:44| main|node0|E|commlib error: can't bind socket (no additional information available)
> > 11/13/2012 13:45:12| main|node0|C|abort qmaster registration due to communication errors
> > 11/13/2012 13:45:14| main|node0|W|daemonize error: child exited before sending daemonize state
> >
> > but then I killed the process and restarted the gridengine-execd but then I
> > get the following:
> >
> > /etc/init.d/gridengine-exec restart
> > * Restarting Sun Grid Engine Execution Daemon sge_execd
> > error: can't resolve host name
> > error: can't get configuration from qmaster -- backgrounding
> >
> > What can I do to fix this?
>
> Any firewall on the machines? Ports 6444 (sge_qmaster) and 6445 (sge_execd)
> need to be excluded from filtering.
>
> -- Reuti
>
>
> > _______________________________________________
> > users mailing list
> > [email protected]
> > https://gridengine.org/mailman/listinfo/users
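
To make the "machine is just full" diagnosis easy to re-check, here is a minimal sketch that reads `qstat -f` output on stdin and prints the free slots per queue instance. It assumes the third column has the form resv/used/tot, as in the listing quoted in this thread, and it treats free slots as tot minus resv minus used. `free_slots` is a hypothetical helper name, not part of Grid Engine, and the sample input is just the two queue instances named in the scheduling info.

```shell
#!/bin/sh
# Sketch only: free_slots is a hypothetical helper, not a Grid Engine command.
# Reads `qstat -f` style text on stdin and prints "<queue instance> free=<n>".
# Assumption: column 3 is resv/used/tot and free = tot - resv - used.
free_slots() {
    awk '$3 ~ /^[0-9]+\/[0-9]+\/[0-9]+$/ {
        split($3, s, "/")                      # s[1]=resv, s[2]=used, s[3]=tot
        printf "%s free=%d\n", $1, s[3] - s[1] - s[2]
    }'
}

# Sample input taken from the qstat -f listing in this thread.
# Header and job lines do not match the resv/used/tot pattern and are skipped.
free_slots <<'EOF'
queuename                      qtype resv/used/tot. load_avg arch          states
main.q@camilla                 BIP   0/1/1          0.70     lx26-amd64
     35 0.50000 SA         root         r     11/14/2012 09:57:47     1 1
main.q@node0                   BIP   0/24/24        27.71    lx26-amd64
EOF
```

On a live cluster the same function could be fed directly with `qstat -f | free_slots`. When every queue instance reports free=0, pending array tasks stay in qw for that reason alone, and nothing is misconfigured.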
