Reuti,
On Apr 1, 2011, at 5:07 PM, Reuti wrote:
> On 01.04.2011 at 23:54, William Deegan wrote:
>
>> Greetings,
>>
>> Here's the line in question
>>
>> 04/01/2011 14:49:13| main|hosta|W|Core binding: Couldn't determine core
>> binding string for config file!
>
> Ignore it. I have it also in
Reuti,
On Apr 1, 2011, at 5:12 PM, Reuti wrote:
> On 02.04.2011 at 01:41, William Deegan wrote:
>
>> Greetings,
>>
>> Here's what I did.
>> 1) unpack ge tarballs into /opt/ge on all hosts
>> 2) configure grid master
>> 3) scp /opt/ge/default to all hosts
>> 4) verify ssh works back and forth
On 02.04.2011 at 01:41, William Deegan wrote:
> Greetings,
>
> Here's what I did.
> 1) unpack ge tarballs into /opt/ge on all hosts
> 2) configure grid master
> 3) scp /opt/ge/default to all hosts
> 4) verify ssh works back and forth among all hosts as root
Do you need X11 forwarding?
> 5)
On 01.04.2011 at 23:54, William Deegan wrote:
> Greetings,
>
> Here's the line in question
>
> 04/01/2011 14:49:13| main|hosta|W|Core binding: Couldn't determine core
> binding string for config file!
Ignore it. I have it also in 6.2u5 in case no core binding was requested, but
binding was
Greetings,
Here's what I did.
1) unpack ge tarballs into /opt/ge on all hosts
2) configure grid master
3) scp /opt/ge/default to all hosts
4) verify ssh works back and forth among all hosts as root
5) run ./start_gui_installer -debug
6) Install all execution hosts
This is shared nothing, so the
Greetings,
New gridengine install with binaries from here:
here's the output:
[deegan@hotan2 ~]$ qlogin -l hostname=host12
Your job 1191 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 1191 has been successfully scheduled.
Establishing builtin s
Nope. I am using a binary built at Univa, but it's about the same vintage.
qrsh is working kind of OK; haven't tried qsh or qlogin yet.
# Stephen Dennis : Senior Sales Engineer
# Univa Corporation: univa.com/products/grid
Stephen,
Have you been using the current master build from here:
http://bioteam.net/dag/gridengine-courtesy-binaries/?
Anyone else?
qsh I can get working, but qrsh and qlogin fail. (I'll send another email in a
minute with details).
-Bill
On Apr 1, 2011, at 3:23 PM, Stephen Dennis wrote:
> Me
Resolved this. One host had bad entry for its own hostname in /etc/hosts.
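For anyone hitting the same symptom, a check along these lines might help (a sketch; the sample hosts file below is made up, and `host12` just stands in for the node's own name — on a real node you would run the awk against /etc/hosts itself):

```shell
# Sketch: flag the common mistake of a node's own hostname pinned to a
# loopback address in /etc/hosts. Sample file is illustrative only.
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
127.0.0.1    localhost
127.0.1.1    host12
192.168.1.12 host12.example.com
EOF

host=host12
# Print any 127.x address that carries the node's own hostname.
bad=$(awk -v h="$host" '$1 ~ /^127\./ { for (i = 2; i <= NF; i++) if ($i == h) print $1 }' "$hosts_file")
if [ -n "$bad" ]; then
    echo "WARNING: $host resolves to loopback ($bad) in $hosts_file"
fi
rm -f "$hosts_file"
```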
-Bill
On Apr 1, 2011, at 11:41 AM, William Deegan wrote:
> Greetings,
>
> I noticed one node (using Chris's ge-8.0.0.alph binaries) wouldn't take:
> qsh -l hostname=this_host
>
> So I stopped the execd via:
> /etc/init.d/
Me too.
It looks like that string is not in the current master though.
Possibly something has been fixed already.
Maybe this from today:
-
commit 8c74d05904d05e214768a0686c8bc2259d8c4e31
Author: Daniel Gruber
Date: Fri Apr 1 15:25:07 2011 +0200
Remvoed unwan
Greetings,
Here's the line in question
04/01/2011 14:49:13| main|hosta|W|Core binding: Couldn't determine core
binding string for config file!
Any idea how to resolve this?
CentOS 5.5
Thanks,
Bill
___
users mailing list
users@gridengine.org
https://
Greetings,
I noticed one node (using Chris's ge-8.0.0.alph binaries) wouldn't take:
qsh -l hostname=this_host
So I stopped the execd via:
/etc/init.d/sgeexecd.BLAH stop
Then tried
/etc/init.d/sgeexecd.BLAH start
And it just sits there.
First time I did this I saw:
04/01/2011 10:26:05| main|n
On Thu, 31 Mar 2011, Reuti wrote:
...
Does your problem feel like it's related? Or is this a new issue?
In my case only one of these exclusive jobs was running in the cluster
(some other jobs running), around 18 exclusive jobs waiting (submitted
until the 1st started), and the 19th got "no su
On 01.04.2011 at 16:57, lars van der bijl wrote:
> core file size (blocks, -c) 0
>
> file locks (-x) unlimited
Fine.
> I think it might be the machine killing them, because we're not putting any
> other limits anywhere. unless it's the application we're running.
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 193056
max locked memory (kbytes, -l) 256
max memory size (kbytes, -m) unlimited
Add-on:
you can check the messages file of the execd on the nodes to see whether
anything about the reason was recorded there.
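For example, something along these lines (a sketch; the spool path mentioned in the comment is an assumption that depends on the installation, so here it runs against a made-up sample messages file):

```shell
# Sketch: scan an execd messages file for kill/limit entries.
# Sample file is illustrative; on a real node the file would be somewhere
# like $SGE_ROOT/default/spool/<hostname>/messages (path is an assumption).
messages=$(mktemp)
cat > "$messages" <<'EOF'
04/01/2011 10:26:05|  main|node01|I|starting up
04/01/2011 11:02:17|  main|node01|W|job 21141 exceeded hard resource limit, killing it
EOF

# Count lines mentioning a kill or a limit, case-insensitively.
hits=$(grep -icE 'kill|limit' "$messages")
echo "$hits matching line(s)"
rm -f "$messages"
```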
-- Reuti
On 01.04.2011 at 16:39, lars van der bijl wrote:
> the problem is that i don't have any such limits enforced currently on
> submission. the submissions to qsub a
On 01.04.2011 at 16:39, lars van der bijl wrote:
> the problem is that i don't have any such limits enforced currently on
> submission. the submissions to qsub are hidden from the user so i know
> they're not adding them. the only thing we have is a load/suspend threshold
> in the grid itself
On 01.04.2011 at 12:54, lars van der bijl wrote:
> in this case yes.
>
> however on the jobs running on our farm we put no memory limits as of yet.
> just request amount of procs
>
> is it the usual behaviour that if it fails with this code, the subsequent
> dependencies start regardless?
also, is there any way of catching this and raising 100? once the job is
finished and its dependencies start, it's causing major havoc on our system
looking for files that aren't there.
are there other things the grid uses SIGKILL for? not just memory
limits?
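One way to catch this in the job script itself might be a wrapper along these lines (a sketch; `run_guarded` and the demo payload are made up, and whether exit 100 actually keeps your dependencies from firing depends on how your queue treats jobs in the error state):

```shell
# Sketch: run the real payload, and if it was killed by a signal
# (exit status > 128), exit 100 instead so the job can be put into an
# error state rather than counted as cleanly finished.
run_guarded() {
    "$@"
    status=$?
    if [ "$status" -gt 128 ]; then
        echo "command killed by signal $((status - 128)); raising exit 100" >&2
        return 100
    fi
    return "$status"
}

# Demo: a payload that dies with SIGKILL (status 137).
run_guarded sh -c 'kill -9 $$'
echo "wrapper exit status: $?"
```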
Lars
On 1 April 2011 11:54, lars va
in this case yes.
however on the jobs running on our farm we put no memory limits as of yet.
just request amount of procs
is it the usual behaviour that if it fails with this code, the
subsequent dependencies start regardless?
Lars
On 1 April 2011 11:41, Reuti wrote:
> Hi,
>
> Am 01.04.2
Hi,
On 01.04.2011 at 12:33, lars van der bijl wrote:
> Hey everyone.
>
> We're having some issues with jobs being killed with exit status 137.
137 = 128 + 9
$ kill -l
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL
5) SIGTRAP 6) SIGABRT 7) SIGBUS 8) SIGFPE
9) SIGKILL
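For reference, the arithmetic can be reproduced on any node without the grid involved (a sketch; `sh` here just stands in for whatever runs the job script):

```shell
# Sketch: a process killed by SIGKILL (signal 9) is reported by the
# calling shell with exit status 128 + 9 = 137.
sh -c 'kill -9 $$'
status=$?
echo "exit status: $status"              # 137
echo "signal number: $((status - 128))"  # 9, i.e. SIGKILL
```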
Hey everyone.
We're having some issues with jobs being killed with exit status 137. This
causes the task to finish and start its dependent task, which is causing all
kinds of havoc.
Submitting a job with a very small max memory limit gives me this as an
example.
$ qacct -j 21141
==
Ah - that explains it, thanks. I just saw the big comment at the top of
the arch script explaining how changes must be propagated to arch.dist,
and assumed someone forgot!
I guess this could bite a few folk though since it detected 2.6 at
compile time but 2.4 at runtime (this was on CentOS 5.5/x86