Re: [Linux-HA] Linux-HA Service Monitoring

2008-01-02 Thread Thomas Glanzmann
Hello, > We are implementing heartbeat with squid. We wrote a script for > starting squid during heartbeat takeover. But when the service (squid in > this case) is stopped, heartbeat does not take over. I saw that there > are some resource monitoring solutions available, but i d

[Linux-HA] Linux-HA Service Monitoring

2008-01-02 Thread Jayaprakash
Hi all, We are implementing heartbeat with squid. We wrote a script for starting squid during heartbeat takeover. But when the service (squid in this case) is stopped, heartbeat does not take over. I saw that there are some resource monitoring solutions available, but I don't k
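In Heartbeat v2, the CRM only notices that a service has stopped if the resource has a recurring `monitor` operation and its resource agent implements one (Heartbeat v1 `haresources` setups do no resource monitoring). A rough sketch of what such a monitor action checks, written in Python purely for illustration: it tests whether the PID stored in a pidfile is still alive and maps the result to the OCF exit codes `OCF_SUCCESS` (0), `OCF_ERR_GENERIC` (1) and `OCF_NOT_RUNNING` (7). The function name `squid_monitor` and the pidfile approach are assumptions; real agents are usually shell scripts.

```python
import os

# OCF resource agent exit codes (values from the OCF RA specification).
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def squid_monitor(pidfile):
    """Hypothetical monitor action: is the process named in pidfile alive?"""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return OCF_NOT_RUNNING      # no pidfile: treat squid as cleanly stopped
    except ValueError:
        return OCF_ERR_GENERIC      # pidfile exists but does not hold a PID
    if pid <= 0:
        return OCF_ERR_GENERIC      # guard: kill() with 0 or negative targets process groups
    try:
        os.kill(pid, 0)             # signal 0 only tests that the PID exists
    except ProcessLookupError:
        return OCF_NOT_RUNNING      # stale pidfile: the process has died
    except PermissionError:
        return OCF_SUCCESS          # process exists but is owned by another user
    return OCF_SUCCESS
```

When a recurring monitor returns `OCF_NOT_RUNNING`, the CRM recovers the resource according to the cluster configuration instead of silently leaving the service down.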

[Linux-HA] Jan 2 23:25:01 postgres-02 tengine: [8736]: ERROR: te_graph_trigger: Transition failed: terminated

2008-01-02 Thread Thomas Glanzmann
Hello, what does that line mean?
Jan 2 23:25:01 postgres-02 tengine: [8736]: ERROR: te_graph_trigger: Transition failed: terminated
Jan 2 23:25:01 postgres-02 tengine: [8736]: info: process_te_message: Processing (N)ACK lrm_invoke-lrmd-1199312701-4 from postgres-01
Jan 2 23:25:01 postgres-02

Re: [Linux-HA] daemon timeout trying to use ocf to startup

2008-01-02 Thread Serge Dubrouski
Your process still doesn't start up within 20 sec. Have you tried running your OCF RA manually using "server start"? Does it start all right? On Jan 2, 2008 3:18 PM, <[EMAIL PROTECTED]> wrote: > Serge, thanks for the quick response (and missing flame :)). I've added > to the server primitive: > >

RE: [Linux-HA] daemon timeout trying to use ocf to startup

2008-01-02 Thread Blatt_Lew
Serge, thanks for the quick response (and missing flame :)). I've added to the server primitive: but it still times out. At the risk of exposing my stupidity, here are more details: I've added some ocf_log calls to the server script, as in th

Re: [Linux-HA] Broadcast Heartbeat gets lost if an interface becomes temporarily unavailable

2008-01-02 Thread Thomas Glanzmann
Hello Dejan, > http://developerbugs.linux-foundation.org/show_bug.cgi?id=1732 Alan Robertson wrote the following: > I think this is now fixed. > If we have repeated EBADF or ENODEV errors, then the write process will exit > If a write or read process exits, the device is reopened and the proce
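Alan's description of the fix can be illustrated with a small sketch: count consecutive `EBADF`/`ENODEV` write errors and, after a threshold, drop the handle and reopen the device. Everything here (the `BroadcastWriter` class, the `open_channel` callback, the threshold of 3) is hypothetical and only mirrors the behaviour described in the thread; it is not heartbeat's C code.

```python
import errno


class BroadcastWriter:
    """Hypothetical sketch: reopen the channel after repeated EBADF/ENODEV."""

    REOPEN_ERRNOS = {errno.EBADF, errno.ENODEV}

    def __init__(self, open_channel, max_errors=3):
        self.open_channel = open_channel   # callable returning a send(packet) function
        self.max_errors = max_errors
        self.send = open_channel()
        self.consecutive_errors = 0
        self.reopens = 0

    def write(self, packet):
        """Send one packet; return True on success, False on a tolerated error."""
        try:
            self.send(packet)
        except OSError as e:
            if e.errno not in self.REOPEN_ERRNOS:
                raise                       # unrelated errors are not handled here
            self.consecutive_errors += 1
            if self.consecutive_errors >= self.max_errors:
                # Too many device errors in a row: give up on this handle
                # and reopen the device (e.g. after ifdown/ifup).
                self.send = self.open_channel()
                self.consecutive_errors = 0
                self.reopens += 1
            return False
        self.consecutive_errors = 0         # a successful write resets the count
        return True
```

Requiring several consecutive errors before reopening avoids churning the device on a single transient failure.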

Re: [Linux-HA] daemon timeout trying to use ocf to startup

2008-01-02 Thread Serge Dubrouski
Looks like your OCF "server" script wasn't able to start the server within the given time. On Jan 2, 2008 1:47 PM, <[EMAIL PROTECTED]> wrote: > Well you all seem like a friendly enough bunch as I lurk about the list, > so here goes... > I've read some fine Linux-HA (V2) tutorials and have begun experimenti
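The CRM treats a start action that runs longer than the timeout on the primitive's start operation as failed. One way to keep an agent's start action bounded is to poll for readiness against an explicit deadline and report failure itself rather than hang. A minimal sketch, assuming a hypothetical `is_running` probe supplied by the agent; the clock and sleep functions are parameters only so the logic can be tested.

```python
import time

OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1


def wait_until_running(is_running, budget_s, interval_s=0.5,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll is_running() until it returns True or budget_s seconds elapse.

    budget_s should be smaller than the start operation's timeout so the
    agent reports OCF_ERR_GENERIC itself instead of being timed out.
    """
    deadline = clock() + budget_s
    while True:
        if is_running():
            return OCF_SUCCESS
        if clock() >= deadline:
            return OCF_ERR_GENERIC
        sleep(interval_s)
```

With a 20 s start timeout, a budget of around 15 s leaves headroom for the rest of the start action.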

[Linux-HA] daemon timeout trying to use ocf to startup

2008-01-02 Thread Blatt_Lew
Well, you all seem like a friendly enough bunch as I lurk about the list, so here goes... I've read some fine Linux-HA (V2) tutorials and have begun experimenting with Linux-HA on a 2-node setup. Installation of heartbeat went well and I even glimpsed ip failover in action. Now I am attempting to

[Linux-HA] crm nagios plugin; tomcat ocf agent on top of debian etch init script

2008-01-02 Thread Thomas Glanzmann
Hello, I just hacked up a crm nagios plugin which works for me. It does not check "crm_verify -LV" yet, but I am going to add that. I don't like it very much but it does a good job for me. Is there a way to get the information I currently check out of "cibadmin -o status -Q" or something like that in a
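Nagios plugins report through their exit status (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) plus one line of text on stdout. The core of a CRM check reduces to mapping per-resource states onto those codes. The sketch below assumes the states have already been extracted (for example from `crm_mon` or `cibadmin -o status -Q` output); that extraction step, the state strings and the name `crm_check` are hypothetical.

```python
NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL, NAGIOS_UNKNOWN = 0, 1, 2, 3


def crm_check(resources):
    """Map {resource_id: state} to a Nagios (exit_code, message) pair.

    Hypothetical states: "started" is healthy, "stopped" is a warning,
    anything else (e.g. "failed") is critical.
    """
    if not resources:
        return NAGIOS_UNKNOWN, "UNKNOWN - no resources found"
    failed = sorted(r for r, s in resources.items()
                    if s not in ("started", "stopped"))
    stopped = sorted(r for r, s in resources.items() if s == "stopped")
    if failed:
        return NAGIOS_CRITICAL, "CRITICAL - failed: " + ", ".join(failed)
    if stopped:
        return NAGIOS_WARNING, "WARNING - stopped: " + ", ".join(stopped)
    return NAGIOS_OK, "OK - %d resources started" % len(resources)
```

A wrapper script would print the message and call `sys.exit(code)` so Nagios picks up the status.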

Re: [Linux-HA] Broadcast Heartbeat gets lost if an interface becomes temporarily unavailable

2008-01-02 Thread Thomas Glanzmann
Hello Dejan, > This should've been fixed recently and included in 2.1.3: > http://developerbugs.linux-foundation.org/show_bug.cgi?id=1732 > Which release do you run? I downloaded the 2.1.3 tarball and ran "fakeroot debian/rules binary" which results in: -rw-r--r-- 1 sithglan icipguru 17066

Re: [Linux-HA] STONITH keeps rebooting node over and over

2008-01-02 Thread Dejan Muhamedagic
Hi, On Mon, Dec 31, 2007 at 10:07:00AM -0500, David S. Madole wrote: > I have set up STONITH on my two-node cluster using a Baytech > RPC-3 as the underlying hardware. > > It works in that if node B fails, then node A performs a power > cycle on it. However, it continues to power-cycle the node >

Re: [Linux-HA] Broadcast Heartbeat gets lost if an interface becomes temporarily unavailable

2008-01-02 Thread Dejan Muhamedagic
Hi, On Wed, Jan 02, 2008 at 10:25:04AM +0100, Thomas Glanzmann wrote: > Hello, > I have a crosslink cable between two nodes (it's eth1 on both nodes). > When I type in > > ifdown eth1 > ifup eth1 > > I don't see the heartbeat link via eth1 on the other node. The only > thing that

Re: [Linux-HA] Shared configuration files using DRBD

2008-01-02 Thread Dejan Muhamedagic
Hi, On Wed, Jan 02, 2008 at 07:21:56PM +0200, Chris Picton wrote: > Thomas Glanzmann wrote: >> Hello Chris, >>> Are there any problems from treating all services like this - >>> (postfix/dovecot/ftp/etc), as long as I don't try to share data which is >>> specific to each machine? >> I think it is legi

Re: [Linux-HA] Shared configuration files using DRBD

2008-01-02 Thread Chris Picton
Rois Cannon wrote: I don't know about all services but that's how I'm handling Samba. I'm sure each of them needs a tweak to make them work the way you want. I haven't put mine in production yet (and someone may say I'm missing something) but I made sure Samba was binding to the VIP and only a

Re: [Linux-HA] Shared configuration files using DRBD

2008-01-02 Thread Chris Picton
Thomas Glanzmann wrote: Hello Chris, Are there any problems from treating all services like this - (postfix/dovecot/ftp/etc), as long as I don't try to share data which is specific to each machine? I think it is legitimate to share the configuration files on a shared disk if you need the shared d

Re: [Linux-HA] Shared configuration files using DRBD

2008-01-02 Thread Thomas Glanzmann
Hello Chris, > Are there any problems from treating all services like this - > (postfix/dovecot/ftp/etc), as long as I don't try to share data which is > specific to each machine? I think it is legitimate to share the configuration files on a shared disk if you need the shared disk for _that_ service

Re: [Linux-HA] Shared configuration files using DRBD

2008-01-02 Thread Rois Cannon
I don't know about all services but that's how I'm handling Samba. I'm sure each of them needs a tweak to make them work the way you want. I haven't put mine in production yet (and someone may say I'm missing something) but I made sure Samba was binding to the VIP and only advertising the VIP na

[Linux-HA] Shared configuration files using DRBD

2008-01-02 Thread Chris Picton
Hi, I have been investigating heartbeat + drbd for high availability servers, and have the following question: Are there any potential pitfalls in sharing service configuration files between machines by placing the configs on the shared drbd disk? My config has /srv as the drbd (active-passiv
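Placing configs on the DRBD disk usually amounts to moving a service's configuration directory onto the DRBD-backed filesystem and leaving a symlink in its original place. A rough sketch, with a hypothetical helper name and example paths such as `/etc/postfix` mirrored to `/srv/etc/postfix`. Keep in mind that on the passive node the symlink dangles while `/srv` is not mounted, so anything that reads the configuration there (package upgrades, for instance) will not find it.

```python
import os
import shutil


def relocate_config(config_dir, shared_root):
    """Hypothetical helper: move config_dir under shared_root and symlink it back.

    Run only on the node that currently has the DRBD filesystem mounted.
    Returns the new location inside shared_root.
    """
    config_dir = os.path.abspath(config_dir)
    if os.path.islink(config_dir):
        return os.path.realpath(config_dir)    # already relocated, do nothing
    # Mirror the original absolute path below the shared mount point,
    # e.g. /etc/postfix -> /srv/etc/postfix.
    target = os.path.join(shared_root, config_dir.lstrip(os.sep))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    shutil.move(config_dir, target)
    os.symlink(target, config_dir)             # old path now points at the shared copy
    return target
```

The second node only needs the symlink, created once by hand, since the directory itself lives on the replicated disk.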

[Linux-HA] Question: How to configure crm for correct use of drbd ocf RA

2008-01-02 Thread Lukáš Pecha
Hello, I am trying to set up the drbd ocf resource agent correctly, but it still doesn't do what I want. I was following the setups described at http://linux-ha.org/v2/Concepts/MultiState and at http://wiki.linux-ha.org/DRBD/HowTov2. I have two nodes and I am using drbd to replicate disks of x

Re: [Linux-HA] Nagios Monitoring

2008-01-02 Thread Mark Eisenblaetter
Hi Thomas, I'm using HA1, so I'm not sure what you mean. But I monitor my resources with the normal plugins and use check_cluster to check that at least 1 is OK. Mark On Jan 2, 2008 12:13 PM, Thomas Glanzmann <[EMAIL PROTECTED]> wrote: > Hello Mark, > > > in addition there are two checks on nag
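Mark's rule is "at least one member is OK". The idea behind such a cluster check can be sketched as counting member states against a minimum. The function below is a hypothetical illustration of that logic only; it is not the `check_cluster` plugin's implementation or its command-line interface.

```python
NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL = 0, 1, 2


def cluster_state(member_states, min_ok=1):
    """Return a Nagios code for a cluster of redundant members.

    member_states: Nagios exit codes (0-3) of the individual service checks.
    CRITICAL if fewer than min_ok members are OK, WARNING if some members
    are not OK but enough still are, OK otherwise.
    """
    ok = sum(1 for s in member_states if s == NAGIOS_OK)
    if ok < min_ok:
        return NAGIOS_CRITICAL
    if ok < len(member_states):
        return NAGIOS_WARNING
    return NAGIOS_OK
```

For an active/passive pair, `min_ok=1` matches the "at least one is OK" rule while still flagging the degraded member as a warning.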

Re: [Linux-HA] Nagios Monitoring

2008-01-02 Thread Thomas Glanzmann
Hello Mark, > in addition there are two checks on nagios-exchange.org, > check_drbd and check_heartbeat_link Thanks, I installed both and I am very happy with the two. Now, the only thing that is missing is a nagios plugin that supervises my crm resources. Thomas

Re: [Linux-HA] Nagios Monitoring

2008-01-02 Thread Mark Eisenblaetter
Hi, in addition there are two checks on nagios-exchange.org, check_drbd and check_heartbeat_link. I use both to test my systems. Mark On Jan 1, 2008 9:34 PM, Thomas Glanzmann <[EMAIL PROTECTED]> wrote: > Hello, > I would like to monitor my linux-ha installation using nagios and I > wonder if so

[Linux-HA] coding bugfix for lib/plugins/stonith/ipmilan.c

2008-01-02 Thread Chun Tian (binghe)
Hi, Linux-HA Developers, I found a little coding bug in lib/plugins/stonith/ipmilan.c which causes compilation to fail with --enable-ipmilan; see the attached patch.
diff -r 7cea5a8c5c0e lib/plugins/stonith/ipmilan.c
--- a/lib/plugins/stonith/ipmilan.c Fri Dec 21 23:13:06 2007 -0700
+++ b/lib/plugins/stoni

Re: [Linux-HA] Nagios Monitoring

2008-01-02 Thread Peter Clapham
On Tue, 1 Jan 2008, Thomas Glanzmann wrote: Hello, I would like to monitor my linux-ha installation using nagios and I wonder if someone has done work on it. Because I would like to use that. However I am also perfectly capable of writing my own module. But before I would like to know what I c

Re: [Linux-HA] DRBD - split brain - and HA is happily migrating

2008-01-02 Thread Thomas Glanzmann
Hello Dominik, > drbdadm -- --overwrite-data-of-peer primary all > drbdadm -- --discard-my-data connect all thank you a lot for the two commands. I wasn't aware of the first one and wrote it down in my ha/drbd cheat sheet. But I monitor drbd using nagios. So as soon as a node gets "Outdated" I re

[Linux-HA] Broadcast Heartbeat gets lost if an interface becomes temporarily unavailable

2008-01-02 Thread Thomas Glanzmann
Hello, I have a crosslink cable between two nodes (it's eth1 on both nodes). When I type in
  ifdown eth1
  ifup eth1
I don't see the heartbeat link via eth1 on the other node. The only thing that helps is if I call /etc/init.d/heartbeat restart on the node that lost "eth1"

Re: [Linux-HA] DRBD - split brain - and HA is happily migrating

2008-01-02 Thread Dominik Klein
Thanks for your help. It looks like everything works as desired:
(postgres-02) [~] ifconfig eth1 down
(postgres-02) [~] cat /proc/drbd
version: 8.2.1 (api:86/proto:86-87)
GIT-hash: 318925802fc2638479ad090b73d7af45503dd184 build by [EMAIL PROTECTED], 2007-12-29 17:37:25
 0: cs:WFConnection st:Sec
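In DRBD 8.2 the device lines of `/proc/drbd` carry the connection state (`cs:`), the roles (`st:`) and the disk states (`ds:`), which is what a Nagios check needs to spot a node that has become "Outdated". A minimal parser sketch; the function names and the chosen mapping to Nagios codes are assumptions, and the sample line in the test is illustrative rather than taken from this thread.

```python
def parse_drbd_line(line):
    """Parse a /proc/drbd device line such as ' 0: cs:... st:... ds:...'.

    Returns (minor, {field: value}) for every 'key:value' token after the minor.
    """
    minor_part, _, rest = line.strip().partition(":")
    fields = {}
    for token in rest.split():
        key, sep, value = token.partition(":")
        if sep:                       # skip flag tokens like 'C' or 'r---'
            fields[key] = value
    return int(minor_part), fields


def drbd_nagios_code(fields):
    """Hypothetical mapping: 2 if the local disk is not UpToDate, 1 if not connected."""
    local_disk = fields.get("ds", "").split("/")[0]
    if local_disk != "UpToDate":
        return 2                      # CRITICAL, e.g. Outdated or Inconsistent
    if fields.get("cs") != "Connected":
        return 1                      # WARNING, e.g. WFConnection after a link loss
    return 0                          # OK
```

Treating a lost connection as WARNING and an out-of-date local disk as CRITICAL matches the urgency of the two cases: the first still has good data, the second must not be promoted.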

Re: [Linux-HA] DRBD - split brain - and HA is happily migrating

2008-01-02 Thread Thomas Glanzmann
Hello Dominik, > Thanks. I adapted my configuration and will test whether it works as desired. Thanks for your help. It looks like everything works as desired:
(postgres-02) [~] ifconfig eth1 down
(postgres-02) [~] cat /proc/drbd
version: 8.2.1 (api:86/proto:86-87)
GIT-hash: 318925802fc2638479ad090b73d7af

Re: [Linux-HA] DRBD - split brain - and HA is happily migrating

2008-01-02 Thread Thomas Glanzmann
Hello Dominik, > In DRBD or in the entire cluster? Just the DRBD. And I think that I produced the situation exactly as you said using (drbdadm primary). > You didn't give your drbd.conf, but I suppose you do not use DRBD > resource fencing. Without resource fencing, it is perfectly possible > to