Re: [Linux-HA] Question about ipfail problem with heartbeat 1.2.x

2007-04-10 Thread Hideaki Kondo
Thanks a lot for your quick reply. > It's definitely solved in the 2.0.x series, which also runs 1.x style > configurations. Understood. It seems that I had better use the 2.0.x series with 1.x style configurations in order to solve the ipfail problem. On Tue, 10 Apr 2007 20:08:37 -

Re: [Linux-HA] Question about ipfail problem with heartbeat 1.2.x

2007-04-10 Thread Alan Robertson
Hideaki Kondo wrote: > I have one question about heartbeat1.2.x. > # I'm sorry for my question about heartbeat 1.2.x > # while recent topic is heartbeat 2.0.x . > > I heard that there's the problem about ipfail and bcast > with heartbeat1.2.x as written the following URL. > > http://www.gossamer-

[Linux-HA] Question about ipfail problem with heartbeat 1.2.x

2007-04-10 Thread Hideaki Kondo
I have one question about heartbeat 1.2.x. # I'm sorry to ask about heartbeat 1.2.x # while the recent topic is heartbeat 2.0.x. I heard that there's a problem with ipfail and bcast in heartbeat 1.2.x, as described at the following URL. http://www.gossamer-threads.com/lists/linuxha/users/2313

[Linux-HA] Interest in Linux-HA at LinuxWorld San Francisco?

2007-04-10 Thread Alan Robertson
Hi, I'll be speaking at LinuxWorld in San Francisco August 6-9 this year. So, I'll be there. Are others from the list coming? I'll be giving a tutorial at LinuxWorld San Francisco, and I got a note which offers two things: 1) Birds of a Feather session -- Is there interest in this? 2) A .org

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-10 Thread Alan Robertson
Hi Bernd, Thanks for your continuing vigilance! Bernd Schubert wrote: > On Thursday 05 April 2007 20:11:51 Alan Robertson wrote: >> This particular document had a couple of other errors too, which I >> believe I've corrected. See what you think. > > > Thanks for improving the documentation, bu

Re: [Linux-HA] bringing up IP address outside default netblock

2007-04-10 Thread Alan Robertson
Dale Yamamoto wrote: > Running 2.0.5 on Debian Sarge, having an issue bringing up an IP > address. > > These servers have plenty of IP addresses controlled by heartbeat > where those IPs are in the same netblock as the server's own IP > address. Our ISP has allocated us a second netblock that's n

Re: [Linux-HA] heartbeat does not start when the stonith device is not available

2007-04-10 Thread Alan Robertson
Martin wrote: > Hello ! > > Today I have noticed that the heartbeat startup script does not start > when the APC PDU (my stonith device) is configured in ha.cf but not > available. IMHO it creates single point of failure. All the services that > should be highly available are blocked by a simple

[Linux-HA] heartbeat does not start when the stonith device is not available

2007-04-10 Thread Martin
Hello ! Today I have noticed that the heartbeat startup script does not start when the APC PDU (my stonith device) is configured in ha.cf but not available. IMHO it creates single point of failure. All the services that should be highly available are blocked by a simple problem with a non-essent

[Linux-HA] Attend Gelato ICE, April 16-18

2007-04-10 Thread Nan Holda
Gelato ICE: Itanium Conference & Exhibition April 16-18, 2007 | Doubletree Hotel | San Jose, California Dear Linux High-Availability, The Gelato Federation is proud to announce the technical program for the Gelato ICE: Itanium(r) Conference & Expo. International Itanium architecture experts are

[Linux-HA] bringing up IP address outside default netblock

2007-04-10 Thread Dale Yamamoto
Running 2.0.5 on Debian Sarge, having an issue bringing up an IP address. These servers have plenty of IP addresses controlled by heartbeat where those IPs are in the same netblock as the server's own IP address. Our ISP has allocated us a second netblock that's not contiguous with the first one

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-10 Thread Andrew Beekhof
On 4/10/07, Bernd Schubert <[EMAIL PROTECTED]> wrote: On Tuesday 10 April 2007 15:15:09 Andrew Beekhof wrote: > > > You can use per-operation specific parameters in the CIB as well. You > > > can define a special monitor op with interval="0"; the instance > > > parameters defined there will be pa

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-10 Thread Bernd Schubert
On Tuesday 10 April 2007 15:15:09 Andrew Beekhof wrote: > > > You can use per-operation specific parameters in the CIB as well. You > > > can define a special monitor op with interval="0"; the instance > > > parameters defined there will be passed to the startup probe. > > > > Ok, thats in principl

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-10 Thread Bernd Schubert
On Thursday 05 April 2007 20:11:51 Alan Robertson wrote: > This particular document had a couple of other errors too, which I > believe I've corrected. See what you think. Thanks for improving the documentation, but I think the given xml example does not work as it is. Here's a similar fragment

Re: [Linux-HA] pingd not failing over

2007-04-10 Thread Terry L. Inzauro
Terry L. Inzauro wrote: > Alan Robertson wrote: >> Terry L. Inzauro wrote: >>> Alan Robertson wrote: Terry L. Inzauro wrote: > Alan Robertson wrote: >> Daniel Bray wrote: >>> Hello List, >>> >>> I have been unable to get a 2 node active/passive cluster to >>> auto-failo

Re: [Linux-HA] Annouce: IPAddr2 RA v1.30 alpha

2007-04-10 Thread Lars Marowsky-Bree
On 2007-04-10T14:39:56, Michael Schwartzkopff <[EMAIL PROTECTED]> wrote: > > Uhm. Of course. Machine 3 has just died. Of course there's no > > connectivity until it is restarted. > Not really. Not only resource 3 is not available during failover, but > ALSO all other resources! That is the problem

Re: [Linux-HA] Getting the status of the node

2007-04-10 Thread Alan Robertson
Alan Robertson wrote: > Mark Eisenblaetter wrote: >> Hi, >> >> sorry, i don't find that script. >> Only some confusing mails about that script. Did you read the web page? On my machine it's located in /usr/bin/cl_status. Where it is on yours depends on how you have things configured. -- A

Re: [Linux-HA] Getting the status of the node

2007-04-10 Thread Alan Robertson
Mark Eisenblaetter wrote: > Hi, > > sorry, i don't find that script. > Only some confusing mails about that script. > > do you know where i can find that script? It's not a script. What version are you running? -- Alan Robertson <[EMAIL PROTECTED]> "Openness is the foundation and preserv

Re: [Linux-HA] Status (rc.d/status)

2007-04-10 Thread Alan Robertson
Mark Frasa wrote: > Hello, > > For an active/passive configuration environment i want to know the > status of heartbeat on the local machine. That's not what the status script does. How about cl_status rscstatus? Do read http://linux-ha.org/cl_status carefully. It's not wonderful, but for an R1
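A minimal sketch of the `cl_status rscstatus` check suggested in this reply, assuming an R1-style cluster and the `/usr/bin/cl_status` location mentioned elsewhere on the list. The four state names (`all`, `local`, `foreign`, `none`) and their meanings are assumptions taken from the cl_status documentation and should be verified against the installed version; the function names are hypothetical.

```python
import subprocess

# Assumed meanings of the values printed by `cl_status rscstatus`.
# Verify these against the cl_status documentation for your release.
RSC_STATES = {
    "all": "holds all resources",
    "local": "holds only the resources it is the preferred node for",
    "foreign": "holds only the peer's resources",
    "none": "holds no resources",
}


def describe_rscstatus(raw):
    """Map raw `cl_status rscstatus` output to a readable description."""
    state = raw.strip().lower()
    return RSC_STATES.get(state, "unrecognized state: " + state)


def local_rscstatus(cl_status_path="/usr/bin/cl_status"):
    """Run cl_status on the local node and describe the result.

    The path is an assumption; it depends on how heartbeat was installed.
    Raises CalledProcessError if cl_status exits non-zero
    (for example when heartbeat is not running).
    """
    result = subprocess.run(
        [cl_status_path, "rscstatus"],
        capture_output=True,
        text=True,
        check=True,
    )
    return describe_rscstatus(result.stdout)
```

In a two-node active/passive setup, a node that reports `none` would be the passive one under these assumptions; calling `local_rscstatus()` requires a running heartbeat on the local machine.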

Re: [Linux-HA] Getting the status of the node

2007-04-10 Thread Mark Eisenblaetter
Hi, sorry, I can't find that script, only some confusing mails about it. Do you know where I can find that script? Thanks Mark On 4/3/07, Alan Robertson <[EMAIL PROTECTED]> wrote: Mark Eisenblaetter wrote: > Hello list, > > I'm searching for a tool/script that tells me if the node is

[Linux-HA] Status (rc.d/status)

2007-04-10 Thread Mark Frasa
Hello, For an active/passive configuration environment I want to know the status of heartbeat on the local machine. I have found a script: /etc/ha.d/rc.d/status But this outputs: /etc/ha.d/rc.d/status: line 3: .: filename argument required .: usage: . filename The problem is line 3 in sourci

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-10 Thread Andrew Beekhof
On 4/10/07, Bernd Schubert <[EMAIL PROTECTED]> wrote: On Tuesday 10 April 2007 14:07:54 Lars Marowsky-Bree wrote: > On 2007-04-10T12:02:30, Peter Kruse <[EMAIL PROTECTED]> wrote: > > >But when you return the proper status - running, failed, not running -, > > >heartbeat should do the "right thing

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-10 Thread Bernd Schubert
On Tuesday 10 April 2007 14:07:54 Lars Marowsky-Bree wrote: > On 2007-04-10T12:02:30, Peter Kruse <[EMAIL PROTECTED]> wrote: > > >But when you return the proper status - running, failed, not running -, > > >heartbeat should do the "right thing" automatically when it finds the > > >resource active p

Re: [Linux-HA] HA problems

2007-04-10 Thread Alan Robertson
Angelo Venera wrote: > Hi at all, > > i'm new about this list and about HA. I'm trying to build a HA Active/Passive > for this service: > > amavisd clamd.amavisd dhcpd dovecot httpd mysqld named postfix smb > spamassassin squid > > On start the heartbeat run this service and became primary. Bu

Re: [Linux-HA] Annouce: IPAddr2 RA v1.30 alpha

2007-04-10 Thread Michael Schwartzkopff
On Tuesday, 10 April 2007 14:14, Lars Marowsky-Bree wrote: > > Per-node ordering might not be the right way. Think about the following. > > Four clones on four machines. Machine 3 dies and the first one has to > > take over. 1) Resource 1 is shut down > > 2) Resource 1 is started > > 3) Resourc

Re: [Linux-HA] Heartbeat stop hangs

2007-04-10 Thread Alan Robertson
kisalay wrote: > Hi, > > I have a 2 node 2.0.8 Linux HA setup. > I have observed that when stop is issued on my setup, as soon as the start > returns, the stop hangs indefinitely, and the only way to stop heartbeat is > to do killall. > > I dug a little deeper into the problem. > > First, the pr

Re: [Linux-HA] Can a RA know if a clone resource is ordered or interleave is true?

2007-04-10 Thread Alan Robertson
Lars Marowsky-Bree wrote: > On 2007-04-05T08:46:40, Alan Robertson <[EMAIL PROTECTED]> wrote: > >>> My only comment on this is that if having two copies of your resource >>> agent running at once causes serious problems, you need to _strongly_ >>> consider re-writing you agent to have sufficient l

Re: [Linux-HA] Annouce: IPAddr2 RA v1.30 alpha

2007-04-10 Thread Lars Marowsky-Bree
On 2007-04-10T11:56:59, Michael Schwartzkopff <[EMAIL PROTECTED]> wrote: > > > At the moment I would like to have that parameter since interop between > > > LVS and CLUSTERIP is not tested at all. After these tests we can drop it. > > One can't simply drop a parameter once introduced. > Why not? M

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-10 Thread Lars Marowsky-Bree
On 2007-04-10T12:02:30, Peter Kruse <[EMAIL PROTECTED]> wrote: > >But when you return the proper status - running, failed, not running -, > >heartbeat should do the "right thing" automatically when it finds the > >resource active prior to heartbeat being (re-)started? > The point is, that we misus

[Linux-HA] HA problems

2007-04-10 Thread Angelo Venera
Hi all, I'm new to this list and to HA. I'm trying to build an HA active/passive setup for these services: amavisd clamd.amavisd dhcpd dovecot httpd mysqld named postfix smb spamassassin squid On start, heartbeat runs these services and becomes primary. But when I try the command nmap on my IP

[Linux-HA] HA status (active / passive)

2007-04-10 Thread Mark Frasa
Hello, I am playing with HA for some production servers, and I would like to know the easiest way to tell whether the local server is active or passive. I guess an ifconfig check *could* do, but is there a builtin, or something else? Thanks a lot, /Mark

Re: [Linux-HA] Error in compiling LinxuHA.

2007-04-10 Thread Athrun Zara
Dear Alan Robertson, Thank you very much for the fast reply, and sorry for my late one. After following your suggestion of adding --disable-fatal-warnings, the source compiles. FYI, my configure command is: **./env \ CFLAGS="-I/opt/include -Wl,--rpath=/opt/lib" \ LDFLAGS="-L/opt/lib"

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-10 Thread Peter Kruse
Hello Lars, Lars Marowsky-Bree wrote: I'm missing the point here. But when you return the proper status - running, failed, not running -, heartbeat should do the "right thing" automatically when it finds the resource active prior to heartbeat being (re-)started? The point is, that we misuse h

Re: [Linux-HA] Annouce: IPAddr2 RA v1.30 alpha

2007-04-10 Thread Michael Schwartzkopff
On Tuesday, 10 April 2007 11:20, Lars Marowsky-Bree wrote: > > At the moment I would like to have that parameter since interop between > > LVS and CLUSTERIP is not tested at all. After these tests we can drop it. > > One can't simply drop a parameter once introduced. Why not? My script is still

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-10 Thread Lars Marowsky-Bree
On 2007-04-10T11:08:56, Bernd Schubert <[EMAIL PROTECTED]> wrote: > > Ugh. Even probe shouldn't always return "not running", but the actual > > state. This seems like a weird work-around for an otherwise broken > > monitor action, or am I missing something ...? > > Well, once OCF_RESKEY_interval

Re: [Linux-HA] Annouce: IPAddr2 RA v1.30 alpha

2007-04-10 Thread Lars Marowsky-Bree
On 2007-04-10T11:12:11, Michael Schwartzkopff <[EMAIL PROTECTED]> wrote: > > The old script tried to auto-detect that it was run as a clone and then > > automatically enabled this, which I think is still preferable. If the > > CRM_meta_clone{,_max} show up in the environment, it should switch into

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-10 Thread Bernd Schubert
On Tuesday 10 April 2007 10:15:03 Lars Marowsky-Bree wrote: > On 2007-04-05T17:46:54, Bernd Schubert <[EMAIL PROTECTED]> wrote: > > Ok, so we need to correct the doku again. > > > > > > Here we add a second monitor action, one that runs once per minute. The > > interval is passed to the ResourceAg

Re: [Linux-HA] Annouce: IPAddr2 RA v1.30 alpha

2007-04-10 Thread Michael Schwartzkopff
On Tuesday, 10 April 2007 10:59, Lars Marowsky-Bree wrote: > > @@ -143,6 +154,15 @@ > > > > > > > > + > > + > > +Enable load sharing via clusterip target of iptables. Be sure to have > > +iptables with clusterip target compiled in. > > + > > +Enable load sharing > > + > > + > > + > > The old

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-10 Thread Bernd Schubert
On Tuesday 10 April 2007 10:15:03 Lars Marowsky-Bree wrote: > However, it still gets passed in - just as OCF_RESKEY_CRM_meta_interval, > to show the distinction to an instance parameter. > > > # on probe (== exclusive) always report process not running > > ql_log warn "OCF_RESKEY_interval
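A hedged sketch of the probe check discussed in this thread: a resource agent can tell a startup probe from a recurring monitor by looking at the interval passed in as `OCF_RESKEY_CRM_meta_interval`, where a value of 0 marks the probe. The function name is hypothetical, and treating a missing or non-numeric variable as "not a probe" is an assumption, not something the thread confirms.

```python
import os


def is_probe(action, env=None):
    """Return True when a monitor call looks like a one-off probe.

    Assumption: a probe is a monitor action whose
    OCF_RESKEY_CRM_meta_interval is 0 (the interval is in milliseconds).
    A missing or non-numeric value is treated as "not a probe".
    """
    env = os.environ if env is None else env
    if action != "monitor":
        return False
    raw = env.get("OCF_RESKEY_CRM_meta_interval")
    if raw is None:
        return False
    try:
        return int(raw) == 0
    except ValueError:
        return False
```

As Lars points out in the thread, a probe should still report the actual state of the resource; a check like this is only useful for choosing a cheaper or more cautious test, not for always answering "not running".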

Re: [Linux-HA] Annouce: IPAddr2 RA v1.30 alpha

2007-04-10 Thread Lars Marowsky-Bree
On 2007-04-09T20:03:03, Michael Schwartzkopff <[EMAIL PROTECTED]> wrote: > Kernel crash: See https://bugzilla.novell.com/show_bug.cgi?id=238646 > Oops: I just noticed that you are responsible for that bug since Jan > 25th 2007. Ah, that crash. No, it's not assigned to me, but I commented on it si

Re: [Linux-HA] OCF_RESKEY_interval

2007-04-10 Thread Lars Marowsky-Bree
On 2007-04-05T17:46:54, Bernd Schubert <[EMAIL PROTECTED]> wrote: > Ok, so we need to correct the doku again. > > > Here we add a second monitor action, one that runs once per minute. The > interval is passed to the ResourceAgent as OCF_RESKEY_interval and is a > period in milliseconds. In the

Re: [Linux-HA] Heartbeat stop hangs

2007-04-10 Thread Andrew Beekhof
On 4/9/07, Kevin Jamieson <[EMAIL PROTECTED]> wrote: kisalay wrote: > I have a 2 node 2.0.8 Linux HA setup. > I have observed that when stop is issued on my setup, as soon as the start > returns, the stop hangs indefinitely, and the only way to stop heartbeat is > to do killall. or wait for th

Re: [Linux-HA] 2.0.7 Failover Behavior Question

2007-04-10 Thread Andrew Beekhof
On 3/29/07, Mohler, Eric (EMOHLER) <[EMAIL PROTECTED]> wrote: Andrew, Thanks for your reply. Please refer to <--'s below. The resulting behavior is that the app only restarts on the same node, never ping-pong. ** i assume "ON" and

Re: [Linux-HA] crm_verfify cib.xml verification error

2007-04-10 Thread Andrew Beekhof
Pretty sure I commented on this recently. I'll patch it today. On Apr 6, 2007, at 2:40 PM, Alan Robertson wrote: kisalay wrote: Hi, I recently migrated from 2.0.7 to 2.0.8. When I run my old (2.0.7) cib.xml through crm_verify now, I receive the following warnings/errors: element cib: validity er