Re: [Veritas-ha] Question about HA and disks

Jon E Price/SYS/NYTIMES Mon, 27 Oct 2008 18:21:36 -0700

Hi,

A few questions:


Andrey: Could you post the logs (or even portions of them) which show what
ServerA was doing during the takeover?

Joshua: Are you saying that I/O fencing can prevent split-brain situations
in which one server is still writing to a filesystem while a second server
has taken over that same service group and begun writing to the same
filesystem, possibly causing corruption?

http://sfdoccentral.symantec.com/sf/5.0/linux/html/vcs_install/ch_vcs_install_iofence.html#190559
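
To make the question concrete, here is a toy model of the idea as I
understand it. It is only a sketch, not VCS code: SharedDisk, register,
preempt and simulate_takeover are hypothetical names invented for this
illustration. The assumption is that the disk accepts writes only from
nodes that hold a registration, and that the node taking over removes the
failed node's registration before it starts writing (real fencing relies
on SCSI-3 persistent reservations to enforce this in the storage itself).

```python
class SharedDisk:
    """Toy shared disk that only accepts writes from registered nodes."""

    def __init__(self):
        self.registered = set()   # nodes currently allowed to write
        self.blocks = []          # (node, data) pairs that reached the disk

    def register(self, node):
        self.registered.add(node)

    def preempt(self, winner, loser):
        # Only a node that is itself registered may eject another node.
        if winner not in self.registered:
            raise PermissionError(f"{winner} is not registered")
        self.registered.discard(loser)

    def write(self, node, data):
        # Writes from nodes without a registration are rejected.
        if node not in self.registered:
            return False
        self.blocks.append((node, data))
        return True


def simulate_takeover(fencing):
    """Replay the scenario: serverA is still flushing while serverB takes over."""
    disk = SharedDisk()
    disk.register("serverA")
    disk.register("serverB")
    disk.write("serverA", "a1")              # normal operation
    if fencing:
        disk.preempt("serverB", "serverA")   # serverB fences serverA first
    disk.write("serverB", "b1")              # serverB starts the service group
    late_write_accepted = disk.write("serverA", "a2")  # crashing node writes late
    return late_write_accepted, disk.blocks


if __name__ == "__main__":
    print(simulate_takeover(fencing=False))
    print(simulate_takeover(fencing=True))
```

Without fencing, the late write from serverA lands after serverB's write,
which is the interleaving that could corrupt a filesystem. With fencing,
the late write is rejected.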

Jim: What's the evidence that the server panicked?
        And is 16 seconds the default timeout for heartbeat failure?


Jon


From: "Jim Senicka" <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
Date: 10/27/2008 07:19 PM
To: "Andrey Dmitriev" <[EMAIL PROTECTED]>, <veritas-ha@mailman.eng.auburn.edu>
Subject: Re: [Veritas-ha] Question about HA and disks

When a server panics, it stops writing to anything but the dump device.
VCS did exactly as designed: 16 seconds after heartbeat failure, it
started the takeover. Whatever was damaged on your file system was already
damaged at that point, regardless of how long it took to dump core to the
dump device. I would look at the cause of the panic; it likely had
something to do with whatever corrupted your FS.


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Andrey
Dmitriev
Sent: Monday, October 27, 2008 2:01 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Question about HA and disks

We had an issue where serverA failed and serverB took over.
However, serverB took over while serverA was still 'crashing' (it took a
good 10-15 minutes to crash) and apparently still had a hold of the file
systems (system logs confirm that the takeover occurred while serverA was
still 'puking').
The file systems on serverB came up corrupt, and we lost some data because
of that.
HA is set up via heartbeats. The file system is vxfs, and the OS is RedHat 4.0.
Is there any way to avoid that?

Thanks,
Andrey

_______________________________________________
Veritas-ha maillist  -  Veritas-ha@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
