Re: [Linux-HA] crm crash on centos 4.5

2008-02-14 Thread Andrew Beekhof
On Wed, Feb 13, 2008 at 8:25 PM, Tao Yu [EMAIL PROTECTED] wrote: Sort of. Until 2.1.3 it was the default behavior. Since 2.1.3, you can also get this behavior by specifying "crm respawn" instead of "crm yes". My understanding of the fix for this problem is that the crmd is not likely to

Re: [Linux-HA] crm crash on centos 4.5

2008-02-13 Thread Tao Yu
Sort of. Until 2.1.3 it was the default behavior. Since 2.1.3, you can also get this behavior by specifying "crm respawn" instead of "crm yes". My understanding of the fix for this problem is that the crmd is not likely to exit in such a scenario. Am I right? Since 2.1.3, "crm yes" results in the whole

Re: [Linux-HA] crm crash on centos 4.5

2008-02-12 Thread Andrew Beekhof
On Feb 8, 2008, at 7:36 PM, Tao Yu wrote: One question remains: it is unlikely that the connection between the two nodes is lost in our environment. Are there any other scenarios that could cause this? the machine could be really _really_ overloaded, kernel bugs, a careless night cleaner :-) anything

Re: [Linux-HA] crm crash on centos 4.5

2008-02-08 Thread Tao Yu
Thanks! Several more questions: 1. My understanding is that when the crmd gets into this mess, it will exit, the master heartbeat control process will start a new instance of crmd, and the cluster will continue to work. There will be no effect on the cluster. Is that correct? 2. I tried to create

Re: [Linux-HA] crm crash on centos 4.5

2008-02-08 Thread Tao Yu
2. I tried to create a split-brain on a 2-node cluster and then recovered from it. I did see the crmd get killed and started again, and I did see the core file generated. But this time, I only saw the backtrace of: Core was generated by `/usr/lib64/heartbeat/crmd'. Program terminated

Re: [Linux-HA] crm crash on centos 4.5

2008-02-07 Thread Dejan Muhamedagic
On Thu, Feb 07, 2008 at 10:07:58AM +0100, Andrew Beekhof wrote: On Feb 7, 2008, at 12:57 AM, Lars Marowsky-Bree wrote: On 2008-02-06T15:11:28, Tao Yu [EMAIL PROTECTED] wrote: Running the heartbeat 2.1.2 Core: #0 0x003b9602e21d in raise () from /lib64/tls/libc.so.6 #1

Re: [Linux-HA] crm crash on centos 4.5

2008-02-07 Thread Tao Yu
It would be extremely helpful to know the bug number related to this problem. Thanks! On Feb 7, 2008 7:05 AM, Dejan Muhamedagic [EMAIL PROTECTED] wrote: On Thu, Feb 07, 2008 at 10:07:58AM +0100, Andrew Beekhof wrote: On Feb 7, 2008, at 12:57 AM, Lars Marowsky-Bree wrote: On

Re: [Linux-HA] crm crash on centos 4.5

2008-02-07 Thread Tao Yu
Thanks for all the information. I believe we do have the debuginfo package installed. I will try to reproduce this and get the ha-debug file. Could someone point me to the code section for this problem? I want to dig a little deeper into the heartbeat implementation. :) Thanks again!

Re: [Linux-HA] crm crash on centos 4.5

2008-02-07 Thread Andrew Beekhof
bug #1546 On Feb 7, 2008, at 10:08 PM, Tao Yu wrote: It will be extremely helpful if we could know the bug number related to this problem. Thanks! On Feb 7, 2008 7:05 AM, Dejan Muhamedagic [EMAIL PROTECTED] wrote: On Thu, Feb 07, 2008 at 10:07:58AM +0100, Andrew Beekhof wrote: On Feb

[Linux-HA] crm crash on centos 4.5

2008-02-06 Thread Tao Yu
Running heartbeat 2.1.2. Core:
#0 0x003b9602e21d in raise () from /lib64/tls/libc.so.6
#1 0x003b9602fa1e in abort () from /lib64/tls/libc.so.6
#2 0x003efe80bc90 in crm_abort () from /usr/lib64/libcrmcommon.so.1
#3 0x003efee01adb in ?? () from /usr/lib64/libccmclient.so.1

Re: [Linux-HA] crm crash on centos 4.5

2008-02-06 Thread Lars Marowsky-Bree
On 2008-02-06T15:11:28, Tao Yu [EMAIL PROTECTED] wrote: Running the heartbeat 2.1.2 Core: #0 0x003b9602e21d in raise () from /lib64/tls/libc.so.6 #1 0x003b9602fa1e in abort () from /lib64/tls/libc.so.6 #2 0x003efe80bc90 in crm_abort () from /usr/lib64/libcrmcommon.so.1 #3