On Monday, 26 April 2010, at 08:35:53, Andrew Beekhof wrote:
> What versions of pacemaker and the dlm?
> What does the stack trace from the core look like?

I reinstalled the packages from http://ppa.launchpad.net/ubuntu-ha/lucid-cluster
That's 3.0.7 for the dlm and 1.0.8+hg15494 for pacemaker.

report: http://users.fbihome.de/~oheinz/ha-cluster/report_1.tar.bz2
core file: http://users.fbihome.de/~oheinz/ha-cluster/core.2606.bz2

I cc'ed the ubuntu-ha list as it might be packaging related.

TIA,
Oliver

>
> On Sun, Apr 25, 2010 at 1:15 PM, Oliver Heinz <[email protected]> wrote:
> > On Saturday, 24 April 2010, at 17:27:42, Pål Simensen wrote:
> >> Can you check your dmesg to see if DLM is segfaulting? I might be
> >> experiencing the same problem. If corosync is started at boot DLM
> >> segfaults, but if it's started manually everything is ok. Still trying
> >> to find out more about what is going on, and I sadly can't provide more
> >> information before Monday when I get to work. We did even try bootchart
> >> to see if that could provide some more information, but sadly no. We
> >> also changed the start order to corosync by renaming the init symlink
> >> to S98corosync, but that didn't work out either.
> >
> > You are right, dlm is segfaulting, and the network is already up at that time.
> >
> > [ 15.654093] br53: port 1(vlan53) entering forwarding state
> > [ 15.664083] br83: port 1(vlan83) entering forwarding state
> > ...
> > [ 46.979087] dlm_controld.pc[2533]: segfault at 0 ip 00007f30f7d68022 sp 00007fffddf0e288 error 4 in libc-2.11.1.so[7f30f7ce5000+178000]
> >
> > I rebuilt the packages from
> > http://ppa.launchpad.net/ubuntu-ha/lucid-cluster/ubuntu/pool/main/r/redhat-cluster
> > on a freshly installed lucid VM, but this didn't change anything. I even
> > upgraded them to the current 3.0.11; still segfaulting. So trial and error
> > doesn't seem to work. Maybe someone with a little more understanding of
> > what's going on can make an educated guess?
> >
> > TIA,
> > Oliver
> >
> >> On Sat, Apr 24, 2010 at 12:25 PM, Oliver Heinz <[email protected]> wrote:
> >> > Hi,
> >> >
> >> > when rebooting my cluster nodes they won't bring up the ocfs2-fs
> >> > because of resDLM failing. When I issue a '/etc/init.d/pacemaker
> >> > restart' afterwards everything is fine.
> >> >
> >> > The machine needs quite a while to bring up the (bonding) network
> >> > interfaces.
> >> > Do timeout values need to be adjusted? Or should I rather try to
> >> > startup pacemaker after the network is completely up?
> >> >
> >> > my current config:
> >> >
> >> > node server-c \
> >> >         attributes standby="off"
> >> > node server-d
> >> > primitive failover-ip ocf:heartbeat:IPaddr \
> >> >         params ip="192.168.5.150" \
> >> >         op monitor interval="10s"
> >> > primitive resDLM ocf:pacemaker:controld \
> >> >         op monitor interval="120s"
> >> > primitive resFS ocf:heartbeat:Filesystem \
> >> >         params device="/dev/mapper/data-data" directory="/srv/data" fstype="ocfs2" \
> >> >         op monitor interval="120s"
> >> > primitive resO2CB ocf:pacemaker:o2cb \
> >> >         op monitor interval="120s"
> >> > clone cloneDLM resDLM \
> >> >         meta globally-unique="false" interleave="true"
> >> > clone cloneFS resFS \
> >> >         meta interleave="true" ordered="true"
> >> > clone cloneO2CB resO2CB \
> >> >         meta globally-unique="false" interleave="true"
> >> > colocation colFSO2CB inf: cloneFS cloneO2CB
> >> > colocation colO2CBDLM inf: cloneO2CB cloneDLM
> >> > order ordDLMO2CB 0: cloneDLM cloneO2CB
> >> > order ordO2CBFS 0: cloneO2CB cloneFS
> >> > property $id="cib-bootstrap-options" \
> >> >         dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
> >> >         cluster-infrastructure="openais" \
> >> >         expected-quorum-votes="2" \
> >> >         stonith-enabled="false" \
> >> >         last-lrm-refresh="1272026744"
> >> >
> >> > I tried something like
> >> >
> >> > primitive resDLM ocf:pacemaker:controld \
> >> >         op start timeout="100s" \
> >> >         op monitor interval="120s"
> >> >
> >> > but this didn't help.
> >> >
> >> > TIA,
> >> > Oliver
> >> >
> >> > _______________________________________________
> >> > Pacemaker mailing list: [email protected]
> >> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >> >
> >> > Project Home: http://www.clusterlabs.org
> >> > Getting started:
> >> > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

_______________________________________________
Mailing list: https://launchpad.net/~ubuntu-ha
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~ubuntu-ha
More help   : https://help.launchpad.net/ListHelp
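
A note on the dmesg segfault line quoted in this thread: it already carries a bit more information than "dlm_controld.pc crashed". The following is only a minimal sketch, assuming the usual x86-64 kernel log layout ("segfault at ADDR ip IP sp SP error CODE in OBJECT[BASE+SIZE]") and the standard x86 page-fault error bits (bit 0 = page present, bit 1 = write, bit 2 = user mode). The function name decode_segfault and the dictionary keys are made up for illustration. It subtracts the mapping base from the instruction pointer to get the faulting offset inside libc.

```python
import re

# Segfault line as quoted in the thread, joined onto one line.
LINE = ("[ 46.979087] dlm_controld.pc[2533]: segfault at 0 ip 00007f30f7d68022 "
        "sp 00007fffddf0e288 error 4 in libc-2.11.1.so[7f30f7ce5000+178000]")

# Assumed log layout; all numeric fields are printed in hex by the kernel.
PATTERN = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Hypothetical helper: split a kernel segfault line into its fields."""
    m = PATTERN.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    err = int(m["err"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),        # 0 means a NULL dereference
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
        "page_present": bool(err & 1),
        "object": m["obj"],
        # Offset of the faulting instruction inside the mapped object.
        "offset": ip - int(m["base"], 16) if m["base"] else None,
    }


info = decode_segfault(LINE)
print(info["comm"], info["access"], "at", hex(info["fault_addr"]),
      "in", info["object"], "+", hex(info["offset"]))
```

For the line above this gives a user-mode read of address 0 (a NULL pointer dereference) at offset 0x83022 inside libc-2.11.1.so. With libc debug symbols installed, that offset can be resolved to a function name, which may help narrow things down alongside a backtrace from the core file.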

