The config file I sent you may be the wrong one. Since all nodes are virtual machines, they may have been redeployed before I got the config file. But I'm sure I checked that the config files were consistent, and I downloaded the logs before the redeployment.
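A quicker way to repeat that consistency check, and to catch a discrepancy like the 200.201.162.54 vs. 200.201.162.55 one Honza points out, is to compare only the node lists instead of whole files. The following is a minimal Python sketch, assuming each node's corosync.conf is available as text and that the nodelist uses one `ring0_addr: <address>` entry per line, as corosync 2.x configs typically do. The helper names (`ring0_addrs`, `nodelist_mismatches`, `make_conf`) and the node labels are hypothetical, and this is not a full corosync.conf parser.

```python
import re

# Assumption: one "ring0_addr: <address>" entry per line inside nodelist { node { ... } }.
# The totem interface section sets its address with bindnetaddr, not ring0_addr,
# so a plain line match is enough for this sketch.
ADDR_RE = re.compile(r"^\s*ring0_addr\s*:\s*(\S+)", re.MULTILINE)


def ring0_addrs(conf_text):
    """Return the set of ring0_addr values in a corosync.conf text, ignoring '#' comment lines."""
    lines = [line for line in conf_text.splitlines() if not line.lstrip().startswith("#")]
    return set(ADDR_RE.findall("\n".join(lines)))


def nodelist_mismatches(confs):
    """Compare every node's nodelist with the first node's.

    confs maps a (hypothetical) node label to that node's corosync.conf text.
    Returns {label: (missing, extra)} only for nodes whose address set differs.
    """
    labels = list(confs)
    reference = ring0_addrs(confs[labels[0]])
    result = {}
    for label in labels[1:]:
        addrs = ring0_addrs(confs[label])
        if addrs != reference:
            result[label] = (reference - addrs, addrs - reference)
    return result


def make_conf(*addrs):
    """Build a tiny hypothetical nodelist section for the example below."""
    nodes = "".join(f"    node {{\n        ring0_addr: {a}\n    }}\n" for a in addrs)
    return f"nodelist {{\n{nodes}}}\n"


# Hypothetical example: node-b lists .55 where node-a lists .54.
confs = {
    "node-a": make_conf("200.201.162.52", "200.201.162.53", "200.201.162.54"),
    "node-b": make_conf("200.201.162.52", "200.201.162.53", "200.201.162.55"),
}
print(nodelist_mismatches(confs))
# {'node-b': ({'200.201.162.54'}, {'200.201.162.55'})}
```

Comparing address sets rather than raw files deliberately ignores settings such as bindnetaddr, which may legitimately differ from node to node.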
>> For now I guess the reason can be one of:
>> - ifdown on one of the other nodes, which broke the whole membership

I checked my colleague's operations and found that this may be right.

Unfortunately, our production was cancelled last week, and all environments were destroyed. I no longer have any resources to help you diagnose the problem, and I have to stop working on pacemaker and corosync.

I'm really sorry I could not help resolve the corosync segfault.

Thanks very much for all the help from you, the maintainers, and the community.

At 2017-03-17 22:30:01, "Jan Friesse" <jfrie...@redhat.com> wrote:
>> I have checked that all the config files are the same, except for bindnetaddr.
>> So I'm sending only the logs.
>
>I'm not sure the config files match the log files, because the config file
>contains nodes 200.201.162.(52|53|54), but the log files contain IPs
>200.201.162.(52|53|55).
>
>Can you confirm that the node with IP 200.201.162.54 exists and shouldn't be
>200.201.162.55 (or that 200.201.162.55 shouldn't have IP 200.201.162.54)?
>
>Honza
>
>>
>> On 2017-03-16 15:54, "Jan Friesse" <jfrie...@redhat.com> wrote:
>>
>>> corosync.conf and the debug logs are attached.
>>
>> Thanks for them. They look really interesting. As can be seen in
>>
>> Mar 14 11:37:28 [57827] node-132.acloud.vt corosync debug [TOTEM ] timer_function_orf_token_timeout The token was lost in the OPERATIONAL state.
>>
>> corosync correctly detected the token loss. Also
>>
>> Mar 14 11:44:41 [57827] node-132.acloud.vt corosync debug [TOTEM ] memb_state_gather_enter entering GATHER state from 11(merge during join).
>>
>> says it correctly detected the merge. But from then on it becomes weird.
>> Mar 14 11:44:54 [57827] node-132.acloud.vt corosync debug [TOTEM ] memb_state_gather_enter entering GATHER state from 0(consensus timeout).
>> Mar 14 11:45:06 [57827] node-132.acloud.vt corosync debug [TOTEM ] memb_state_gather_enter entering GATHER state from 0(consensus timeout).
>> ...
>> Mar 14 12:55:47 [154709] node-132.acloud.vt corosync debug [TOTEM ] memb_state_gather_enter entering GATHER state from 0(consensus timeout)
>>
>> So even after the two other nodes merged, something still
>> prevents corosync from reaching consensus.
>>
>> Would it be possible to also attach the other nodes' logs/configs?
>>
>> For now I guess the reason can be one of:
>> - ifdown on one of the other nodes, which broke the whole membership
>> - a different node list in the config on different nodes
>> - a "forgotten" node with a node list containing one of the 200.201.162.x nodes
>>
>> Regards,
>> Honza
>>>
>>> And two messages from the kernel:
>>>
>>> 2017-03-14 11:37:20.097233 - info e1000: eth0 NIC Link is Down
>>>
>>> 2017-03-14 11:44:41.032121 - info e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
>>>
>>> Thanks.
>>>
>>> On 2017/3/15 16:29, Jan Friesse wrote:
>>>>> Yesterday I found corosync took almost one hour to form a cluster (a
>>>>> failed node came back online).
>>>>
>>>> This for sure shouldn't happen (at least with default timeout settings).
>>>>
>>>>> So I captured some corosync packets and opened the pcap file in
>>>>> Wireshark.
>>>>>
>>>>> But Wireshark only displayed raw UDP, no Totem.
>>>>>
>>>>> Wireshark version is 2.2.5. I'm sure it supports corosync Totem.
>>>>>
>>>>> corosync is 2.4.0.
>>>>
>>>> Wireshark has a corosync dissector, but only for version 1.x; 2.x is not
>>>> supported yet.
>>>>
>>>>> And if corosync takes too long to form a cluster, how do I diagnose it?
>>>>>
>>>>> I read the logs, but could not figure it out.
>>>>
>>>> Logs, especially when debug is enabled, usually have enough info. Can you
>>>> paste your config + logs?
>>>>
>>>> Regards,
>>>> Honza
>>>>
>>>>> Thanks.
>>>>>
>>>>> _______________________________________________
>>>>> Users mailing list: Users@clusterlabs.org
>>>>> http://lists.clusterlabs.org/mailman/listinfo/users
>>>>>
>>>>> Project Home: http://www.clusterlabs.org
>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> Bugs: http://bugs.clusterlabs.org
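Honza's reading of the debug log comes down to tallying how often memb_state_gather_enter re-enters the GATHER state, why, and over what time span. A minimal Python sketch of that tally, assuming corosync 2.x syslog-style debug lines shaped like the ones quoted in this thread (one entry per physical line, English month abbreviations, no year in the timestamp, so the year is passed in). The function name `summarize_gather` and the regular expression are illustrative, not part of corosync.

```python
import re
from collections import Counter
from datetime import datetime

# Matches corosync 2.x debug lines such as
#   Mar 14 11:44:54 [57827] node-132.acloud.vt corosync debug [TOTEM ] memb_state_gather_enter
#   entering GATHER state from 0(consensus timeout).
# (in the real log file each entry sits on a single line).
GATHER_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*"
    r"entering GATHER state from (?P<code>\d+)\((?P<reason>[^)]*)\)"
)


def summarize_gather(lines, year=2017):
    """Count GATHER-state entries per reason and the span between the first and last one.

    Syslog-style timestamps carry no year, so `year` is an assumption supplied by the caller.
    Returns (Counter of reasons, timedelta, or None if no GATHER lines were found).
    """
    reasons = Counter()
    first = last = None
    for line in lines:
        match = GATHER_RE.search(line)
        if match is None:
            continue
        # %b assumes an English/C locale for month names.
        ts = datetime.strptime(f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S")
        if first is None:
            first = ts
        last = ts
        reasons[match.group("reason")] += 1
    span = last - first if first is not None else None
    return reasons, span


# Sample lines reassembled from the excerpts quoted in this thread.
sample = [
    "Mar 14 11:44:41 [57827] node-132.acloud.vt corosync debug [TOTEM ] "
    "memb_state_gather_enter entering GATHER state from 11(merge during join).",
    "Mar 14 11:44:54 [57827] node-132.acloud.vt corosync debug [TOTEM ] "
    "memb_state_gather_enter entering GATHER state from 0(consensus timeout).",
    "Mar 14 11:45:06 [57827] node-132.acloud.vt corosync debug [TOTEM ] "
    "memb_state_gather_enter entering GATHER state from 0(consensus timeout).",
]
reasons, span = summarize_gather(sample)
print(reasons)  # Counter({'consensus timeout': 2, 'merge during join': 1})
print(span)     # 0:00:25
```

Run over a full debug log, a long run of "consensus timeout" entries spanning close to an hour would put a number on the pattern Honza describes; on its own it does not say which of his suspected causes applies.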