How can I follow the first two solutions?

Regards,

2018-08-24 8:21 GMT+02:00 Jan Friesse <jfrie...@redhat.com>:
>> I tried to install corosync 3.x and it works pretty well.
>
> Cool
>
>> But when I install pacemaker, it installs the previous version of
>> corosync as a dependency and breaks the whole setup. Any suggestions?
>
> I can see at least the following "solutions":
> - make a proper Debian package
> - install corosync 3 to /usr/local
> - (ugly) install the packaged corosync and reinstall with corosync 3
>   built from source
>
> Regards,
> Honza
>
>> 2018-08-23 9:32 GMT+02:00 Jan Friesse <jfrie...@redhat.com>:
>>
>>> David,
>>>
>>>> BTW, where can I download Corosync 3.x? I've only seen Corosync
>>>> 2.99.3 Alpha4 at http://corosync.github.io/corosync/
>>>
>>> Yes, that's Alpha 4 of Corosync 3.
>>>
>>>> 2018-08-23 9:11 GMT+02:00 David Tolosa <david.tol...@upcnet.es>:
>>>>
>>>>> I'm currently using an Ubuntu 18.04 server configuration with
>>>>> netplan.
>>>>>
>>>>> Here you have my current YAML configuration:
>>>>>
>>>>> # This file describes the network interfaces available on your system
>>>>> # For more information, see netplan(5).
>>>>> network:
>>>>>   version: 2
>>>>>   renderer: networkd
>>>>>   ethernets:
>>>>>     eno1:
>>>>>       addresses: [192.168.0.1/24]
>>>>>     enp4s0f0:
>>>>>       addresses: [192.168.1.1/24]
>>>>>     enp5s0f0: {}
>>>>>   vlans:
>>>>>     vlan.XXX:
>>>>>       id: XXX
>>>>>       link: enp5s0f0
>>>>>       addresses: [ 10.1.128.5/29 ]
>>>>>       gateway4: 10.1.128.1
>>>>>       nameservers:
>>>>>         addresses: [ 8.8.8.8, 8.8.4.4 ]
>>>>>         search: [ foo.com, bar.com ]
>>>>>     vlan.YYY:
>>>>>       id: YYY
>>>>>       link: enp5s0f0
>>>>>       addresses: [ 10.1.128.5/29 ]
>>>>>
>>>>> So, eno1 and enp4s0f0 are the two ethernet ports connected to each
>>>>> other with crossover cables to node2. The enp5s0f0 port is used to
>>>>> connect to the outside/services using the vlans defined in the same
>>>>> file.
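[Editorial sketch, not from the thread: when changing a netplan file on a cluster node, `netplan try` (available on Ubuntu 18.04) applies the configuration and automatically reverts it if the change is not confirmed, which is safer than a blind apply over a remote connection.]

```shell
# Test the new configuration; it auto-reverts after the timeout
# unless confirmed, so a bad change won't cut you off permanently.
sudo netplan try --timeout 60
# Once verified, make it permanent:
sudo netplan apply
```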
>>>>> In short, I'm using the default systemd-networkd service of Ubuntu
>>>>> 18.04 server to manage the networks.
>>>
>>> Ok, so systemd-networkd is really doing ifdown, and somebody actually
>>> tried to fix that and merge it into upstream (sadly, without much
>>> luck :( ):
>>>
>>> https://github.com/systemd/systemd/pull/7403
>>>
>>>>> I'm not detecting any NetworkManager-config-server package in my
>>>>> repository either.
>>>
>>> I'm not sure what it's called in Debian-based distributions, but it's
>>> just one small file in /etc, so you can extract it from the RPM.
>>>
>>>>> So the only solution I have left, I suppose, is to test corosync
>>>>> 3.x and see if it handles RRP better.
>>>
>>> You may also reconsider trying either a completely static network
>>> configuration, or NetworkManager + NetworkManager-config-server.
>>>
>>> Corosync 3.x with knet will work for sure, but be prepared for quite
>>> a long compile path, because you first have to compile knet and then
>>> corosync. What may help you a bit is that we have Ubuntu 18.04 in our
>>> jenkins, so it should be possible; see the corosync build log
>>> (https://ci.kronosnet.org/view/corosync/job/corosync-build-all-voting/lastBuild/corosync-build-all-voting=ubuntu-18-04-lts-x86-64/consoleText)
>>> and the knet build log
>>> (https://ci.kronosnet.org/view/knet/job/knet-build-all-voting/lastBuild/knet-build-all-voting=ubuntu-18-04-lts-x86-64/consoleText).
>>>
>>> Also, please consult http://people.redhat.com/ccaulfie/docs/KnetCorosync.pdf
>>> about the changes in corosync configuration.
>>>
>>> Regards,
>>> Honza
>>>
>>>>> Thank you for your quick response!
>>>>>
>>>>> 2018-08-23 8:40 GMT+02:00 Jan Friesse <jfrie...@redhat.com>:
>>>>>
>>>>>> David,
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I'm getting crazy about this problem, which I expect to resolve
>>>>>>> here with your help, guys:
>>>>>>>
>>>>>>> I have 2 nodes with the Corosync redundant ring feature.
>>>>>>> Each node has 2 similarly connected/configured NICs. Both nodes
>>>>>>> are connected to each other by two crossover cables.
>>>>>>
>>>>>> I believe this is the root of the problem. Are you using
>>>>>> NetworkManager? If so, have you installed
>>>>>> NetworkManager-config-server? If not, please install it and test
>>>>>> again.
>>>>>>
>>>>>>> I configured both nodes with rrp mode passive. Everything is
>>>>>>> working well at this point, but when I shut down 1 node to test
>>>>>>> failover and this node comes back online, corosync marks the
>>>>>>> interface as FAULTY and rrp fails to recover the initial state:
>>>>>>
>>>>>> I believe it's because, with the crossover-cable configuration,
>>>>>> when the other side is shut down, NetworkManager detects it and
>>>>>> does ifdown of the interface. And corosync is unable to handle
>>>>>> ifdown properly. Ifdown is bad with a single ring, but it's just a
>>>>>> killer with RRP (127.0.0.1 poisons every node in the cluster).
>>>>>>
>>>>>>> 1. Initial scenario:
>>>>>>>
>>>>>>> # corosync-cfgtool -s
>>>>>>> Printing ring status.
>>>>>>> Local node ID 1
>>>>>>> RING ID 0
>>>>>>>         id      = 192.168.0.1
>>>>>>>         status  = ring 0 active with no faults
>>>>>>> RING ID 1
>>>>>>>         id      = 192.168.1.1
>>>>>>>         status  = ring 1 active with no faults
>>>>>>>
>>>>>>> 2. When I shut down node 2, everything continues with no faults.
>>>>>>> Sometimes the ring IDs are bound to 127.0.0.1 and then bind back
>>>>>>> to their respective heartbeat IPs.
>>>>>>
>>>>>> Again, a result of ifdown.
>>>>>>
>>>>>>> 3. When node 2 is back online:
>>>>>>>
>>>>>>> # corosync-cfgtool -s
>>>>>>> Printing ring status.
>>>>>>> Local node ID 1
>>>>>>> RING ID 0
>>>>>>>         id      = 192.168.0.1
>>>>>>>         status  = ring 0 active with no faults
>>>>>>> RING ID 1
>>>>>>>         id      = 192.168.1.1
>>>>>>>         status  = Marking ringid 1 interface 192.168.1.1 FAULTY
>>>>>>>
>>>>>>> # service corosync status
>>>>>>> ● corosync.service - Corosync Cluster Engine
>>>>>>>    Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
>>>>>>>    Active: active (running) since Wed 2018-08-22 14:44:09 CEST; 1min 38s ago
>>>>>>>      Docs: man:corosync
>>>>>>>            man:corosync.conf
>>>>>>>            man:corosync_overview
>>>>>>>  Main PID: 1439 (corosync)
>>>>>>>     Tasks: 2 (limit: 4915)
>>>>>>>    CGroup: /system.slice/corosync.service
>>>>>>>            └─1439 /usr/sbin/corosync -f
>>>>>>>
>>>>>>> Aug 22 14:44:11 node1 corosync[1439]: Aug 22 14:44:11 notice  [TOTEM ] The network interface [192.168.0.1] is now up.
>>>>>>> Aug 22 14:44:11 node1 corosync[1439]:  [TOTEM ] The network interface [192.168.0.1] is now up.
>>>>>>> Aug 22 14:44:11 node1 corosync[1439]: Aug 22 14:44:11 notice  [TOTEM ] The network interface [192.168.1.1] is now up.
>>>>>>> Aug 22 14:44:11 node1 corosync[1439]:  [TOTEM ] The network interface [192.168.1.1] is now up.
>>>>>>> Aug 22 14:44:26 node1 corosync[1439]: Aug 22 14:44:26 notice  [TOTEM ] A new membership (192.168.0.1:601760) was formed. Members
>>>>>>> Aug 22 14:44:26 node1 corosync[1439]:  [TOTEM ] A new membership (192.168.0.1:601760) was formed. Members
>>>>>>> Aug 22 14:44:32 node1 corosync[1439]: Aug 22 14:44:32 notice  [TOTEM ] A new membership (192.168.0.1:601764) was formed. Members joined: 2
>>>>>>> Aug 22 14:44:32 node1 corosync[1439]:  [TOTEM ] A new membership (192.168.0.1:601764) was formed.
>>>>>>> Members joined: 2
>>>>>>> Aug 22 14:44:34 node1 corosync[1439]: Aug 22 14:44:34 error   [TOTEM ] Marking ringid 1 interface 192.168.1.1 FAULTY
>>>>>>> Aug 22 14:44:34 node1 corosync[1439]:  [TOTEM ] Marking ringid 1 interface 192.168.1.1 FAULTY
>>>>>>>
>>>>>>> If I execute corosync-cfgtool -r, it clears the faulty state, but
>>>>>>> after some seconds it goes back to FAULTY. The only thing that
>>>>>>> resolves the problem is to restart the service with "service
>>>>>>> corosync restart".
>>>>>>>
>>>>>>> Here you have some of my configuration settings on node 1 (I
>>>>>>> already tried changing rrp_mode):
>>>>>>>
>>>>>>> - corosync.conf
>>>>>>>
>>>>>>> totem {
>>>>>>>     version: 2
>>>>>>>     cluster_name: node
>>>>>>>     token: 5000
>>>>>>>     token_retransmits_before_loss_const: 10
>>>>>>>     secauth: off
>>>>>>>     threads: 0
>>>>>>>     rrp_mode: passive
>>>>>>>     nodeid: 1
>>>>>>>     interface {
>>>>>>>         ringnumber: 0
>>>>>>>         bindnetaddr: 192.168.0.0
>>>>>>>         #mcastaddr: 226.94.1.1
>>>>>>>         mcastport: 5405
>>>>>>>         broadcast: yes
>>>>>>>     }
>>>>>>>     interface {
>>>>>>>         ringnumber: 1
>>>>>>>         bindnetaddr: 192.168.1.0
>>>>>>>         #mcastaddr: 226.94.1.2
>>>>>>>         mcastport: 5407
>>>>>>>         broadcast: yes
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> logging {
>>>>>>>     fileline: off
>>>>>>>     to_stderr: yes
>>>>>>>     to_syslog: yes
>>>>>>>     to_logfile: yes
>>>>>>>     logfile: /var/log/corosync/corosync.log
>>>>>>>     debug: off
>>>>>>>     timestamp: on
>>>>>>>     logger_subsys {
>>>>>>>         subsys: AMF
>>>>>>>         debug: off
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> amf {
>>>>>>>     mode: disabled
>>>>>>> }
>>>>>>>
>>>>>>> quorum {
>>>>>>>     provider: corosync_votequorum
>>>>>>>     expected_votes: 2
>>>>>>> }
>>>>>>>
>>>>>>> nodelist {
>>>>>>>     node {
>>>>>>>         nodeid: 1
>>>>>>>         ring0_addr: 192.168.0.1
>>>>>>>         ring1_addr: 192.168.1.1
>>>>>>>     }
>>>>>>>
>>>>>>>     node {
>>>>>>>         nodeid: 2
>>>>>>>         ring0_addr: 192.168.0.2
>>>>>>>         ring1_addr: 192.168.1.2
>>>>>>>     }
>>>>>>> }
>>>>>>>
>>>>>>> aisexec {
>>>>>>>     user: root
>>>>>>>     group: root
>>>>>>> }
>>>>>>>
>>>>>>> service {
>>>>>>>     name: pacemaker
>>>>>>>     ver: 1
>>>>>>> }
>>>>>>>
>>>>>>> - /etc/hosts
>>>>>>>
>>>>>>> 127.0.0.1   localhost
>>>>>>> 10.4.172.5  node1.upc.edu node1
>>>>>>> 10.4.172.6  node2.upc.edu node2
>>>>>>
>>>>>> So the machines have 3 NICs? 2 for corosync/cluster traffic and
>>>>>> one for regular traffic/services/outside world?
>>>>>>
>>>>>>> Thank you for your help in advance!
>>>>>>
>>>>>> To conclude:
>>>>>> - If you are using NetworkManager, try installing
>>>>>>   NetworkManager-config-server; it will probably help.
>>>>>> - If you are brave enough, try corosync 3.x (the current Alpha4 is
>>>>>>   pretty stable - actually, some other projects only gain this
>>>>>>   stability with SP1 :) ). It no longer has RRP but uses knet to
>>>>>>   support redundant links (up to 8 links can be configured), and
>>>>>>   it doesn't have problems with ifdown.
>>>>>>
>>>>>> Honza
>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list: Users@clusterlabs.org
>>>>>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>>>>>
>>>>>>> Project Home: http://www.clusterlabs.org
>>>>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>>>> Bugs: http://bugs.clusterlabs.org

--
David Tolosa Martínez
Customer Support & Infrastructure
UPCnet - Edifici Vèrtex
Plaça d'Eusebi Güell, 6, 08034 Barcelona
Tel: 934054555
<https://www.upcnet.es>
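[Editorial sketch: Honza's corosync 3.x suggestion above, translated to a configuration fragment using knet with two redundant links. Addresses are copied from the thread; `transport`, `link_mode`, and the other knet option names should be verified against corosync.conf(5) as shipped with corosync 3.x.]

```
totem {
    version: 2
    cluster_name: node
    transport: knet        # replaces RRP; supports up to 8 links
    link_mode: passive     # rough analogue of rrp_mode: passive
    crypto_cipher: none
    crypto_hash: none
}

nodelist {
    node {
        nodeid: 1
        ring0_addr: 192.168.0.1   # link 0
        ring1_addr: 192.168.1.1   # link 1
    }
    node {
        nodeid: 2
        ring0_addr: 192.168.0.2
        ring1_addr: 192.168.1.2
    }
}

quorum {
    provider: corosync_votequorum
    expected_votes: 2
}
```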
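[Editorial sketch of the "compile knet first, then corosync" path Honza describes. The package list and default configure flags are assumptions, not taken from the thread or the CI logs; check each project's build documentation before relying on them.]

```shell
# Build dependencies (names are an assumption for Ubuntu 18.04):
sudo apt-get install build-essential git autoconf automake libtool \
    pkg-config libnl-3-dev libnl-route-3-dev libqb-dev zlib1g-dev

# 1. Build and install knet (installs to /usr/local by default):
git clone https://github.com/kronosnet/kronosnet.git
cd kronosnet
./autogen.sh && ./configure && make && sudo make install
cd ..

# 2. Build and install corosync 3.x against the freshly installed knet:
git clone https://github.com/corosync/corosync.git
cd corosync
./autogen.sh && ./configure && make && sudo make install
sudo ldconfig   # refresh the linker cache for the new libraries
```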