On Fri, Dec 1, 2017 at 4:32 AM, Hui Xiang <xiangh...@gmail.com> wrote:
> Thanks Numan. In my environment it's worse: it's not even getting started,
> and the monitor is only called once rather than repeatedly, for both
> master/slave and none. Do you know what problem could cause pacemaker to
> make this decision? The other resources are good.

Hi Hui,

Can you share the output of the command "pcs resource show <OVN_DB_RES_NAME>"
and the commands you have used to create the pacemaker OVN resources? In your
previous output of "pcs resource show", the meta attribute notify was not set
properly.

Thanks
Numan

> On Fri, Dec 1, 2017 at 2:08 AM, Numan Siddique <nusid...@redhat.com> wrote:
>
>> Hi HuiXiang,
>> Even I am seeing the issue where no node is promoted as master. I will
>> test more, fix, and submit patch set v3.
>>
>> Thanks
>> Numan
>>
>> On Thu, Nov 30, 2017 at 4:10 PM, Numan Siddique <nusid...@redhat.com> wrote:
>>
>>> On Thu, Nov 30, 2017 at 1:15 PM, Hui Xiang <xiangh...@gmail.com> wrote:
>>>
>>>> Hi Numan,
>>>>
>>>> Thanks for helping. I am following your pcs example, but still with no
>>>> luck.
>>>>
>>>> 1. Before running any configuration, I stopped all of the ovsdb-servers
>>>> for OVN, and ovn-northd, and deleted ovnnb_active.conf/ovnsb_active.conf.
>>>>
>>>> 2. Since I already had a VIP in the cluster, I chose to use it; its
>>>> status is OK.
>>>> [root@node-1 ~]# pcs resource show
>>>>  vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
>>>>
>>>> 3. Used pcs to create ovndb-servers and the constraint:
>>>> [root@node-1 ~]# pcs resource create tst-ovndb ocf:ovn:ovndb-servers
>>>>   manage_northd=yes master_ip=192.168.0.2 nb_master_port=6641
>>>>   sb_master_port=6642 master
>>>> ([root@node-1 ~]# pcs resource meta tst-ovndb-master notify=true
>>>>  Error: unable to find a resource/clone/master/group: tst-ovndb-master)
>>>> ## returned an error, so I changed it to the command below.
>>>
>>> Hi HuiXiang,
>>> This command is very important.
>>> Without it, pacemaker does not notify the status change and the
>>> ovsdb-servers would not be promoted or demoted. Hence you don't see the
>>> notify action getting called in the OVN OCF script.
>>>
>>> Can you try with the other command which I shared in my previous email?
>>> Those commands work fine for me.
>>>
>>> Let me know how it goes.
>>>
>>> Thanks
>>> Numan
>>>
>>>> [root@node-1 ~]# pcs resource master tst-ovndb-master tst-ovndb
>>>>   notify=true
>>>> [root@node-1 ~]# pcs constraint colocation add master tst-ovndb-master
>>>>   with vip__management_old
>>>>
>>>> 4. pcs status
>>>> [root@node-1 ~]# pcs status
>>>>  vip__management_old (ocf::es:ns_IPaddr2): Started node-1.domain.tld
>>>>  Master/Slave Set: tst-ovndb-master [tst-ovndb]
>>>>    Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
>>>>
>>>> 5. pcs resource show XXX
>>>> [root@node-1 ~]# pcs resource show vip__management_old
>>>>  Resource: vip__management_old (class=ocf provider=es type=ns_IPaddr2)
>>>>   Attributes: nic=br-mgmt base_veth=br-mgmt-hapr ns_veth=hapr-m
>>>>    ip=192.168.0.2 iflabel=ka cidr_netmask=24 ns=haproxy gateway=none
>>>>    gateway_metric=0 iptables_start_rules=false iptables_stop_rules=false
>>>>    iptables_comment=default-comment
>>>>   Meta Attrs: migration-threshold=3 failure-timeout=60
>>>>    resource-stickiness=1
>>>>   Operations: monitor interval=3 timeout=30 (vip__management_old-monitor-3)
>>>>    start interval=0 timeout=30 (vip__management_old-start-0)
>>>>    stop interval=0 timeout=30 (vip__management_old-stop-0)
>>>> [root@node-1 ~]# pcs resource show tst-ovndb-master
>>>>  Master: tst-ovndb-master
>>>>   Meta Attrs: notify=true
>>>>   Resource: tst-ovndb (class=ocf provider=ovn type=ovndb-servers)
>>>>    Attributes: manage_northd=yes master_ip=192.168.0.2
>>>>     nb_master_port=6641 sb_master_port=6642
>>>>    Operations: start interval=0s timeout=30s (tst-ovndb-start-timeout-30s)
>>>>     stop interval=0s timeout=20s (tst-ovndb-stop-timeout-20s)
>>>>     promote interval=0s timeout=50s
>>>>      (tst-ovndb-promote-timeout-50s)
>>>>     demote interval=0s timeout=50s (tst-ovndb-demote-timeout-50s)
>>>>     monitor interval=30s timeout=20s (tst-ovndb-monitor-interval-30s)
>>>>     monitor interval=10s role=Master timeout=20s
>>>>      (tst-ovndb-monitor-interval-10s-role-Master)
>>>>     monitor interval=30s role=Slave timeout=20s
>>>>      (tst-ovndb-monitor-interval-30s-role-Slave)
>>>>
>>>> 6. I have put a log line in every ovndb-servers op; it seems only the
>>>> monitor op is being called, and nothing is promoted by the pacemaker DC:
>>>> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>>>  ovsdb_server_monitor
>>>> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>>>  ovsdb_server_check_status
>>>> <30>Nov 30 15:22:19 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>>>  return OCFOCF_NOT_RUNNINGG
>>>> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>>>  ovsdb_server_master_update: 7}
>>>> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>>>  ovsdb_server_master_update end}
>>>> <30>Nov 30 15:22:20 node-1 ovndb-servers(tst-ovndb)[2980860]: INFO:
>>>>  monitor is going to return 7
>>>> <30>Nov 30 15:22:20 node-1 ovndb-servers(undef)[2980970]: INFO:
>>>>  metadata exit OCF_SUCCESS}
>>>>
>>>> Please take a look, thank you very much.
>>>> Hui.
>>>>
>>>> On Wed, Nov 29, 2017 at 11:03 PM, Numan Siddique <nusid...@redhat.com> wrote:
>>>>
>>>>> On Wed, Nov 29, 2017 at 4:16 PM, Hui Xiang <xiangh...@gmail.com> wrote:
>>>>>
>>>>>> FYI, if I have configured a good ovndb-server cluster with one active
>>>>>> and two slaves, and then start the pacemaker ovndb-servers resource
>>>>>> agents, they all become slaves...
>>>>>
>>>>> You don't need to start the ovndb-servers. When you create the pacemaker
>>>>> resources it would automatically start them and promote one of them.
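[Editor's note: as background for reading the monitor log above, the numeric
codes it prints ("monitor is going to return 7") are standard OCF
resource-agent exit codes. A minimal sketch naming the values relevant to
this thread, per the OCF resource-agent API specification:]

```shell
#!/bin/sh
# Standard OCF resource-agent exit codes (per the OCF RA API spec).
# "monitor is going to return 7" in the log above means OCF_NOT_RUNNING,
# i.e. pacemaker sees the ovsdb-server as cleanly stopped on that node.
OCF_SUCCESS=0          # monitor: running (as slave, for master/slave resources)
OCF_ERR_GENERIC=1      # generic failure
OCF_NOT_RUNNING=7      # resource is cleanly stopped
OCF_RUNNING_MASTER=8   # resource is running in the Master role

echo "monitor returned $OCF_NOT_RUNNING (OCF_NOT_RUNNING)"
# -> monitor returned 7 (OCF_NOT_RUNNING)
```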
>>>>> One thing which is very important: create an IPaddr2 resource first and
>>>>> add a colocation constraint, so that pacemaker promotes the ovsdb-server
>>>>> on the node where the IPaddr2 resource is running. This IPaddr2
>>>>> resource's IP should be your master IP.
>>>>>
>>>>> Can you please do "pcs resource show <name_of_the_resource>" and share
>>>>> the output?
>>>>>
>>>>> Below is what I normally use for my testing.
>>>>>
>>>>> ############
>>>>> pcs cluster cib tmp-cib.xml
>>>>> cp tmp-cib.xml tmp-cib.xml.deltasrc
>>>>>
>>>>> pcs -f tmp-cib.xml resource create tst-ovndb ocf:ovn:ovndb-servers
>>>>>   manage_northd=yes master_ip=192.168.24.10 nb_master_port=6641
>>>>>   sb_master_port=6642 master
>>>>> pcs -f tmp-cib.xml resource meta tst-ovndb-master notify=true
>>>>> pcs -f tmp-cib.xml constraint colocation add master tst-ovndb-master
>>>>>   with ip-192.168.24.10
>>>>>
>>>>> pcs cluster cib-push tmp-cib.xml diff-against=tmp-cib.xml.deltasrc
>>>>> pcs status
>>>>> ##############
>>>>>
>>>>> In the above example, "ip-192.168.24.10" is the IPaddr2 resource.
>>>>>
>>>>> Thanks
>>>>> Numan
>>>>>
>>>>>> On Tue, Nov 28, 2017 at 10:48 PM, Numan Siddique <nusid...@redhat.com> wrote:
>>>>>>
>>>>>>> On Tue, Nov 28, 2017 at 2:29 PM, Hui Xiang <xiangh...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Numan,
>>>>>>>>
>>>>>>>> I finally figured out what's wrong when running the ovndb-servers OCF
>>>>>>>> script in my environment.
>>>>>>>>
>>>>>>>> 1. There are no default ovnnb and ovnsb servers running in my
>>>>>>>> environment; I thought they would be started by pacemaker the usual
>>>>>>>> way other typical resource agents do it. When I created the
>>>>>>>> ovndb_servers resource, nothing happened; no operation was executed
>>>>>>>> except monitor, which made it really hard to debug for a while.
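[Editor's note: the "unable to find a resource/clone/master/group:
tst-ovndb-master" error earlier in the thread comes from running "pcs
resource meta" before the master resource exists; in Numan's recipe the
trailing "master" keyword creates the master wrapper first. A dry-run sketch
of that ordering, keeping the names and pcs 0.9-era syntax from the recipe
above; `run` only echoes the commands, so swap it for direct execution on a
real cluster node:]

```shell
#!/bin/sh
# Dry-run sketch of the pcs recipe above (pcs 0.9-era syntax assumed).
# 'run' only echoes; replace it with direct execution on a cluster node.
run() { echo "+ $*"; }

# 1. Create the resource together with its master wrapper, so that
#    "tst-ovndb-master" exists before its meta attributes are touched.
run pcs resource create tst-ovndb ocf:ovn:ovndb-servers \
    manage_northd=yes master_ip=192.168.24.10 \
    nb_master_port=6641 sb_master_port=6642 master

# 2. notify=true is required: without it pacemaker never calls the
#    notify action and the OCF script cannot elect a master.
run pcs resource meta tst-ovndb-master notify=true

# 3. Colocate the master role with the IPaddr2 resource (the master IP).
run pcs constraint colocation add master tst-ovndb-master \
    with ip-192.168.24.10
```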
>>>>>>>> In the ovsdb_server_monitor() function, it first checks the status;
>>>>>>>> here it returns NOT_RUNNING. Then in the ovsdb_server_master_update()
>>>>>>>> function, "CRM_MASTER -D" is executed, which appears to stop every
>>>>>>>> following action; I am not very clear on what work it did.
>>>>>>>>
>>>>>>>> So, do ovn_nb and ovn_sb need to be running before the pacemaker
>>>>>>>> ovndb_servers resource is created? Is there any documentation on this?
>>>>>>>
>>>>>>> No, they don't need to be.
>>>>>>>
>>>>>>>> 2. Without your patch, every node executes ovsdb_server_monitor and
>>>>>>>> returns OCF_SUCCESS. However, the first node of the three-node
>>>>>>>> cluster executes the ovsdb_server_stop action, for the reason shown
>>>>>>>> below:
>>>>>>>> <27>Nov 28 15:35:11 node-1 pengine[1897010]: error: clone_color:
>>>>>>>>  ovndb_servers:0 is running on node-1.domain.tld which isn't allowed
>>>>>>>> Did I miss anything? I don't understand why it isn't allowed.
>>>>>>>>
>>>>>>>> 3. Regarding your patch [1]:
>>>>>>>> It first reports "/usr/lib/ocf/resource.d/ovn/ovndb-servers: line 26:
>>>>>>>> ocf_attribute_target: command not found ]" in my environment
>>>>>>>> (pacemaker 1.1.12).
>>>>>>>
>>>>>>> Thanks. I will come back to you on your other points. The
>>>>>>> "ocf_attribute_target" function must have been added in 1.1.16-12.
>>>>>>>
>>>>>>> I think it makes sense to either remove "ocf_attribute_target" or find
>>>>>>> a way so that even older versions work.
>>>>>>>
>>>>>>> I will spin a v2.
>>>>>>> Thanks
>>>>>>> Numan
>>>>>>>
>>>>>>>> The log showed the same as item 2, but I briefly saw a different
>>>>>>>> state from "pcs status", as shown below:
>>>>>>>>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>>>>>>>    Slaves: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
>>>>>>>> There is no promote action being executed.
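[Editor's note: one way to realize the "find a way so that even older
versions work" option mentioned above is a fallback definition when the
installed ocf-shellfuncs lacks ocf_attribute_target. This is a hedged sketch,
not the actual v2 patch; the fallback behavior assumed here — return the
argument, or the node name pacemaker passes in OCF_RESKEY_CRM_meta_on_node
when called with no argument — is only meant for plain (non-bundle)
deployments:]

```shell
#!/bin/sh
# Hedged sketch: fallback for resource-agents versions whose ocf-shellfuncs
# does not provide ocf_attribute_target. Assumption: for non-bundle
# deployments it is enough to return the given name, or the node name
# pacemaker passes in OCF_RESKEY_CRM_meta_on_node.
if ! type ocf_attribute_target >/dev/null 2>&1; then
    ocf_attribute_target() {
        if [ -n "$1" ]; then
            echo "$1"
        else
            echo "$OCF_RESKEY_CRM_meta_on_node"
        fi
    }
fi

OCF_RESKEY_CRM_meta_on_node=node-1.domain.tld
ocf_attribute_target                    # -> node-1.domain.tld
ocf_attribute_target node-2.domain.tld  # -> node-2.domain.tld
```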
>>>>>>>>
>>>>>>>> Thanks for looking and helping.
>>>>>>>>
>>>>>>>> [1] - https://patchwork.ozlabs.org/patch/839022/
>>>>>>>>
>>>>>>>> On Fri, Nov 24, 2017 at 10:54 PM, Numan Siddique <nusid...@redhat.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Hui Xiang,
>>>>>>>>>
>>>>>>>>> Can you please try this patch [1] and see if it works for you?
>>>>>>>>> Please let me know how it goes. But I am not sure if the patch would
>>>>>>>>> fix the issue.
>>>>>>>>>
>>>>>>>>> In brief, the OVN OCF script doesn't add a monitor action for the
>>>>>>>>> "Master" role, so the pacemaker resource agent would not check the
>>>>>>>>> status of the OVN db servers periodically. In case the OVN db
>>>>>>>>> servers are killed, pacemaker won't know about it.
>>>>>>>>>
>>>>>>>>> You can also take a look at this [2] to see how it is used in
>>>>>>>>> OpenStack with a TripleO installation.
>>>>>>>>>
>>>>>>>>> [1] - https://patchwork.ozlabs.org/patch/839022/
>>>>>>>>> [2] - https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/pacemaker/ovn_northd.pp
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Numan
>>>>>>>>>
>>>>>>>>> On Fri, Nov 24, 2017 at 3:00 PM, Hui Xiang <xiangh...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi folks,
>>>>>>>>>>
>>>>>>>>>> I am following what is suggested in the doc [1] to configure
>>>>>>>>>> ovndb_servers HA; however, no luck even after upgrading the
>>>>>>>>>> pacemaker packages from 1.12 to 1.16 and trying almost every kind
>>>>>>>>>> of change. There is still no ovndb_servers master promoted. Is
>>>>>>>>>> there any special recipe to make it run? So frustrated with it,
>>>>>>>>>> sigh.
>>>>>>>>>>
>>>>>>>>>> It always shows:
>>>>>>>>>>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>>>>>>>>>    Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
>>>>>>>>>>
>>>>>>>>>> Even if I tried the steps below:
>>>>>>>>>> 1.
>>>>>>>>>> pcs resource debug-stop ovndb_server on every node.
>>>>>>>>>>    ovn-ctl status_ovnxb: running/backup
>>>>>>>>>> 2. pcs resource debug-start ovndb_server on every node.
>>>>>>>>>>    ovn-ctl status_ovnxb: running/backup
>>>>>>>>>> 3. pcs resource debug-promote ovndb_server on one node.
>>>>>>>>>>    ovn-ctl status_ovnxb: running/active
>>>>>>>>>>
>>>>>>>>>> With the above status, pcs status still showed:
>>>>>>>>>>  Master/Slave Set: ovndb_servers-master [ovndb_servers]
>>>>>>>>>>    Stopped: [ node-1.domain.tld node-2.domain.tld node-3.domain.tld ]
>>>>>>>>>>
>>>>>>>>>> [1] - https://github.com/openvswitch/ovs/blob/master/Documentation/topics/integration.rst
>>>>>>>>>>
>>>>>>>>>> Appreciated any hint.
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> discuss mailing list
>>>>>>>>>> disc...@openvswitch.org
>>>>>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
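[Editor's note: the "running/active" vs "running/backup" states reported by
ovn-ctl above are what the master/slave decision turns on. A small
illustrative helper — hypothetical, not part of ovn-ctl or the OCF script —
mapping those status strings to the role pacemaker would expect to see:]

```shell
#!/bin/sh
# Hypothetical helper: map an "ovn-ctl status_ovnnb"/"status_ovnsb"-style
# status string to the corresponding pacemaker role.
ovn_status_to_role() {
    case "$1" in
        running/active) echo master ;;
        running/backup) echo slave ;;
        *)              echo stopped ;;
    esac
}

ovn_status_to_role "running/active"   # -> master
ovn_status_to_role "running/backup"   # -> slave
ovn_status_to_role "not running"      # -> stopped
```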