On Wed, Sep 5, 2018 at 1:05 PM Miguel Duarte de Mora Barroso <mdbarr...@redhat.com> wrote:
> Hi Gianluca,
>
> I really don't think it should.

Hi Miguel, thanks for your feedback. Actually, my doubts and my question originate from a particular failure I detected. Let me describe the environment in more detail.

I have two hypervisors, hv1 and hv2, running oVirt 4.2.5. They are placed in two different racks, rack1 and rack2. I have a virtual cluster used for testing/scalability purposes, composed of 4 pacemaker/corosync nodes with CentOS 7.4 as the OS. Two nodes of this virtual cluster (cl1 and cl2) are VMs running on hv1, and two nodes (cl3 and cl4) are VMs running on hv2. hv1 is in rack1 and hv2 is in rack2. They simulate a possible future scenario of a physical stretched cluster with two nodes in datacenter1 and two nodes in datacenter2.

Due to a network problem, rack1 was isolated for about one minute. What I registered in /var/log/messages of cl* is below:

- cl1
Aug 31 14:53:33 cl1 corosync[1291]: [TOTEM ] A processor failed, forming new configuration.
Aug 31 14:53:36 cl1 corosync[1291]: [TOTEM ] A new membership (172.16.1.68:436) was formed. Members left: 4 2 3
Aug 31 14:53:36 cl1 corosync[1291]: [TOTEM ] Failed to receive the leave message. failed: 4 2 3

- cl2
Aug 31 14:53:33 cl2 corosync[32749]: [TOTEM ] A processor failed, forming new configuration.
Aug 31 14:53:36 cl2 corosync[32749]: [TOTEM ] A new membership (172.16.1.69:436) was formed. Members left: 4 1 3
Aug 31 14:53:36 cl2 corosync[32749]: [TOTEM ] Failed to receive the leave message. failed: 4 1 3

- cl3
Aug 31 14:53:33 cl3 corosync[1282]: [TOTEM ] A processor failed, forming new configuration.
Aug 31 14:54:10 cl3 corosync[1282]: [TOTEM ] A new membership (172.16.1.63:432) was formed. Members left: 1 2
Aug 31 14:54:10 cl3 corosync[1282]: [TOTEM ] Failed to receive the leave message. failed: 1 2

- cl4
Aug 31 14:53:33 cl4 corosync[1295]: [TOTEM ] A processor failed, forming new configuration.
Aug 31 14:54:10 cl4 corosync[1295]: [TOTEM ] A new membership (172.16.1.63:432) was formed. Members left: 1 2
Aug 31 14:54:10 cl4 corosync[1295]: [TOTEM ] Failed to receive the leave message. failed: 1 2

The intracluster network of this virtual cluster is on OVN, and the isolation of rack1 caused the virtual nodes inside hv1 to lose sight of all the other nodes, including the other VM running inside the same hypervisor. So cl1 lost 2, 3 and 4, and cl2 lost 1, 3 and 4, while cl3 and cl4 only lost 1 and 2. I supposed that, even during the isolation of hv1, the VMs cl1 and cl2 would have been able to see each other over their OVN-based vNICs.

Just for reference, the node hv1 (real hostname ov200) got the messages below (the storage domains are on iSCSI, so they were inaccessible during the rack1 isolation):

Aug 31 14:53:04 ov200 ovn-controller: ovs|26823|reconnect|ERR|ssl:10.4.192.49:6642: no response to inactivity probe after 5 seconds, disconnecting
Aug 31 14:53:11 ov200 kernel: connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4562746767, last ping 4562751768, now 4562756784
Aug 31 14:53:11 ov200 kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4562746768, last ping 4562751770, now 4562756784
Aug 31 14:53:11 ov200 kernel: connection1:0: detected conn error (1022)
Aug 31 14:53:11 ov200 kernel: connection2:0: detected conn error (1022)
Aug 31 14:53:11 ov200 iscsid: Kernel reported iSCSI connection 1:0 error (1022 - Invalid or unknown error code) state (3)
Aug 31 14:53:11 ov200 iscsid: Kernel reported iSCSI connection 2:0 error (1022 - Invalid or unknown error code) state (3)
.
.
.
Aug 31 14:54:04 ov200 multipathd: 8:32: reinstated
Aug 31 14:54:04 ov200 kernel: device-mapper: multipath: Reinstating path 8:32.
Aug 31 14:54:04 ov200 multipathd: 36090a0d88034667163b315f8c906b0ac: remaining active paths: 2

> Could you provide the output of 'ovs-ofctl dump-flows br-int' *before*
> and *after* engine is shutdown ?

Unfortunately not.
If it can help, to verify the current situation, with everything OK and cl1 and cl2 running on hv1, here is the output on hv1:
https://drive.google.com/file/d/1gLtpkKFCBXV46lXJYsMlbonp853EqLun/view?usp=sharing

My question regarding the engine originated because, as a side effect of the rack1 isolation, the engine, which is a VM in another environment and is configured as the OVN provider, was unreachable for about one minute during the problem. And I saw the first line of the ov200 log above:

Aug 31 14:53:04 ov200 ovn-controller: ovs|26823|reconnect|ERR|ssl:10.4.192.49:6642: no response to inactivity probe after 5 seconds, disconnecting

> Also outputs to 'ovs-vsctl show' and 'ovs-ofctl show br-int'. Also
> before and after engine-shutdown.

Now, with everything OK and cl1 and cl2 running on hv1:

# ovs-vsctl show
0c8ccaa3-b215-4860-8102-0ea7a24ebcaf
    Bridge br-int
        fail_mode: secure
        Port "ovn-8eea86-0"
            Interface "ovn-8eea86-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.4.192.48"}
        Port br-int
            Interface br-int
                type: internal
        Port "vnet3"
            Interface "vnet3"
        Port "vnet1"
            Interface "vnet1"
    ovs_version: "2.9.0"

# ovs-ofctl show br-int
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000ce296715474c
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
 1(vnet1): addr:fe:1a:4a:16:01:07
     config:     0
     state:      0
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 2(vnet3): addr:fe:1a:4a:16:01:08
     config:     0
     state:      0
     current:    10MB-FD COPPER
     speed: 10 Mbps now, 0 Mbps max
 5(ovn-8eea86-0): addr:92:30:37:41:00:43
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-int): addr:ce:29:67:15:47:4c
     config:     PORT_DOWN
     state:      LINK_DOWN
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0

> All of the above on the host where the VMs are running.
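About that inactivity probe line: port 6642 is ovn-controller's connection to the OVN southbound DB, and if I read the ovn-controller(8) documentation correctly, the probe interval is tunable per host via the Open_vSwitch table, so a short engine outage would not even trigger the disconnect. My understanding is also that ovn-controller keeps the already-installed flows when it loses the SB connection, so the disconnect alone should not break traffic between cl1 and cl2, but I would like to confirm that. A possible tweak (the 30000 ms value is just an example):

```shell
# Show the currently configured probe interval in milliseconds
# (this errors out if the key has never been set, i.e. the default is in use)
ovs-vsctl get Open_vSwitch . external_ids:ovn-remote-probe-interval

# Raise it to 30 seconds so a brief SB-DB outage does not drop the connection
ovs-vsctl set Open_vSwitch . external_ids:ovn-remote-probe-interval=30000
```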
> Another question: is the OVN network you created an overlay, or is it
> attached to a physical network?

I think you mean overlay, because the switch type of the cluster is "Linux Bridge".

> Regards,
> Miguel

Thanks in advance for your time,
Gianluca
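For what it's worth, I believe this can also be checked from the host itself: an overlay network shows up as geneve tunnel ports on br-int (as in the 'ovs-vsctl show' output above), while a network attached to a physical one would rely on a bridge mapping. Something like:

```shell
# List the ports on the integration bridge; tunnel ports (type geneve)
# indicate overlay traffic to other chassis
ovs-vsctl list-ports br-int

# A physical attachment would need a bridge mapping configured here
# (this errors out if the key is not set, which suggests pure overlay)
ovs-vsctl get Open_vSwitch . external_ids:ovn-bridge-mappings
```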
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/7V6F4SZSY4BWHM33CYSB5FLM6DNVPYCV/