*** This bug is a duplicate of bug 1414559 *** https://bugs.launchpad.net/bugs/1414559
** This bug has been marked a duplicate of bug 1414559
   OVS drops RARP packets by QEMU upon live-migration - VM temporarily disconnected

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1511430

Title:
  live migration does not coordinate VM resume with network readiness

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  When migrating a VM from one host to another in combination with
  neutron, the VM can resume at the destination host while the network
  is not ready (race condition).

  QEMU has a mechanism to send a few RARPs once migration is done and
  before resuming. Nova needs to coordinate with QEMU and neutron
  (nova/neutron notification mechanism) to make sure the VM is only
  resumed at the destination host when networking has been properly
  wired. Otherwise the RARPs are lost, and connectivity to the VM is
  disrupted until the VM sends any broadcast message.

  Log detail (merged from the logs and tcpdumps of two hosts),
  migration from host 29 to host 30:

  2015-10-29 10:54:27.592000 [VMLIFE30] 21476 INFO nova.compute.manager [-] [instance: a18a5824-4215-4e24-bcfc-cb9f89f6bcbd] VM Resumed (Lifecycle Event)
  2015-10-29 10:54:27.609000 [VMLIFE29] 29022 INFO nova.compute.manager [-] [instance: a18a5824-4215-4e24-bcfc-cb9f89f6bcbd] VM Paused (Lifecycle Event)
  2015-10-29 10:54:27.636000 [TAP30] tcpdump DEBUG 10:54:27.632047 fa:16:3e:50:a3:46 > Broadcast, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is fa:16:3e:50:a3:46 tell fa:16:3e:50:a3:46, length 46
  2015-10-29 10:54:27.656000 [TAP29] tcpdump DEBUG tcpdump: pcap_loop: The interface went down
  2015-10-29 10:54:27.787000 [TAP30] tcpdump DEBUG 10:54:27.783353 fa:16:3e:50:a3:46 > Broadcast, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is fa:16:3e:50:a3:46 tell fa:16:3e:50:a3:46, length 46
  2015-10-29 10:54:27.818000 [FDB30] ovs-fdb DEBUG 62 0 fa:16:3e:50:a3:46 0
  # switch associated to VLAN 0, should be "1", still not tagged, also not propagated to other hosts because vlan0 is invalid in the OVS implementation
  2015-10-29 10:54:28.037000 [TAP30] tcpdump DEBUG 10:54:28.033259 fa:16:3e:50:a3:46 > Broadcast, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is fa:16:3e:50:a3:46 tell fa:16:3e:50:a3:46, length 46
  2015-10-29 10:54:28.387000 [TAP30] tcpdump DEBUG 10:54:28.383211 fa:16:3e:50:a3:46 > Broadcast, ethertype Reverse ARP (0x8035), length 60: Reverse Request who-is fa:16:3e:50:a3:46 tell fa:16:3e:50:a3:46, length 46
  2015-10-29 10:54:28.969000 [VMLIFE29] 29022 INFO nova.compute.manager [-] [instance: a18a5824-4215-4e24-bcfc-cb9f89f6bcbd] VM Stopped (Lifecycle Event)
  2015-10-29 10:54:29.803000 [OVS30] 21310 DEBUG neutron.agent.linux.utils [req-a33468a6-f259-4324-a132-ab0dd025eeec None] Command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ovs-vsctl', '--timeout=10', 'set', 'Port', 'qvo2e6d0f35-cb', 'tag=1']
  # wiring is now ready, and after this neutron-openvswitch-agent will notify neutron-server which could notify nova about readiness...

  A reproduction ansible script is provided to show how it happens:
  https://github.com/mangelajo/oslogmerger/blob/master/contrib/debug-live-migration/debug-live-migration.yaml

  The complete merged output from oslogmerger can be found here:
  https://raw.githubusercontent.com/mangelajo/oslogmerger/master/contrib/debug-live-migration/logs/mergedlogs-packets-ovs.log

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1511430/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
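
The race window in the merged log above can be quantified with a short script. This is only an illustrative sketch: the timestamps are copied from the host 30 entries in the bug description, the helper names (parse_ts, resume_to_wiring_gap, rarps_before_wiring) are hypothetical, and the moment the 'tag=1' ovs-vsctl command is logged is assumed to be a reasonable proxy for "wiring ready".

```python
from datetime import datetime

# Timestamp layout used in the first column of the merged log.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Values copied from the bug description (host 30 entries only).
VM_RESUMED = "2015-10-29 10:54:27.592000"   # [VMLIFE30] VM Resumed
RARPS = [                                   # [TAP30] Reverse ARP requests
    "2015-10-29 10:54:27.636000",
    "2015-10-29 10:54:27.787000",
    "2015-10-29 10:54:28.037000",
    "2015-10-29 10:54:28.387000",
]
# [OVS30] ovs-vsctl set Port ... tag=1
# Assumption: this log entry marks the point at which the port is wired.
PORT_TAGGED = "2015-10-29 10:54:29.803000"


def parse_ts(text):
    """Parse a merged-log timestamp into a datetime."""
    return datetime.strptime(text, TS_FORMAT)


def resume_to_wiring_gap(resumed, wired):
    """Seconds between VM resume and the port receiving its VLAN tag."""
    return (parse_ts(wired) - parse_ts(resumed)).total_seconds()


def rarps_before_wiring(rarps, wired):
    """RARP timestamps that fall before the port was tagged."""
    wired_at = parse_ts(wired)
    return [ts for ts in rarps if parse_ts(ts) < wired_at]


if __name__ == "__main__":
    gap = resume_to_wiring_gap(VM_RESUMED, PORT_TAGGED)
    early = rarps_before_wiring(RARPS, PORT_TAGGED)
    print(f"VM resumed {gap:.3f} s before the port was tagged")
    print(f"{len(early)} of {len(RARPS)} RARPs were sent before tagging")
```

For the log excerpt above, this reports a gap of roughly 2.2 seconds, with all four RARP requests sent before the port was tagged, which matches the lost-RARP behaviour described in the report.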