Hi Andrija,

Do you use NIC bonds? I have seen this before with active-active bonds. As you say, it can be very difficult to troubleshoot, and the behaviour makes little sense.

What can happen is that network traffic is load balanced between the two NICs, but the MAC tables on the two switches are not updated often enough to keep up with the load-balanced traffic. In other words, a MAC address that used to transmit on hypervisor eth0 of a bond (attached to your first top-of-rack switch) suddenly, due to load, starts transmitting on eth1 of the bond (attached to the second top-of-rack switch). The physical switch stack, however, still thinks the MAC address lives on eth0, so traffic is dropped until the next time the switches sync their MAC tables.
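To answer the "do you use NIC bonds?" question on a given hypervisor, the Linux bonding driver reports the active mode on the "Bonding Mode:" line of /proc/net/bonding/<bond>. The sketch below reads that line and flags any mode other than active-backup (active-passive) or 802.3ad (LACP). It assumes the bond is called bond0 and that the mode strings match those printed by the kernel bonding driver; the function names are illustrative only.

```python
import re
import sys
from typing import Optional

# Mode names as printed on the "Bonding Mode:" line of /proc/net/bonding/<bond>.
# Only these two are treated as "safe" here; everything else gets flagged.
RECOMMENDED_MODES = {
    "fault-tolerance (active-backup)",        # active-passive
    "IEEE 802.3ad Dynamic link aggregation",  # LACP
}


def bonding_mode(proc_text: str) -> Optional[str]:
    """Return the value of the 'Bonding Mode:' line, or None if there is none."""
    match = re.search(r"^Bonding Mode:\s*(.+?)\s*$", proc_text, re.MULTILINE)
    return match.group(1) if match else None


def needs_review(proc_text: str) -> bool:
    """True if a bonding mode is present and it is not active-backup or 802.3ad."""
    mode = bonding_mode(proc_text)
    return mode is not None and mode not in RECOMMENDED_MODES


def main(path: str = "/proc/net/bonding/bond0") -> None:
    """Print the mode of one bond; bond0 is an assumed default name."""
    try:
        with open(path) as f:
            text = f.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
        return
    print(f"{path}: mode={bonding_mode(text)!r}, needs_review={needs_review(text)}")


if __name__ == "__main__":
    main(*sys.argv[1:2])
```

Matching on the exact mode string keeps the check strict: a mode such as "load balancing (round-robin)" is reported for review rather than silently accepted.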
We used to see this a lot in the past on XenServer. The solution was to move to active-passive bond modes, or to go up to LACP/802.3ad if your hardware allows it. The same principle also applies to generic Linux bonds.

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue
S: +44 20 3603 0540 | dag.sonst...@shapeblue.com | http://www.shapeblue.com | Twitter: @ShapeBlue <https://twitter.com/#!/shapeblue>

On 09/10/2017, 21:52, "Andrija Panic" <andrija.pa...@gmail.com> wrote:

Hi guys,

we have an occasional but serious problem that starts happening, as it seems, randomly (i.e. NOT under high load). It is not ACS related as far as I know, purely KVM, but feedback is really welcome.

- The VM is reachable in general from everywhere, but not from one specific IP address.
- The VM is NOT under high load: network traffic is next to zero, and the same goes for CPU/disk.
- We mitigate the problem by migrating the VM away to another host, which is not much of a solution.

Description of the problem: we ping the problematic VM from the "problematic" source IP address and capture traffic on the KVM host where the VM lives:

- tcpdump on the VXLAN interface (the physical incoming interface on the host): we see the packet fine
- tcpdump on the BRIDGE: we see the packet fine
- tcpdump on the VNET: we DON'T see the packet
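The three-point capture above can be scripted so the same filter is applied at every hop while the ping from the problematic source is running. This is a minimal sketch, assuming root access, tcpdump and the coreutils `timeout` command on the host; the interface names vxlan100, br100 and vnet0 are hypothetical placeholders for the real VXLAN, bridge and VNET devices.

```python
import subprocess
import sys
from typing import Dict, List, Optional

# Hypothetical interface names along the receive path on the hypervisor:
# VXLAN device -> Linux bridge -> VM tap (vnet) device. Adjust to your host.
HOPS = ["vxlan100", "br100", "vnet0"]


def tcpdump_cmd(iface: str, src_ip: str, count: int = 5) -> List[str]:
    """Build a tcpdump command that captures ICMP from src_ip on one interface."""
    # -n: no name resolution; -e: print link-level (MAC) headers;
    # -c: stop after `count` packets; -i: capture on this interface only.
    return ["tcpdump", "-n", "-e", "-c", str(count), "-i", iface,
            f"icmp and src host {src_ip}"]


def packets_seen(iface: str, src_ip: str, seconds: int = 10) -> int:
    """Capture for at most `seconds` and count the packet lines tcpdump printed.

    Needs root, and the ping from src_ip must be running at the same time.
    """
    cmd = ["timeout", str(seconds)] + tcpdump_cmd(iface, src_ip)
    result = subprocess.run(cmd, capture_output=True, text=True)
    # Without -v, tcpdump prints one line per packet on stdout;
    # its status messages go to stderr.
    return sum(1 for line in result.stdout.splitlines() if line.strip())


def first_silent_hop(seen: Dict[str, int], path: List[str]) -> Optional[str]:
    """Return the first interface on `path` where nothing was captured."""
    for iface in path:
        if seen.get(iface, 0) == 0:
            return iface
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    src = sys.argv[1]
    counts = {iface: packets_seen(iface, src) for iface in HOPS}
    print(counts, "-> first hop with no packets:", first_silent_hop(counts, HOPS))
```

In the situation described above, the VXLAN and bridge hops have packets and the VNET hop does not, so first_silent_hop would name the vnet device. That narrows the drop to the step between the bridge and the VM's tap interface.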
In the scenario above, I should add that:

- we can tcpdump packets from other source IPs on the VNET interface just fine (as expected), so we should also see this problematic source IP's packets
- we can actually ping in the opposite direction, from the problematic VM to the problematic "source" IP

We checked everything possible: bridge port forwarding, MAC-to-VTEP mapping and many other things. We removed traffic shaping from the VNET interface. For troubleshooting purposes there are no iptables/ebtables rules on this host, and there is no STP on the bridge. We removed and rejoined interfaces to the bridge, and we destroyed the bridge and created it manually on the fly. The problem is really crazy and I cannot explain it.

This is Ubuntu 14.04, QEMU 2.5 (libvirt 1.3.1), stock kernel 3.16-xx, regular bridge (not OVS).

Has anyone else ever heard of such a problem? This is not intermittent packet dropping, but a complete blackout/packet drop of some kind.

Thanks,
--
Andrija Panić

dag.sonst...@shapeblue.com
www.shapeblue.com
53 Chandos Place, Covent Garden, London WC2N 4HS UK
@shapeblue