Hello, I'm having some severe network performance problems with OpenStack Quantum.
I have a fairly standard Open vSwitch Quantum configuration using GRE tunnels. One thing to note is that I have limited hardware at this point. Rather than having dedicated controller and Quantum hosts, they are running on one host, each as a separate libvirt VM. To be clear, memory and CPU resources are not scarce at the host or VM level for Quantum. The host has 8 CPUs; its load is ~2.3 and never spikes above 2.7. Quantum has 4 vCPUs; its load is ~1.3 and doesn't spike above 2. The controller host has a single 1 Gbps NIC with trunked VLANs, and the same goes for the compute hosts.

I have six systems for testing:

  CH1 - controller host
  Q   - Quantum server VM
  N1  - compute node 1
  N2  - compute node 2
  IN1 - instance 1
  IN2 - instance 2

The instances are running on separate compute nodes.

Here are some iperf results.

  CH1 <--> Q: 6.3 Gbps
    This communication happens over a Linux bridge.

  CH1 <--> N1: 937 Mbps
    This happens over the 1 Gbps physical Ethernet network.

  Q (GRE) <--> IN1: 451 Mbps
    I ran iperf on Q inside the qrouter Linux network namespace to test the performance impact of the GRE tunnel.

  IN1 <--> IN2: 682 Mbps
    Again testing GRE tunneling. The discrepancy from the previous test is interesting, since it's the same basic test.

The results above are not too bad. This is where things get interesting.

Quantum is configured with one external network (192.168.27.0/24) and one private network (10.10.1.0/24). IN1 has address 10.10.1.2 and floating IP 192.168.27.11 (the first few IPs are outside the allocation pool). I connected my laptop (1 Gbps) directly to the switch and assigned it IP 192.168.27.2, so there wouldn't be any routing from the physical switch.

  Laptop <--> N1: 935 Mbps

  Laptop <--> IN1: 26.7 Mbps

That is not a typo. Traffic going through the L3 agent is almost 17x slower than the Q (GRE) <--> IN1 result. I regularly see results below 10 Mbps. I'm having a really tough time troubleshooting the last test.
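As a quick sanity check on the "almost 17x" figure, here is a minimal Python sketch of the arithmetic. The throughput numbers are copied from the iperf results above; the dictionary layout and the `slowdown` helper are just illustrative names I made up, not anything from Quantum or iperf.

```python
# Throughput in Mbps, copied from the iperf results in this post.
results_mbps = {
    "CH1 <--> Q": 6300.0,
    "CH1 <--> N1": 937.0,
    "Q (GRE) <--> IN1": 451.0,
    "IN1 <--> IN2": 682.0,
    "Laptop <--> N1": 935.0,
    "Laptop <--> IN1": 26.7,
}


def slowdown(baseline_mbps, observed_mbps):
    """Return how many times slower the observed path is than the baseline."""
    return baseline_mbps / observed_mbps


# GRE-only path versus the path that also crosses the L3 agent.
gre_vs_l3 = slowdown(results_mbps["Q (GRE) <--> IN1"],
                     results_mbps["Laptop <--> IN1"])

# Plain physical network versus the path through the L3 agent.
wire_vs_l3 = slowdown(results_mbps["Laptop <--> N1"],
                      results_mbps["Laptop <--> IN1"])

print(f"GRE only vs. L3 agent path:  {gre_vs_l3:.1f}x slower")
print(f"Physical vs. L3 agent path:  {wire_vs_l3:.1f}x slower")
```

The first ratio is the one quoted above (451 / 26.7). The second compares the floating-IP path against the raw physical link instead, which is the larger gap.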
I ran tcpdump on the host, CH1, and I don't see any errors causing TCP retransmissions or duplicate packets. Both CH1 and the Quantum server have plenty of CPU available. It's as if the L3 iptables rules are massively decreasing performance, but I've used iptables for years in other capacities and haven't seen this sort of problem. The various Quantum logs don't indicate any problems.

Has anyone else seen large performance decreases when using the Quantum L3 agent? Any ideas on how to troubleshoot this?

Sincerely,
Justin