Here is how I reproduce the bug:

Create 3 Ubuntu 16.04 VMs (VM-1, VM-gateway, VM-2) on Azure in the same
Resource Group. The kernel should be the linux-azure kernel
4.15.0-1098.109~16.04.1 (or newer). I use Gen1 VMs, but Gen2 should also
have the same issue; I use the "East US 2" region, but the issue should
reproduce in any region.

Note: none of the VMs use Accelerated Networking, i.e. all three VMs
use the software NIC "NetVSC".

In my setup, VM-1 and VM-2 are "Standard D8s v3 (8 vcpus, 32 GiB
memory)", and VM-gateway is "Standard DS3 v2 (4 vcpus, 14 GiB memory)".
I happen to name the gateway VM "decui-dpdk", but DPDK is actually not
used here at all (I do intend to use this setup for DPDK in the future).
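For reference, the VMs can also be created with the Azure CLI instead of the
portal. The sketch below is only illustrative: the resource group, VNet and
subnet names (<rg>, <vnet>, subnet-main/subnet-80/subnet-81) are placeholders
I made up, and the exact flags may vary slightly by CLI version.

    # Placeholder names: <rg>, <vnet>; the three subnets are assumed to exist already.
    az vm create -g <rg> -n VM-1 --image Canonical:UbuntuServer:16.04-LTS:latest \
        --size Standard_D8s_v3 --vnet-name <vnet> --subnet subnet-80 \
        --admin-username <user> --generate-ssh-keys
    # VM-2 is the same but with --subnet subnet-81; the gateway VM ("decui-dpdk")
    # uses --size Standard_DS3_v2 and --subnet subnet-main.
    # Inside each guest, install/update the linux-azure kernel, then reboot:
    sudo apt-get update && sudo apt-get install -y linux-azure
    # Verify Accelerated Networking is off on every NIC (should print "false"):
    az network nic show -g <rg> -n <nic-name> --query enableAcceleratedNetworking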


The gateway VM has 3 NICs:
        The main NIC (10.0.0.4) is not used for IP forwarding.
        NIC-1's IP is 192.168.80.5.
        NIC-2's IP is 192.168.81.5.
        The gateway VM receives packets from VM-1 (192.168.80.4) and forwards
the packets to VM-2 (192.168.81.4).
        No firewall rule is used.
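If the two extra gateway NICs are created from the CLI rather than the portal,
a sketch like the following could be used (the NIC names gw-nic-1/gw-nic-2 are
made up; the VM has to be deallocated before NICs can be added):

    az network nic create -g <rg> -n gw-nic-1 --vnet-name <vnet> --subnet subnet-80 \
        --private-ip-address 192.168.80.5
    az network nic create -g <rg> -n gw-nic-2 --vnet-name <vnet> --subnet subnet-81 \
        --private-ip-address 192.168.81.5
    az vm deallocate -g <rg> -n decui-dpdk
    az vm nic add -g <rg> --vm-name decui-dpdk --nics gw-nic-1 gw-nic-2
    az vm start -g <rg> -n decui-dpdk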

The client VM (VM-1, 192.168.80.4) has 1 NIC. It's running the iperf2 client.
The server VM (VM-2, 192.168.81.4) has 1 NIC. It's running the iperf2 server:
"nohup iperf -s &"

The client VM is sending traffic, through the gateway VM (192.168.80.5,
192.168.81.5), to the server VM.
Note: all 3 subnets here are in the same VNet (Virtual Network), and 2 Azure UDR
(User Defined Routing) rules must be used to force the traffic to go through
the gateway VM. IP forwarding on the gateway VM's NIC-1 and NIC-2 must be
enabled from the Azure portal (the setting can only be changed when the VM is
"stopped"), and IP forwarding must be enabled inside the gateway VM (i.e. echo 1 >
/proc/sys/net/ipv4/ip_forward). I'll attach some screenshots showing the
network topology and the configuration.
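In case it is easier to follow than the screenshots, here is a sketch of the
equivalent Azure CLI commands; the route-table and route names (rt-80/rt-81,
to-80/to-81) are made up, and the same can be done entirely from the portal:

    # UDR 1: in VM-1's subnet, send traffic destined to 192.168.81.0/24 via the gateway's NIC-1.
    az network route-table create -g <rg> -n rt-80
    az network route-table route create -g <rg> --route-table-name rt-80 -n to-81 \
        --address-prefix 192.168.81.0/24 --next-hop-type VirtualAppliance \
        --next-hop-ip-address 192.168.80.5
    az network vnet subnet update -g <rg> --vnet-name <vnet> -n subnet-80 --route-table rt-80
    # UDR 2: in VM-2's subnet, send traffic destined to 192.168.80.0/24 via the gateway's NIC-2.
    az network route-table create -g <rg> -n rt-81
    az network route-table route create -g <rg> --route-table-name rt-81 -n to-80 \
        --address-prefix 192.168.80.0/24 --next-hop-type VirtualAppliance \
        --next-hop-ip-address 192.168.81.5
    az network vnet subnet update -g <rg> --vnet-name <vnet> -n subnet-81 --route-table rt-81
    # Azure-side IP forwarding on the gateway NICs (with the VM stopped, as noted above):
    az network nic update -g <rg> -n gw-nic-1 --ip-forwarding true
    az network nic update -g <rg> -n gw-nic-2 --ip-forwarding true
    # In-guest IP forwarding on the gateway VM (non-persistent form):
    echo 1 > /proc/sys/net/ipv4/ip_forward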


iperf2 uses 512 TCP connections and I limit the bandwidth used by iperf to
<=70% of the per-VM bandwidth limit. (Note: if the VM uses >70% of the limit,
even with the 2 patches, the ping latency between VM-1 and VM-2 can still
easily go very high, e.g. >200ms -- we'll try to further investigate that.)


It looks like the per-VM bandwidth limit of the gateway VM (DS3_v2) is 2.6 Gbps, so
70% of it is about 1.8 Gbps.

In the client VM, run something like:
    iperf -c 192.168.81.4 -b 3.5m -t 120 -P512
    (-b sets the per-TCP-connection bandwidth limit; -P512 means 512 connections, so the
total throughput should be around 3.5 * 512 = 1792 Mbps; "-t 120" means the test
lasts for 2 minutes. We can abort the test at any time with Ctrl+C.)

In the "Server VM, run: 
    nohup iperf -s &    
    ping 192.168.80.4 (we can terminate the program by Ctrl+C), and observe the 
latency.
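To make the latency comparison easier, the ping output can also be logged while
the iperf test runs; a small sketch (the log file name is arbitrary, and -D
needs a reasonably recent iputils ping):

    # -D prefixes each reply with a UNIX timestamp so it can be correlated with the iperf run;
    # -i 0.2 samples five times per second.
    ping -D -i 0.2 192.168.80.4 | tee ping.log
    # Later, show the five worst round-trip times observed (in ms):
    grep -o 'time=[0-9.]*' ping.log | cut -d= -f2 | sort -n | tail -5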

In the gateway VM, run "nload" to check the current throughput (if the
current device is not the NIC we want to check, press the Right Arrow and
Left Arrow keys), and run "top" to check the CPU utilization (with
512 connections, the utilization should still be low, e.g. <25%).
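If nload is not installed, roughly the same view can be had with the sysstat
tools; a sketch, assuming the gateway's forwarding NICs show up as eth1 and
eth2 (check with "ip -br link"):

    # Per-NIC throughput, sampled every second:
    sar -n DEV 1 | grep -E 'eth1|eth2'
    # Per-CPU utilization, sampled every second:
    mpstat -P ALL 1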

When the iperf2 test is running, the ping latency between VM-1 and VM-2
can easily exceed 100ms or even 300ms, but with the 2 patches applied,
the latency should typically be <20ms.
