Makes sense, thanks Steven. One more round of questions -- I expected the numbers I got between the two VMs (~2 Gbps), given that I had just a single core running for VPP. I went ahead and amended my startup.conf to use 2, and then 4, worker threads, all within the same socket.
After booting the VMs and testing basic connectivity (ping!), I either run ab and nginx, or just iperf, between the VMs. In either case, VPP crashes within a short time. Does this ring a bell? I am still ramping up on VPP and understand I am likely making some wrong assumptions. Guidance?

With two workers:

Apr 20 17:17:03 eernstworkstation systemd[1]: dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device: Job dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device/start timed out.
Apr 20 17:17:03 eernstworkstation systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device.
Apr 20 17:17:03 eernstworkstation systemd[1]: Dependency failed for /dev/disk/by-uuid/def55f66-6b20-47c6-a02f-bdaf324ed3b7.
Apr 20 17:17:03 eernstworkstation systemd[1]: dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.swap: Job dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.swap/start failed with result 'dependenc
Apr 20 17:17:03 eernstworkstation systemd[1]: dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device: Job dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device/start failed with result 'timeo
Apr 20 17:17:06 eernstworkstation vpp[38637]: /usr/bin/vpp[38637]: received signal SIGSEGV, PC 0x7f0d02b5b49c, faulting address 0x7f1cc12f5770
Apr 20 17:17:06 eernstworkstation /usr/bin/vpp[38637]: received signal SIGSEGV, PC 0x7f0d02b5b49c, faulting address 0x7f1cc12f5770
Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Main process exited, code=killed, status=6/ABRT
Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Unit entered failed state.
Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Failed with result 'signal'.
Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Service hold-off time over, scheduling restart.
Apr 20 17:17:06 eernstworkstation systemd[1]: Stopped vector packet processing engine.

-----Original Message-----
From: Steven Luong (sluong) [mailto:slu...@cisco.com]
Sent: Thursday, April 20, 2017 4:33 PM
To: Ernst, Eric <eric.er...@intel.com>; Billy McFall <bmcf...@redhat.com>
Cc: Damjan Marion (damarion) <damar...@cisco.com>; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Connectivity issue when using vhost-user on 17.04?

Eric,

In my testing, I notice my number is 2 to 3X better when coalesce is disabled. I am using Ivy Bridge. So it looks like the mileage varies a lot; with Sandy Bridge, it is 40X better.

What is coalesce? When the driver places descriptors into the vring, it may request an interrupt, or no interrupt, after the device is done processing the descriptors. If the driver wants an interrupt, the device may send it immediately if coalesce is not enabled. If it is enabled, the device will delay posting the interrupt until more descriptors are received to meet the coalesce number. This is an attempt to reduce the number of interrupts generated to the driver.

My guess is that when coalesce is enabled, the application, iperf3 in this case, is not shooting packets as fast as it can until it receives the interrupt for the packets sent. Thus the total bandwidth number looks bad. By disabling coalesce, the application is shooting a lot more packets in the interval, at the expense of more interrupts being generated in the VM.

I don't know why coalesce is enabled by default. This was done before I was born. Damjan or others may chime in for this and the answer for 2) as well. show errors is all I know.

Steven

On 4/20/17, 3:54 PM, "Ernst, Eric" <eric.er...@intel.com> wrote:

Steven,

Thanks for the help. As before, setup is described @ https://gist.github.com/egernst/5982ae6f0590cd83330faafacc3fd545 (updated since I am no longer using the evil feature mask).

I'm going to need to read up on what the coalesce frames setting is doing ....
Without that set, you can find my output from iperf3 appended. No retransmissions in the output, and no errors observed on the VPP side (that is, nothing notable in systemctl status vpp).

When I set coalesce frames I see *major* improvements -- getting in the ballpark of what I would expect for a single thread; about 2 Gbps. Phew, a major relief.

Couple things:
1) So, can you tell me more about what this is doing, and why this isn't enabled by default?
2) Is there a straightforward way to monitor the VPP setup (particular counters) to identify where the issue is?

Thanks again!

Cheers,
Eric

-------
*Server*:
# iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.0.2, port 41058
[  5] local 192.168.0.1 port 5201 connected to 192.168.0.2 port 41060
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  12.8 MBytes   107 Mbits/sec
[  5]   1.00-2.00   sec  7.93 MBytes  66.5 Mbits/sec
[  5]   2.00-3.00   sec  7.94 MBytes  66.6 Mbits/sec
[  5]   3.00-4.00   sec  5.37 MBytes  45.0 Mbits/sec
[  5]   4.00-5.00   sec  5.29 MBytes  44.4 Mbits/sec
[  5]   5.00-6.00   sec  4.28 MBytes  35.9 Mbits/sec
[  5]   6.00-7.00   sec  4.14 MBytes  34.8 Mbits/sec
[  5]   7.00-8.00   sec  4.14 MBytes  34.7 Mbits/sec
[  5]   8.00-9.00   sec  4.14 MBytes  34.8 Mbits/sec
[  5]   9.00-10.00  sec  4.14 MBytes  34.7 Mbits/sec
[  5]  10.00-10.03  sec   133 KBytes  34.9 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-10.03  sec  0.00 Bytes   0.00 bits/sec   sender
[  5]   0.00-10.03  sec  60.3 MBytes  50.4 Mbits/sec  receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

*Client*:
# iperf3 -c 192.168.0.1
Connecting to host 192.168.0.1, port 5201
[  4] local 192.168.0.2 port 41060 connected to 192.168.0.1 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  13.8 MBytes   116 Mbits/sec    0   8.48 KBytes
[  4]   1.00-2.00   sec  8.05 MBytes  67.5 Mbits/sec    0   8.48 KBytes
[  4]   2.00-3.00   sec  7.74 MBytes  64.9 Mbits/sec    0   8.48 KBytes
[  4]   3.00-4.00   sec  5.28 MBytes  44.3 Mbits/sec    0   5.66 KBytes
[  4]   4.00-5.00   sec  5.28 MBytes  44.3 Mbits/sec    0   5.66 KBytes
[  4]   5.00-6.00   sec  4.35 MBytes  36.5 Mbits/sec    0   5.66 KBytes
[  4]   6.00-7.00   sec  4.04 MBytes  33.9 Mbits/sec    0   5.66 KBytes
[  4]   7.00-8.00   sec  4.35 MBytes  36.5 Mbits/sec    0   5.66 KBytes
[  4]   8.00-9.00   sec  4.04 MBytes  33.9 Mbits/sec    0   5.66 KBytes
[  4]   9.00-10.00  sec  4.04 MBytes  33.9 Mbits/sec    0   5.66 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  61.0 MBytes  51.2 Mbits/sec    0   sender
[  4]   0.00-10.00  sec  60.3 MBytes  50.6 Mbits/sec        receiver

iperf Done.
-----

-----Original Message-----
From: Steven Luong (sluong) [mailto:slu...@cisco.com]
Sent: Thursday, April 20, 2017 3:05 PM
To: Ernst, Eric <eric.er...@intel.com>; Billy McFall <bmcf...@redhat.com>
Cc: Damjan Marion (damarion) <damar...@cisco.com>; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Connectivity issue when using vhost-user on 17.04?

Eric,

As a first step, please share the output of iperf3 to see how many retransmissions you have for the run. From VPP, please collect show errors to see if vhost drops anything.

As an additional data point for comparison, please also try disabling vhost coalesce to see if you get a better result, by adding the following configuration to /etc/vpp/startup.conf:

vhost-user {
  coalesce-frames 0
}

Steven

On 4/20/17, 2:19 PM, "vpp-dev-boun...@lists.fd.io on behalf of Ernst, Eric" <vpp-dev-boun...@lists.fd.io on behalf of eric.er...@intel.com> wrote:

Thanks Billy - it was through some examples I had found that I ended up grabbing that. I reinstalled 17.04 and can verify connectivity when removing the evil feature-mask.

Thanks for the quick feedback, Damjan. If we could only go back in time!
Now if I could just figure out why I'm getting capped bandwidth (via iperf) of ~45 Mbps between two VMs on the same socket on a Sandy Bridge Xeon, I will be really happy! If anyone has suggestions on debug methods for this, it'd be appreciated. I see a huge difference when switching to OVS vhost-user, keeping all else the same.

--Eric

On Thu, Apr 20, 2017 at 04:29:23PM -0400, Billy McFall wrote:
> The vHost examples on the Wiki used the feature-mask of 0xFF. I think that
> is how it got propagated. In 16.09, when I did the CLI documentation for
> vHost, I expanded what the bits meant and used feature-mask 0x40400000 as
> the example. I will gladly add an additional comment indicating that the
> recommended use is to leave it blank, if this was intended to be debug only.
>
> https://docs.fd.io/vpp/17.07/clicmd_src_vnet_devices_virtio.html
>
> Billy
>
> On Thu, Apr 20, 2017 at 4:17 PM, Damjan Marion (damarion) <damar...@cisco.com> wrote:
>
> > Eric,
> >
> > A long time ago (I think 3+ years), when I wrote the original vhost-user
> > driver in VPP, I added a feature-mask knob to the CLI, which messes with
> > the feature bitmap, purely for debugging reasons.
> >
> > And I have regretted it many times…
> >
> > Somebody dug it out and documented it somewhere, for reasons unknown to me.
> > Now it spreads like a virus and I cannot stop it :)
> >
> > So please don't use it, it is evil….
> >
> > Thanks,
> >
> > Damjan
> >
> > > On 20 Apr 2017, at 20:49, Ernst, Eric <eric.er...@intel.com> wrote:
> > >
> > > All,
> > >
> > > After updating the startup.conf to not reference DPDK, per direction in
> > > the release notification thread, I was able to start up VPP and create
> > > interfaces.
> > >
> > > Now that I'm testing, I noticed that I can no longer ping between VM
> > > hosts which make use of vhost-user interfaces and are connected via an
> > > l2 bridge domain (nor l2 xconnect). I double checked, then reverted
> > > back to 17.01, where I could again verify connectivity between the
> > > guests.
> > >
> > > Anyone else seeing this, or was there a change in how this should be
> > > set up? For reference, I have my (simple) setup described in a gist
> > > at [1].
> > >
> > > Thanks,
> > > eric
> > >
> > > [1] - https://gist.github.com/egernst/5982ae6f0590cd83330faafacc3fd545
> > > _______________________________________________
> > > vpp-dev mailing list
> > > vpp-dev@lists.fd.io
> > > https://lists.fd.io/mailman/listinfo/vpp-dev
>
> --
> *Billy McFall*
> SDN Group
> Office of Technology
> *Red Hat*
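
For anyone reading this thread later: Steven's guess about coalescing (the sender stalls until it gets the interrupt for packets already sent) can be illustrated with a toy model. This is only a sketch under stated assumptions, not VPP's vhost-user code. The one-packet-per-tick sender, the fixed send window, and the timeout that eventually flushes a delayed interrupt are all hypothetical simplifications introduced here for illustration.

```python
def simulate(total_ticks, window, coalesce_frames, timeout_ticks):
    """Toy sender/device model; returns (packets_delivered, interrupts).

    Assumptions (hypothetical, for illustration only):
      - the sender may have at most `window` packets awaiting an interrupt;
      - the device processes each packet in the tick it is sent;
      - the device posts an interrupt once more than `coalesce_frames`
        packets are pending, or once the oldest pending packet has
        waited `timeout_ticks` ticks.
    """
    outstanding = 0  # packets sent, interrupt not yet seen by the sender
    pending = 0      # packets processed by the device, interrupt not yet posted
    waited = 0       # ticks since the device started holding back an interrupt
    delivered = 0
    interrupts = 0

    for _ in range(total_ticks):
        # Sender: transmit one packet per tick while the window has room.
        if outstanding < window:
            outstanding += 1
            pending += 1
            delivered += 1

        # Device: post the interrupt now, or keep coalescing.
        if pending:
            waited += 1
            if pending > coalesce_frames or waited >= timeout_ticks:
                interrupts += 1
                outstanding -= pending  # the sender's window reopens
                pending = 0
                waited = 0

    return delivered, interrupts


if __name__ == "__main__":
    # Coalescing disabled: every packet is followed by an interrupt.
    print(simulate(1000, window=4, coalesce_frames=0, timeout_ticks=50))
    # Coalescing threshold larger than the sender's window: the sender
    # stalls until the timeout flushes the interrupt.
    print(simulate(1000, window=4, coalesce_frames=32, timeout_ticks=50))
```

In this model, a sender whose window is smaller than the coalesce threshold spends most of its time waiting for the flush, so it delivers far fewer packets, while disabling coalescing trades that stall for one interrupt per packet. That is the same trade-off Steven describes, although the real ratios depend on the guest driver and the hardware.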