Makes sense, thanks Steven.

One more round of questions -- I expected the numbers I got between the two VMs 
(~2 Gbps), given that I had just a single core running for VPP.  I went ahead 
and amended my startup.conf to use 2, and then 4, worker threads, all within 
the same socket.

After booting the VMs and testing basic connectivity (ping!), I either run ab 
and nginx, or just iperf, between the VMs.  In either case, VPP crashes within 
a short time.  Does this ring a bell?  I am still ramping up on VPP and 
understand that I am likely making some wrong assumptions.  Any guidance?

With two workers:
Apr 20 17:17:03 eernstworkstation systemd[1]: dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device: Job dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device/start timed out.
Apr 20 17:17:03 eernstworkstation systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device.
Apr 20 17:17:03 eernstworkstation systemd[1]: Dependency failed for /dev/disk/by-uuid/def55f66-6b20-47c6-a02f-bdaf324ed3b7.
Apr 20 17:17:03 eernstworkstation systemd[1]: dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.swap: Job dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.swap/start failed with result 'dependenc
Apr 20 17:17:03 eernstworkstation systemd[1]: dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device: Job dev-disk-by\x2duuid-def55f66\x2d6b20\x2d47c6\x2da02f\x2dbdaf324ed3b7.device/start failed with result 'timeo
Apr 20 17:17:06 eernstworkstation vpp[38637]: /usr/bin/vpp[38637]: received signal SIGSEGV, PC 0x7f0d02b5b49c, faulting address 0x7f1cc12f5770
Apr 20 17:17:06 eernstworkstation /usr/bin/vpp[38637]: received signal SIGSEGV, PC 0x7f0d02b5b49c, faulting address 0x7f1cc12f5770
Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Main process exited, code=killed, status=6/ABRT
Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Unit entered failed state.
Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Failed with result 'signal'.
Apr 20 17:17:06 eernstworkstation systemd[1]: vpp.service: Service hold-off time over, scheduling restart.
Apr 20 17:17:06 eernstworkstation systemd[1]: Stopped vector packet processing engine.



-----Original Message-----
From: Steven Luong (sluong) [mailto:slu...@cisco.com] 
Sent: Thursday, April 20, 2017 4:33 PM
To: Ernst, Eric <eric.er...@intel.com>; Billy McFall <bmcf...@redhat.com>
Cc: Damjan Marion (damarion) <damar...@cisco.com>; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Connectivity issue when using vhost-user on 17.04?

Eric,

In my testing, my throughput is 2 to 3X better when coalesce is disabled. I am 
using Ivy Bridge. So it looks like the mileage varies a lot: with Sandy 
Bridge, you are seeing about 40X better.

What is coalesce?
When the driver places descriptors into the vring, it may request an interrupt, 
or no interrupt, once the device is done processing the descriptors. If the 
driver wants an interrupt, the device may send it immediately if coalesce is 
not enabled. If it is enabled, the device delays posting the interrupt until 
enough descriptors have been received to meet the coalesce number. This is an 
attempt to reduce the number of interrupts generated to the driver. My guess is 
that when coalesce is enabled, the application, iperf3 in this case, does not 
send packets as fast as it can, because it waits for the interrupt for the 
packets already sent. Thus the total bandwidth number looks bad. With coalesce 
disabled, the application sends a lot more packets in the interval, at the 
expense of more interrupts being generated in the VM.
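
To make the guess above concrete, here is a toy model in Python. It is only a 
sketch under stated assumptions, not how VPP implements coalescing: the 
function name, the "window" of outstanding packets, and the timeout in ticks 
are all hypothetical and chosen for illustration. It shows how a sender that 
waits for the interrupt can stall when the coalesce threshold is larger than 
what it has outstanding.

```python
def simulate(total_packets, window, coalesce_frames, timeout_ticks):
    """Toy model of interrupt coalescing (hypothetical, not VPP code).

    The sender keeps at most `window` packets outstanding and only learns
    that they completed when an interrupt arrives. Each tick the device
    finishes everything outstanding, but raises the interrupt only when
    `coalesce_frames` completions are pending, or after `timeout_ticks`
    ticks of waiting. coalesce_frames == 0 means "interrupt immediately".

    Returns (ticks, interrupts) needed to send `total_packets`.
    """
    sent = 0
    outstanding = 0   # sent, but the sender has not seen an interrupt yet
    waited = 0        # ticks the current completions have been held back
    ticks = 0
    interrupts = 0
    while sent < total_packets or outstanding > 0:
        ticks += 1
        # Sender tops up its window if it is allowed to.
        burst = min(window - outstanding, total_packets - sent)
        sent += burst
        outstanding += burst
        if outstanding > 0:
            waited += 1
            fire = (
                coalesce_frames == 0
                or outstanding >= coalesce_frames
                or waited >= timeout_ticks
            )
            if fire:
                # Interrupt delivered: the sender may reuse its window.
                interrupts += 1
                outstanding = 0
                waited = 0
    return ticks, interrupts


if __name__ == "__main__":
    # Small window, coalescing off vs. a threshold the sender never reaches.
    print("coalesce off:", simulate(100, 4, 0, 10))
    print("coalesce 32 :", simulate(100, 4, 32, 10))
```

In this model, a sender with a small window finishes ten times slower once 
the threshold is out of reach, because every batch waits for the timeout. The 
real ratio depends on the guest, the timer, and the traffic, so the numbers 
here say nothing about actual VPP performance.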

I don’t know why coalesce is enabled by default. This was done before I was 
born. Damjan or others may chime in on this, and on the answer to 2) as well. 
"show errors" is all I know.

Steven

On 4/20/17, 3:54 PM, "Ernst, Eric" <eric.er...@intel.com> wrote:

    Steven,
    
    Thanks for the help.  As before, setup is described @
    https://gist.github.com/egernst/5982ae6f0590cd83330faafacc3fd545 (updated
    since I am no longer using the evil feature mask).
    
    I'm going to need to read up on what the coalesce-frames setting is doing ....
    
    Without that set, you can find my output from iperf3 appended.  No
    retransmissions in the output, and no errors observed on the VPP side (that
    is, nothing notable in systemctl status vpp).
    
    When I set coalesce frames I see *major* improvements -- getting in the
    ballpark of what I would expect for a single thread: about 2 Gbps.  Phew, a
    major relief.  A couple of things:
    1)  Can you tell me more about what this is doing, and why this isn't
        enabled by default?
    2)  Is there a straightforward way to monitor the VPP setup (particular
        counters) to identify where the issue is?
    
    Thanks again!
    
    Cheers,
    Eric
    
    -------
    *Server*:
    # iperf3 -s
    -----------------------------------------------------------
    Server listening on 5201
    -----------------------------------------------------------
    Accepted connection from 192.168.0.2, port 41058
    [  5] local 192.168.0.1 port 5201 connected to 192.168.0.2 port 41060
    [ ID] Interval           Transfer     Bandwidth
    [  5]   0.00-1.00   sec  12.8 MBytes   107 Mbits/sec
    [  5]   1.00-2.00   sec  7.93 MBytes  66.5 Mbits/sec
    [  5]   2.00-3.00   sec  7.94 MBytes  66.6 Mbits/sec
    [  5]   3.00-4.00   sec  5.37 MBytes  45.0 Mbits/sec
    [  5]   4.00-5.00   sec  5.29 MBytes  44.4 Mbits/sec
    [  5]   5.00-6.00   sec  4.28 MBytes  35.9 Mbits/sec
    [  5]   6.00-7.00   sec  4.14 MBytes  34.8 Mbits/sec
    [  5]   7.00-8.00   sec  4.14 MBytes  34.7 Mbits/sec
    [  5]   8.00-9.00   sec  4.14 MBytes  34.8 Mbits/sec
    [  5]   9.00-10.00  sec  4.14 MBytes  34.7 Mbits/sec
    [  5]  10.00-10.03  sec   133 KBytes  34.9 Mbits/sec
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bandwidth
    [  5]   0.00-10.03  sec  0.00 Bytes  0.00 bits/sec                  sender
    [  5]   0.00-10.03  sec  60.3 MBytes  50.4 Mbits/sec                  receiver
    -----------------------------------------------------------
    Server listening on 5201
    -----------------------------------------------------------
    
    *Client*:
    # iperf3 -c 192.168.0.1
    Connecting to host 192.168.0.1, port 5201
    [  4] local 192.168.0.2 port 41060 connected to 192.168.0.1 port 5201
    [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
    [  4]   0.00-1.00   sec  13.8 MBytes   116 Mbits/sec    0   8.48 KBytes
    [  4]   1.00-2.00   sec  8.05 MBytes  67.5 Mbits/sec    0   8.48 KBytes
    [  4]   2.00-3.00   sec  7.74 MBytes  64.9 Mbits/sec    0   8.48 KBytes
    [  4]   3.00-4.00   sec  5.28 MBytes  44.3 Mbits/sec    0   5.66 KBytes
    [  4]   4.00-5.00   sec  5.28 MBytes  44.3 Mbits/sec    0   5.66 KBytes
    [  4]   5.00-6.00   sec  4.35 MBytes  36.5 Mbits/sec    0   5.66 KBytes
    [  4]   6.00-7.00   sec  4.04 MBytes  33.9 Mbits/sec    0   5.66 KBytes
    [  4]   7.00-8.00   sec  4.35 MBytes  36.5 Mbits/sec    0   5.66 KBytes
    [  4]   8.00-9.00   sec  4.04 MBytes  33.9 Mbits/sec    0   5.66 KBytes
    [  4]   9.00-10.00  sec  4.04 MBytes  33.9 Mbits/sec    0   5.66 KBytes
    - - - - - - - - - - - - - - - - - - - - - - - - -
    [ ID] Interval           Transfer     Bandwidth       Retr
    [  4]   0.00-10.00  sec  61.0 MBytes  51.2 Mbits/sec    0             sender
    [  4]   0.00-10.00  sec  60.3 MBytes  50.6 Mbits/sec                  receiver
    
    iperf Done.
    -----
    
    
    
    
    
    
    
    
    -----Original Message-----
    From: Steven Luong (sluong) [mailto:slu...@cisco.com] 
    Sent: Thursday, April 20, 2017 3:05 PM
    To: Ernst, Eric <eric.er...@intel.com>; Billy McFall <bmcf...@redhat.com>
    Cc: Damjan Marion (damarion) <damar...@cisco.com>; vpp-dev@lists.fd.io
    Subject: Re: [vpp-dev] Connectivity issue when using vhost-user on 17.04?
    
    Eric,
    
    As a first step, please share the output of iperf3 to see how many
    retransmissions you have for the run. From VPP, please collect "show errors"
    to see if vhost drops anything. As an additional data point for comparison,
    please also try disabling vhost coalesce, to see if you get a better result,
    by adding the following configuration to /etc/vpp/startup.conf:
    
    vhost-user {
      coalesce-frames 0
    }
    
    Steven
    
    On 4/20/17, 2:19 PM, "vpp-dev-boun...@lists.fd.io on behalf of Ernst, Eric" 
<vpp-dev-boun...@lists.fd.io on behalf of eric.er...@intel.com> wrote:
    
        Thanks Billy - it was through some examples I had found that I ended up
        grabbing that.  I reinstalled 1704 and can verify connectivity when
        removing the evil feature-mask.
        
        Thanks for the quick feedback, Damjan.  If we could only go back in time!
        
        Now if I could just figure out why I'm getting capped bandwidth (via iperf)
        of ~45 Mbps between two VMs on the same socket on a Sandy Bridge Xeon, I
        will be really happy!  If anyone has suggestions on debug methods for this,
        it'd be appreciated.  I see a huge difference when switching to OVS
        vhost-user, keeping all else the same.
        
        --Eric
        
        
        On Thu, Apr 20, 2017 at 04:29:23PM -0400, Billy McFall wrote:
        > The vHost examples on the Wiki used the feature-mask of 0xFF. I think that
        > is how it got propagated. In 16.09, when I did the CLI documentation for
        > vHost, I expanded what the bits meant and used feature-mask 0x40400000 as
        > the example. I will gladly add an additional comment indicating that the
        > recommended use is to leave it blank, if this was intended for debug.
        > 
        > https://docs.fd.io/vpp/17.07/clicmd_src_vnet_devices_virtio.html
        > 
        > Billy
        > 
        > On Thu, Apr 20, 2017 at 4:17 PM, Damjan Marion (damarion) <
        > damar...@cisco.com> wrote:
        > 
        > >
        > > Eric,
        > >
        > > A long time ago (I think 3+ years), when I wrote the original vhost-user
        > > driver in vpp, I added a feature-mask knob to the CLI which messes with
        > > the feature bitmap, purely for debugging reasons.
        > >
        > > And I have regretted it many times…
        > >
        > > Somebody dug it out and documented it somewhere, for reasons unknown to me.
        > > Now it spreads like a virus and I cannot stop it :)
        > >
        > > So please don’t use it, it is evil….
        > >
        > > Thanks,
        > >
        > > Damjan
        > >
        > > > On 20 Apr 2017, at 20:49, Ernst, Eric <eric.er...@intel.com> wrote:
        > > >
        > > > All,
        > > >
        > > > After updating the startup.conf to not reference DPDK, per direction in
        > > > the release notification thread, I was able to start up vpp and create
        > > > interfaces.
        > > >
        > > > Now that I'm testing, I noticed that I can no longer ping between VM
        > > > hosts which make use of vhost-user interfaces and are connected via an
        > > > l2 bridge domain (nor l2 xconnect).  I double checked, then reverted back
        > > > to 17.01, where I could again verify connectivity between the guests.
        > > >
        > > > Anyone else seeing this, or was there a change in how this should be set
        > > > up?  For reference, I have my (simple) setup described in a gist at [1].
        > > >
        > > > Thanks,
        > > > eric
        > > >
        > > >
        > > > [1] - https://gist.github.com/egernst/5982ae6f0590cd83330faafacc3fd545
        > > > _______________________________________________
        > > > vpp-dev mailing list
        > > > vpp-dev@lists.fd.io
        > > > https://lists.fd.io/mailman/listinfo/vpp-dev
        > >
        > 
        > 
        > 
        > 
        > -- 
        > *Billy McFall*
        > SDN Group
        > Office of Technology
        > *Red Hat*
    
    
