Thanks, Dave. Crystally clear! It is a deliberate design, indeed.

Regards,
Kingwel


-------- 原始邮件 --------
主题: RE: [vpp-dev] Question about vlib_next_frame_change_ownership
来自: "Dave Barach (dbarach)" <dbar...@cisco.com>
发至: 2019年1月25日 下午9:03
抄送: Kingwel Xie <kingwel....@ericsson.com>,vpp-dev <vpp-dev@lists.fd.io>
Dear Kingwei,

On a per-thread basis, only one input node is active at a time. In the 2x 
active input node case you sketched, the second input node will take frame 
ownership of the ethernet input frame - which should be pending but not yet 
dispatched - and add more buffer indices to it. It’s possible that the first 
frame will fill and that a second frame will be allocated and [partially] 
filled; that’s not the typical case.

One lap around the track later, the first input node will take back ownership.

Let’s say that input node 1 adds 50 packets to the ethernet-input frame, and 
input node 2 adds another 50 packets. The frame ownership change dance yields a 
vector of size 100 in ethernet-input. My guess is that the increase in 
efficiency from ethernet-input forward in the graph more than compensates for 
the fixed cost of the frame change ownership dance. This is to confirm what you 
wrote.

I usually call this effect “input aggregation.” Also holds true for the handoff 
node, especially when handing off frames from multiple threads to a single 
thread.

The alternative: dispatch two smaller frames instead of one big one. Doing that 
might not be awful if all input nodes produced a fair number of packets. The 
situation becomes awful when the 2..N input nodes produce very different 
numbers of packets, e.g. 99 and 1. Anyhow, the code doesn’t work that way and 
isn’t going to work that way, so I digress...

Thanks... Dave

From: Kingwel Xie <kingwel....@ericsson.com>
Sent: Friday, January 25, 2019 5:13 AM
To: Dave Barach (dbarach) <dbar...@cisco.com>; vpp-dev <vpp-dev@lists.fd.io>
Subject: RE: [vpp-dev] Question about vlib_next_frame_change_ownership

Hi Dave,

After checking the code and some debug sessions, I realized where the bug is – 
crypto-input, which is calling vlib_trace_buffer() earlier than 
get_next_frame/put_next_frame. Therefore, the next_frame->flag is overwritten 
by get_next_frame/change_owenrship. I’ve made a patch for it: 
https://gerrit.fd.io/r/17079  appreciate your comments.

Again, about vlib_next_frame_change_ownership, as I understand, this mechanism 
will always try to enqueue buffers to the owner’s frame_index, so please 
consider this scenario:

We have 2 input nodes running at the same time – dpdk-input and avf-input, they 
will try to compete with each other to enqueue to ethernet-input. Consequently, 
bad side, memcpy per frame as we discussed, and, good side, assume they both 
have 10 buffers per frame, then ethernet-input will have chance to get 20 
buffers for batch processing, good for performance. Please confirm if this is 
expected behavior of frame ownership.

Regards,
Kingwel


From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
<vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>> On Behalf Of Dave Barach via 
Lists.Fd.Io
Sent: Friday, January 25, 2019 4:52 AM
To: Kingwel Xie <kingwel....@ericsson.com<mailto:kingwel....@ericsson.com>>; 
vpp-dev <vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] Question about vlib_next_frame_change_ownership

The vpp packet trace which I extracted from your dispatch trace seems exactly 
as I would have expected. See below. In a pg test like this one using a 
loopback interface, anything past loopN-tx is irrelevant. The ipsec packet 
turns into an ARP request for 18.1.0.241.

In non-cyclic graph cases, we don’t end up changing frame ownership at all. In 
this case, you’re doing a double lookup. One small memcpy per frame is a 
trivial cost, especially when one remembers that the cost is amortized over all 
the packets in the frame.

Until you produce a repeatable demonstration of the claimed issue, there’s 
nothing that we can do.

Thanks... Dave

VPP Buffer Trace
    Trace:
    Trace: 00:00:53:959410: pg-input
    Trace:   stream ipsec0, 100 bytes, 0 sw_if_index
    Trace:   current data 0, length 100, buffer-pool 0, clone-count 0, trace 0x0
    Trace:   UDP: 192.168.2.255 -> 1.2.3.4
    Trace:     tos 0x00, ttl 64, length 28, checksum 0xb324
    Trace:     fragment id 0x0000
    Trace:   UDP: 4321 -> 1234
    Trace:     length 80, checksum 0x30d9
    Trace: 00:00:53:959426: ip4-input
    Trace:   UDP: 192.168.2.255 -> 1.2.3.4
    Trace:     tos 0x00, ttl 64, length 28, checksum 0xb324
    Trace:     fragment id 0x0000
    Trace:   UDP: 4321 -> 1234
    Trace:     length 80, checksum 0x30d9
    Trace: 00:00:53:959519: ip4-lookup
    Trace:   fib 0 dpo-idx 2 flow hash: 0x00000000
    Trace:   UDP: 192.168.2.255 -> 1.2.3.4
    Trace:     tos 0x00, ttl 64, length 28, checksum 0xb324
    Trace:     fragment id 0x0000
    Trace:   UDP: 4321 -> 1234
    Trace:     length 80, checksum 0x30d9
    Trace: 00:00:53:959598: ip4-rewrite
    Trace:   tx_sw_if_index 2 dpo-idx 2 : ipv4 via 0.0.0.0 ipsec0: mtu:9000 
flow hash: 0x00000000
    Trace:   00000000: 
4500001c000000003f11b424c0a802ff0102030410e104d2005030d900010203
    Trace:   00000020: 0405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f
    Trace: 00:00:53:959687: ipsec0-output
    Trace:   ipsec0
    Trace:   00000000: 
4500001c000000003f11b424c0a802ff0102030410e104d2005030d900010203
    Trace:   00000020: 
0405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20212223
    Trace:   00000040: 
2425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f40414243
    Trace:   00000060: 44454647000000000000000000000000000000000000000000000000
    Trace: 00:00:53:959802: ipsec0-tx
    Trace:   IPSec: spi 1 seq 1
    Trace: 00:00:53:959934: esp4-encrypt
    Trace:   esp: spi 1 seq 1 crypto aes-cbc-128 integrity sha1-96
    Trace: 00:00:53:960084: ip4-lookup
    Trace:   fib 0 dpo-idx 0 flow hash: 0x00000000
    Trace:   IPSEC_ESP: 18.1.0.71 -> 18.1.0.241
    Trace:     tos 0x00, ttl 254, length 168, checksum 0x96ea
    Trace:     fragment id 0x0000
    Trace: 00:00:53:960209: ip4-glean
    Trace:     IPSEC_ESP: 18.1.0.71 -> 18.1.0.241
    Trace:       tos 0x00, ttl 254, length 168, checksum 0x96ea
    Trace:       fragment id 0x0000
    Trace: 00:00:53:960336: loop0-output
    Trace:   loop0
    Trace:   ARP: de:ad:00:00:00:00 -> ff:ff:ff:ff:ff:ff
    Trace:   request, type ethernet/IP4, address size 6/4
    Trace:   de:ad:00:00:00:00/18.1.0.71 -> 00:00:00:00:00:00/18.1.0.241
    Trace: 00:00:53:960491: error-drop
    Trace:   ip4-glean: ARP requests sent


From: Kingwel Xie <kingwel....@ericsson.com<mailto:kingwel....@ericsson.com>>
Sent: Thursday, January 24, 2019 2:43 AM
To: Dave Barach (dbarach) <dbar...@cisco.com<mailto:dbar...@cisco.com>>; 
vpp-dev <vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>>
Subject: RE: [vpp-dev] Question about vlib_next_frame_change_ownership

Ok. As requested, pcap trace & test script attached. Actually I made some 
simplification to indicate the problem – using native IPSEC instead of DPDK.

You can see in the buffer trace that ip-lookup is referred by ip-input in the 
beginning then by esp-encrypt later. It means the ownership of ip-lookup will 
be changed back and forth, 16x3=48 bytes memcpy, per frame basis. Under some 
case, the trace flag in next_frame will be lost, then it leads to buffer trace 
broken. I made a patch for further discussion about it:  
https://gerrit.fd.io/r/17037

Test log shown below:

DBGvpp# show version
vpp v19.04-rc0~24-g0702554 built by root on ubuntu89 at Sat Jan 19 22:13:50 EST 
2019
DBGvpp#
DBGvpp# exec ipsec
loop0
DBGvpp#
DBGvpp# pcap dispatch trace on max 1000 file vpp.pcap buffer-trace pg-input 10
Buffer tracing of 10 pkts from pg-input enabled...
pcap dispatch capture on...
DBGvpp#
DBGvpp#
DBGvpp# packet-generator enable-stream ipsec0
DBGvpp#
DBGvpp# pcap dispatch trace off
captured 14 pkts...
saved to /tmp/vpp.pcap...
DBGvpp#
DBGvpp# show trace
------------------- Start of thread 0 kw_main -------------------
Packet 1

00:00:53:959410: pg-input
  stream ipsec0, 100 bytes, 0 sw_if_index
  current data 0, length 100, buffer-pool 0, clone-count 0, trace 0x0
  UDP: 192.168.2.255 -> 1.2.3.4
    tos 0x00, ttl 64, length 28, checksum 0xb324
    fragment id 0x0000
  UDP: 4321 -> 1234
    length 80, checksum 0x30d9
00:00:53:959426: ip4-input
  UDP: 192.168.2.255 -> 1.2.3.4
    tos 0x00, ttl 64, length 28, checksum 0xb324
    fragment id 0x0000
  UDP: 4321 -> 1234
    length 80, checksum 0x30d9
00:00:53:959519: ip4-lookup
  fib 0 dpo-idx 2 flow hash: 0x00000000
  UDP: 192.168.2.255 -> 1.2.3.4
    tos 0x00, ttl 64, length 28, checksum 0xb324
    fragment id 0x0000
  UDP: 4321 -> 1234
    length 80, checksum 0x30d9
00:00:53:959598: ip4-rewrite
  tx_sw_if_index 2 dpo-idx 2 : ipv4 via 0.0.0.0 ipsec0: mtu:9000 flow hash: 
0x00000000
  00000000: 4500001c000000003f11b424c0a802ff0102030410e104d2005030d900010203
  00000020: 0405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f
00:00:53:959687: ipsec0-output
  ipsec0
  00000000: 4500001c000000003f11b424c0a802ff0102030410e104d2005030d900010203
  00000020: 0405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20212223
  00000040: 2425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f40414243
  00000060: 44454647000000000000000000000000000000000000000000000000
00:00:53:959802: ipsec0-tx
  IPSec: spi 1 seq 1
00:00:53:959934: esp4-encrypt
  esp: spi 1 seq 1 crypto aes-cbc-128 integrity sha1-96
00:00:53:960084: ip4-lookup
  fib 0 dpo-idx 0 flow hash: 0x00000000
  IPSEC_ESP: 18.1.0.71 -> 18.1.0.241
    tos 0x00, ttl 254, length 168, checksum 0x96ea
    fragment id 0x0000
00:00:53:960209: ip4-glean
    IPSEC_ESP: 18.1.0.71 -> 18.1.0.241
      tos 0x00, ttl 254, length 168, checksum 0x96ea
      fragment id 0x0000
00:00:53:960336: loop0-output
  loop0
  ARP: de:ad:00:00:00:00 -> ff:ff:ff:ff:ff:ff
  request, type ethernet/IP4, address size 6/4
  de:ad:00:00:00:00/18.1.0.71 -> 00:00:00:00:00:00/18.1.0.241
00:00:53:960491: error-drop
  ip4-glean: ARP requests sent
00:00:53:960780: ethernet-input
  ARP: de:ad:00:00:00:00 -> ff:ff:ff:ff:ff:ff
00:00:53:960927: arp-input
  request, type ethernet/IP4, address size 6/4
  de:ad:00:00:00:00/18.1.0.71 -> 00:00:00:00:00:00/18.1.0.241
00:00:53:961126: error-drop
  arp-input: IP4 source address matches local interface


From: Dave Barach (dbarach) <dbar...@cisco.com<mailto:dbar...@cisco.com>>
Sent: Wednesday, January 23, 2019 11:33 PM
To: Kingwel Xie <kingwel....@ericsson.com<mailto:kingwel....@ericsson.com>>; 
vpp-dev <vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>>
Subject: RE: [vpp-dev] Question about vlib_next_frame_change_ownership

Please write up the issue and share the config and pg input script as I asked. 
You might find that the issue disappears pretty rapidly, with no further action 
on your part... (😉)...

The basic graph engine is not a place to start hacking based on “I think I get 
it...”

Thanks... Dave

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
<vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>> On Behalf Of Kingwel Xie
Sent: Wednesday, January 23, 2019 10:18 AM
To: Dave Barach (dbarach) <dbar...@cisco.com<mailto:dbar...@cisco.com>>; 
vpp-dev <vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>>
Subject: Re: [vpp-dev] Question about vlib_next_frame_change_ownership

thanks. I think I get it. By maintaining the ownership, vPP is able to enqueue 
all buffers destinated to the same target node into the owner's next frame at 
one time. This avoids dispatching the node function multiple times.

The bug is still there. I will create a patch later for further discussion.

And, maybe there has some space to improve: considering we have two input nodes 
which will both add elements to the third node, we will see the ownership of 
this node being switched per frame basis.

- Kingwel

-------- 原始邮件 --------
主题: RE: Question about vlib_next_frame_change_ownership
来自: "Dave Barach (dbarach)" <dbar...@cisco.com<mailto:dbar...@cisco.com>>
发至: 2019年1月23日 下午8:49
抄送: Kingwel Xie 
<kingwel....@ericsson.com<mailto:kingwel....@ericsson.com>>,vpp-dev 
<vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>>

As you've probably noticed, the buffer manager has been under active 
development. That may or may not have anything to do with the issue.



Please follow the bug reporting process: 
https://wiki.fd.io/view/VPP/BugReports. In this case, using master/latest, 
please create a Jira ticket including the exact configuration, packet generator 
input script, and a dispatch pcap trace:



  *   "pcap dispatch trace on file dtrace max 10000 buffer-trace pg-input 1000",
  *   start the pg stream
  *   "pcap dispatch trace off".
  *   Results in /tmp/dtrace.



I'm not going to speculate on what's going on at this point. Please write up 
the issue so we can look at it.



For a decent explanation of the frame ownership scheme, take a look at 
https://fdio-vpp.readthedocs.io/en/latest/gettingstarted/developers/vlib.html 
under "Complications".



HTH... Dave



-----Original Message-----
From: Kingwel Xie <kingwel....@ericsson.com<mailto:kingwel....@ericsson.com>>
Sent: Wednesday, January 23, 2019 2:16 AM
To: Dave Barach (dbarach) <dbar...@cisco.com<mailto:dbar...@cisco.com>>; 
vpp-dev <vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>>
Subject: Question about vlib_next_frame_change_ownership



Hi Dave and all,



I'm looking at a buffer trace issue with DPDK IPSEC. It turns out the flag 
VLIB_FRAME_TRACE is broken in vlib_next_frame_change_ownership().



The node path in my setup is:  pg-input -> ip-input -> ip-lookup -> ... -> 
dkdp-esp-encrypt -> cryptodev -> crypto-input -> ip-lookup -> ...



As you can see, the ip-lookup node has the owner node ip-input in the 
beginning, then owner will be changed to crypto-input shortly. This change 
causes that we swap the current next_frame with the owner's in 
vlib_next_frame_change_ownership(). As a result, the VLIB_FRAME_TRACE in 
next_frame->flag will be overwritten.



The fix could be very simple, but I'm wondering why we have to change the 
ownership of the next_frame? Actually I can observe the ownership is changed 
back and forth between ip-input and crypto-input for every frame, which leads 
to performance degradation. However, it looks good to me even that we don’t 
care the ownership. In this case, ip-lookup will be dispatched by either 
ip-input or crypto-input, with different next_frame. I guess I must have missed 
something, appreciate if you can elaborate.



Regards,

Kingwel
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#12011): https://lists.fd.io/g/vpp-dev/message/12011
Mute This Topic: https://lists.fd.io/mt/29430823/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-

Reply via email to