Hello!
Hardware:
4-core Intel NUC, Kaby Lake Core i7-8705G (32K L1, 8M L3)
05:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (1 GbE)
I am currently experimenting with a plug-in that processes packet payloads to
do some crypto (ECDSA verification). I have 2 nodes in my plug-in:
1) A hand-off node (pinned to core 3) that enqueues entire frames of packets
to worker nodes on 2 other worker cores (1 & 2), using a frame queue of size
64 (16,384 buffers); a sketch of the pattern follows below.
2) A worker node that receives a frame from the hand-off node, loops through
all of its packets (no pre-fetching), and starts a batch verification of the
data.
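For reference, the hand-off follows the usual VPP pattern built around
vlib_buffer_enqueue_to_thread(). A minimal sketch (the narf_* names and the
round-robin thread choice are illustrative, not my exact code, and the
helper's signature varies a little across VPP releases):

#include <vlib/vlib.h>
#include <vlib/buffer_node.h>

/* Frame queue index, obtained at init time with something like:
 *   narf_fq_index = vlib_frame_queue_main_init (narf_worker_node.index, 64);
 * (64 being the frame queue size mentioned above). */
static u32 narf_fq_index;

static uword
narf_handoff_node_fn (vlib_main_t *vm, vlib_node_runtime_t *node,
                      vlib_frame_t *frame)
{
  u32 *from = vlib_frame_vector_args (frame);
  u32 n_packets = frame->n_vectors;
  u16 thread_indices[VLIB_FRAME_SIZE];

  /* Spread packets across the two worker threads (cores 1 and 2). */
  for (u32 i = 0; i < n_packets; i++)
    thread_indices[i] = 1 + (i & 1);

  /* Returns the number of packets actually enqueued; the remainder gets
   * dropped when a per-thread frame queue is congested. */
  return vlib_buffer_enqueue_to_thread (vm, narf_fq_index, from,
                                        thread_indices, n_packets,
                                        1 /* drop_on_congestion */);
}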
I have a benchmark where I feed packets from a PCAP file into the graph, sent
from a remote NIC and received through DPDK (vfio-pci).
I am trying to understand the performance statistics from 'vppctl show run' for
my worker nodes. Here's what it looks like when I start a benchmark:
-
Thread 1 vpp_wk_0 (lcore 1)
Time *150.4*, average vectors/node 3.04, last 128 main loops 8.00 per node 256.00
  vector rates in 0.e0, out 2.2369e4, drop 0.e0, punt 0.e0

         Name      State      Calls      Vectors  Suspends    Clocks  Vectors/Call
  narf-worker     active    3219952      3364368         0  *1.33e5*        *1.04*
-
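(For clarity: Vectors/Call is just Vectors / Calls, i.e. 3364368 / 3219952 ≈
1.04, and Clocks is the average ticks spent per packet, total clocks / total
vectors.)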
It reports an average of *133,000 ticks/packet*, and each vector contains only
1 packet on average. I also have a memif interface as the next node in the
graph; the far end of the memif timestamps verified packets. It averages
*~45,000 packets/second*.
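That rate is consistent with the output rate above: 2.2369e4 packets/second
x 2 workers (assuming vpp_wk_1 runs at a similar rate) ≈ 44.7k packets/second.
It also squares with the clock count, assuming the i7-8705G's ~3.1 GHz base
clock: 3.1e9 / 1.33e5 ≈ 23.3k packets/second per worker, ~46.6k across both.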
If I stop the benchmark and restart it, without restarting VPP, VPP starts
batching more packets per vector (precisely 256), and the amortized cost per
packet drops significantly, to *65,900 ticks/packet*. This benches at
*~90,000 packets/second*, as expected (the packet processing is expected to be
~2x faster once the batch size exceeds 64).
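The numbers are again self-consistent: 65,900 ticks is roughly half of
133,000, and the out rate in the output below (4.6392e4 per worker, x 2
workers, assuming the second worker keeps pace) works out to ~92.8k
packets/second.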
-
Thread 1 vpp_wk_0 (lcore 1)
Time *401.1*, average vectors/node 256.00, last 128 main loops 0.00 per node 0.000
  vector rates in 0.e0, out 4.6392e4, drop 0.e0, punt 0.e0

         Name      State      Calls      Vectors  Suspends    Clocks  Vectors/Call
  narf-worker     active      72679     18605824         0  *6.59e4*      *256.00*
-
I also have counters in my nodes that report the no. of packets read off of
the NIC v/s the no. of packets actually processed (the difference being the
packets that could not be queued onto the hand-off frame queue); a sketch of
how they are bumped follows.
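The counters are plain per-node counters, incremented roughly as below (the
error-string indices and names are illustrative, and n_enq is the return
value of vlib_buffer_enqueue_to_thread() in the earlier sketch):

/* Indices into the node's registered error strings
 * ("packets received" / "packets enqueued"). */
enum
{
  NARF_HANDOFF_ERROR_RECEIVED,
  NARF_HANDOFF_ERROR_ENQUEUED,
};

/* In the hand-off node function, after the enqueue attempt: */
vlib_node_increment_counter (vm, node->node_index,
                             NARF_HANDOFF_ERROR_RECEIVED, n_packets);
vlib_node_increment_counter (vm, node->node_index,
                             NARF_HANDOFF_ERROR_ENQUEUED, n_enq);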
From the initial bench, I see a large difference between the packets enqueued
and the packets received: because VPP is scheduling only 1 packet per frame,
the 64-entry hand-off frame queue congests and most enqueues fail.
-
410616 narf-worker NARF Transactions EC-verified
410013 narf-worker NARF Transactions EC-verified
820697 narf-handoff packets enqueued
2383254 narf-handoff packets received
-
With a restarted benchmark, the two counters are much closer (~7.5% of
received packets fail to enqueue on these totals, v/s ~66% initially).
-
77669136 narf-worker NARF Transactions EC-verified
77670480 narf-worker NARF Transactions EC-verified
155340896 narf-handoff packets enqueued
167930211 narf-handoff packets received
-
Considering that frames received by the hand-off node are passed through to
the worker nodes as-is, it seems that the thread polling the NIC through DPDK
is batching only 1 packet per frame initially, and then switches to 256
packets/frame when I restart the workload.
Is there something wrong with my setup?
-- Alok