[dpdk-dev] [dpdk-users] FDir flex filters on XL710/i40e NICs with the new filter API

2016-08-19 Thread Paul Emmerich
ignore them*. Is this a bug? Or are we using something wrong? (The filter API doesn't seem to be very well documented) Paul Paul Emmerich: > Hi, > > we are trying to use flex filters to match on payload bytes and our code > stopped working with the new Filter API after ...
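
A minimal sketch of the kind of flex-byte FDir rule this refers to, using the legacy filter API (rte_eth_dev_filter_ctrl / RTE_ETH_FILTER_FDIR) as it existed around DPDK 16.07. This is not the poster's code: the flow type, port, queue and matched byte values are invented, and the flexible payload layout is assumed to have been configured beforehand through the port's fdir_conf.

    #include <string.h>
    #include <rte_ethdev.h>
    #include <rte_eth_ctrl.h>
    #include <rte_byteorder.h>

    /* Sketch: add an FDir rule on an i40e/XL710 port that additionally
     * matches two flexible payload bytes. Assumes the port was configured
     * with fdir_mode = RTE_FDIR_MODE_PERFECT and a flex payload layout. */
    static int
    add_fdir_flex_filter(uint8_t port_id, uint16_t rx_queue)
    {
        struct rte_eth_fdir_filter f;

        memset(&f, 0, sizeof(f));
        f.soft_id = 1;
        f.input.flow_type = RTE_ETH_FLOW_NONFRAG_IPV4_UDP;
        f.input.flow.udp4_flow.dst_port = rte_cpu_to_be_16(9000);
        /* payload bytes to match; their offsets come from the flex config */
        f.input.flow_ext.flexbytes[0] = 0x42;
        f.input.flow_ext.flexbytes[1] = 0x13;
        f.action.rx_queue = rx_queue;
        f.action.behavior = RTE_ETH_FDIR_ACCEPT;
        f.action.report_status = RTE_ETH_FDIR_REPORT_ID;

        return rte_eth_dev_filter_ctrl(port_id, RTE_ETH_FILTER_FDIR,
                                       RTE_ETH_FILTER_ADD, &f);
    }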

[dpdk-dev] TX performance regression caused by the mbuf cacheline split

2016-02-15 Thread Paul Emmerich
Hi, here's a kind of late follow-up. I've only recently found the need (mostly for the better support of XL710 NICs (which I still dislike but people are using them...)) to seriously address DPDK 2.x support in MoonGen. On 13.05.15 11:03, Ananyev, Konstantin wrote: > Before start to discuss you

[dpdk-dev] Per-queue bandwidth limit on XL710 NICs?

2016-02-15 Thread Paul Emmerich
Hi, I'm using the per-queue rate control feature found in ixgbe-style NICs (rte_eth_set_queue_rate_limit) quite extensively in my packet generator MoonGen. I've read some parts of the XL710 datasheet and I guess it should be possible to implement this for this chip. I think there are two ways
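
For reference, the ixgbe-style call mentioned above is used roughly like this; the port, queue and rate are arbitrary examples, and on ports whose driver does not implement the operation the call simply returns an error:

    #include <stdio.h>
    #include <rte_ethdev.h>

    /* Cap a single tx queue; tx_rate is given in Mbit/s. */
    static void
    limit_tx_queue(uint8_t port_id, uint16_t queue_id, uint16_t mbit_per_s)
    {
        int ret = rte_eth_set_queue_rate_limit(port_id, queue_id, mbit_per_s);
        if (ret != 0)
            printf("per-queue rate limit failed on port %u queue %u: %d\n",
                   (unsigned)port_id, (unsigned)queue_id, ret);
    }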

[dpdk-dev] [PATCH 3/3] i40e: use crc checksum disable flag

2016-02-15 Thread Paul Emmerich
Signed-off-by: Paul Emmerich --- drivers/net/i40e/i40e_rxtx.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 40cffc1..52f7955 100644 --- a/drivers/net/i40e/i40e_rxtx.c +++ b/drivers/net/i40e

[dpdk-dev] [PATCH 2/3] ixgbe: use crc checksum disable flag

2016-02-15 Thread Paul Emmerich
Signed-off-by: Paul Emmerich --- lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c b/lib/librte_pmd_ixgbe/ixgbe_rxtx.c index 57c9430..800e224 100644 --- a/lib/librte_pmd_ixgbe/ixgbe_rxtx.c +++ b/lib

[dpdk-dev] [PATCH 1/3] add tx crc disable flag

2016-02-15 Thread Paul Emmerich
Signed-off-by: Paul Emmerich --- lib/librte_mbuf/rte_mbuf.c | 1 + lib/librte_mbuf/rte_mbuf.h | 6 ++ 2 files changed, 7 insertions(+) diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c index f506517..744fb4e 100644 --- a/lib/librte_mbuf/rte_mbuf.c +++ b/lib/librte_mbuf

[dpdk-dev] [PATCH 0/3] add flag to disable CRC checksum offloading

2016-02-15 Thread Paul Emmerich
This patch adds a new tx checksum offloading flag: PKT_TX_NO_CRC_CSUM. This allows disabling CRC checksum offloading on a per-packet basis. Doing this can be useful if you want to send out invalid packets on purpose, e.g. in a packet generator/test framework. Paul Emmerich (3): add tx crc
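
How an application would use the proposed flag, as a hypothetical sketch; PKT_TX_NO_CRC_CSUM only exists with this (unmerged) series applied:

    #include <rte_mbuf.h>

    /* Hypothetical: keep the deliberately corrupted Ethernet CRC that the
     * application already wrote into the frame instead of letting the NIC
     * recompute it on transmit. */
    static void
    mark_keep_bad_crc(struct rte_mbuf *m)
    {
        m->ol_flags |= PKT_TX_NO_CRC_CSUM;
    }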

[dpdk-dev] TX performance regression caused by the mbuf cacheline split

2015-05-12 Thread Paul Emmerich
Found a really simple solution that almost restores the original performance: just add a prefetch on alloc. For some reason, I assumed that this was already done since the troublesome commit I investigated mentioned something about prefetching... I guess the commit referred to the hardware prefetcher ...

[dpdk-dev] [PATCH] prefetch second cacheline of mbufs on alloc

2015-05-12 Thread Paul Emmerich
this improves the throughput of a simple tx-only application by 11% in the full-featured ixgbe tx path and by 14% in the simple tx path. --- lib/librte_mbuf/rte_mbuf.h | 1 + 1 file changed, 1 insertion(+) diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h index ab6de67..f6895b4
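
The change is a single added prefetch, roughly of the following shape; the exact placement inside rte_pktmbuf_alloc() is an assumption rather than a quote from the truncated diff above:

    /* Approximate shape of the patched allocator in lib/librte_mbuf/rte_mbuf.h:
     * prefetch the mbuf's second cache line as soon as it leaves the pool so
     * that the tx path's later accesses to it do not stall. */
    static inline struct rte_mbuf *rte_pktmbuf_alloc(struct rte_mempool *mp)
    {
        struct rte_mbuf *m;

        if ((m = __rte_mbuf_raw_alloc(mp)) != NULL) {
            rte_prefetch0(&m->cacheline1); /* the one added line */
            rte_pktmbuf_reset(m);
        }
        return m;
    }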

[dpdk-dev] TX performance regression caused by the mbuf cacheline split

2015-05-12 Thread Paul Emmerich
Paul Emmerich: > I naively tried to move the pool pointer into the first cache line in > the v2.0.0 tag and the performance actually decreased, I'm not yet sure > why this happens. There are probably assumptions about the cacheline > locations and prefetching in the code that

[dpdk-dev] TX performance regression caused by the mbuf cacheline split

2015-05-11 Thread Paul Emmerich
Hi Luke, thanks for your suggestion, I actually looked at how your packet generator in SnabbSwitch works before and it's quite clever. But unfortunately that's not what I'm looking for. I'm looking for a generic solution that works with whatever NIC is supported by DPDK and I don't want to write ...

[dpdk-dev] TX performance regression caused by the mbuf cacheline split

2015-05-11 Thread Paul Emmerich
Hi, this is a follow-up to my post from 3 weeks ago [1]. I'm starting a new thread here since I now got a completely new test setup for improved reproducibility. Background for anyone that didn't catch my last post: I'm investigating a performance regression in my packet generator [2] that occurred ...

[dpdk-dev] Performance regression in DPDK 1.8/2.0

2015-04-28 Thread Paul Emmerich
Hi, De Lara Guarch, Pablo : > Could you tell me which changes you made here? I see you are using simple tx > code path on 1.8.0, > but with the default values, you should be using vector tx, > unless you have changed anything in the tx configuration. sorry, I might have written that down wrong ...
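
For context, the path selection being discussed is driven by the tx queue configuration: in DPDK 1.8/2.0, ixgbe only uses the simple or vector tx path when the queue is set up without multi-segment and offload support and with burst-friendly thresholds. A rough sketch, with illustrative values that are not taken from the thread:

    #include <rte_ethdev.h>

    /* Request a tx queue configuration that lets ixgbe pick the
     * simple/vector tx path (DPDK 1.8/2.0 era txq_flags API). */
    static int
    setup_fast_txq(uint8_t port_id, uint16_t queue_id, unsigned int socket)
    {
        struct rte_eth_txconf txconf = {
            .tx_rs_thresh = 32,
            .tx_free_thresh = 32,
            .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | ETH_TXQ_FLAGS_NOOFFLOADS,
        };

        return rte_eth_tx_queue_setup(port_id, queue_id, 512, socket, &txconf);
    }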

[dpdk-dev] Performance regression in DPDK 1.8/2.0

2015-04-28 Thread Paul Emmerich
Hi, Matthew Hall : > Not sure if it's relevant or not, but there was another mail claiming PCIe > MSI-X wasn't necessarily working in DPDK 2.x. Not sure if that could be > causing slowdowns when there are drastic volumes of 64-byte packets causing a > lot of PCI activity. Interrupts should not

[dpdk-dev] Performance regression in DPDK 1.8/2.0

2015-04-28 Thread Paul Emmerich
Hi, sorry, I mixed up the hardware I used for my tests. Paul Emmerich : > CPU: Intel(R) Xeon(R) CPU E3-1230 v2 > TurboBoost and HyperThreading disabled. > Frequency fixed at 3.30 GHz via acpi_cpufreq. The CPU frequency was fixed at 1.60 GHz to enforce a CPU bottleneck. My original

[dpdk-dev] Performance regression in DPDK 1.8/2.0

2015-04-28 Thread Paul Emmerich
Hi, Pablo : > Could you tell me how you got the L1 cache miss ratio? Perf? perf stat -e L1-dcache-loads,L1-dcache-misses l2fwd ... > Could you provide more information on how you run the l2fwd app, > in order to try to reproduce the issue: > - L2fwd Command line ./build/l2fwd -c 3 -n 2 -- -p 3

[dpdk-dev] Performance regression in DPDK 1.8/2.0

2015-04-26 Thread Paul Emmerich
Hi, I'm working on a DPDK-based packet generator [1] and I recently tried to upgrade from DPDK 1.7.1 to 2.0.0. However, I noticed that DPDK 1.7.1 is about 25% faster than 2.0.0 for my use case. So I ran some basic performance tests on the l2fwd example with DPDK 1.7.1, 1.8.0 and 2.0.0. I used an

[dpdk-dev] Polling too often at lower packet rates?

2015-04-10 Thread Paul Emmerich
Paul Emmerich wrote: > Stephen Hemminger wrote: > >> Your excess polling consumes PCI bandwidth which is a fixed resource. > > I doubt that this is the problem for three reasons: > 4th: polling should not cause a PCIe access as all the required information is written

[dpdk-dev] Polling too often at lower packet rates?

2015-04-10 Thread Paul Emmerich
Stephen Hemminger wrote: > Your excess polling consumes PCI bandwidth which is a fixed resource. I doubt that this is the problem for three reasons: * The poll rate would regulate itself if the PCIe bus was the bottleneck * This problem only occurs with 82599 chips, not with X540 chips (which