Hello VPP experts,

We are using VPP for NAT44 and we are seeing some "congestion
drops", in a situation where we think VPP is far from overloaded in
general. We therefore started to investigate whether it would help
to use a larger handoff frame queue size. In theory at least, a
longer queue could help avoid drops during short traffic spikes, or
when some worker thread happens to be temporarily busy for whatever
reason.
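
To illustrate what we mean (this is only a simplified sketch of a
bounded handoff queue, not VPP's actual handoff code): each worker
has a fixed number of queue slots, and when a producer finds the
queue full it has nowhere to put the frame, so the frame is counted
as a congestion drop. A deeper queue makes the "full" condition
rarer during bursts, without changing steady-state throughput:

  #include <stdbool.h>
  #include <stdint.h>

  #define FQ_NELTS 64 /* fixed queue depth, analogous to NAT_FQ_NELTS */

  typedef struct
  {
    uint64_t head;             /* next element the consumer reads */
    uint64_t tail;             /* next free slot the producer writes */
    void *elts[FQ_NELTS];
    uint64_t congestion_drops; /* producer found the queue full */
  } handoff_queue_t;

  /* Producer side: enlarging FQ_NELTS only makes the "full" branch
     rarer during short bursts; it cannot help if the consumer is
     persistently too slow. */
  bool
  enqueue_frame (handoff_queue_t *q, void *frame)
  {
    if (q->tail - q->head == FQ_NELTS) /* full: nowhere to hand off */
      {
        q->congestion_drops++;
        return false;
      }
    q->elts[q->tail++ % FQ_NELTS] = frame;
    return true;
  }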

The NAT worker handoff frame queue size is hard-coded in the
NAT_FQ_NELTS macro in src/plugins/nat/nat.h, where the current value
is 64. The idea is that a larger value there could help.
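
Concretely, the change we have been experimenting with is this
one-line edit (shown against VPP 20.05; only the relevant line of
nat.h is included, and 2048 is simply the value that worked best for
us, see below):

  /* src/plugins/nat/nat.h, VPP 20.05 -- before: */
  #define NAT_FQ_NELTS 64

  /* -- after (best-performing value in our tests): */
  #define NAT_FQ_NELTS 2048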

We ran a series of tests where we changed the NAT_FQ_NELTS value
from 64 to a range of other values, each time rebuilding VPP and
running an identical test. The test case tries, to some extent, to
mimic our real traffic, although it is of course simplified: it runs
many simultaneous iperf3 TCP tests, combined with some UDP traffic
chosen to make VPP create more new sessions (so that the NAT
"slowpath" is triggered more often).

The following NAT_FQ_NELTS values were tested:
16
32
64  <-- current value
128
256
512
1024
2048  <-- best performance in our tests
4096
8192
16384
32768
65536
131072

In those tests, performance was very bad for the smallest
NAT_FQ_NELTS values of 16 and 32, while values larger than 64 gave
improved performance. The best throughput was seen for
NAT_FQ_NELTS=2048; values larger than that gave reduced performance
compared to the 2048 case.

The tests were done with VPP 20.05 running on an Ubuntu 18.04 server
with a 12-core Intel Xeon CPU and two Mellanox mlx5 network cards.
The number of NAT threads was 8 in some of the tests and 4 in
others.
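
For reference, the worker setup was along the following lines. The
core and worker numbers here are illustrative rather than our exact
configuration, and the "set nat workers" CLI syntax should be
checked against the VPP version used:

  # startup.conf: pin VPP worker threads to cores (illustrative layout)
  cpu {
    main-core 0
    corelist-workers 1-8
  }

  # vppctl, after startup: let NAT use 8 worker threads
  # (worker indices; illustrative)
  set nat workers 0-7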

According to these tests, the effect of changing NAT_FQ_NELTS can be
quite large. For example, in one test case chosen such that
congestion drops were a significant problem, throughput increased
from about 43 to about 90 Gbit/s, while the number of congestion
drops per second fell to roughly one third of its previous level. In
another kind of test, throughput increased by about 20% and
congestion drops were reduced to zero. Of course, such results
depend a lot on how the tests are constructed, but it seems clear
that the choice of NAT_FQ_NELTS value can be important and that
increasing it would be beneficial, at least for the kind of usage we
have tested here.

Based on the above, we are considering changing NAT_FQ_NELTS from 64
to a larger value and starting to try that in our production
environment (so far we have only tried it in a test environment).

Were there specific reasons for setting NAT_FQ_NELTS to 64?

Are there some potential drawbacks or dangers of changing it to a
larger value?

Would you consider changing to a larger value in the official VPP
code?

Best regards,
Elias
