Hello VPP experts,

We are using VPP for NAT44 and we see some "congestion drops" in situations where we believe VPP is far from overloaded overall. This led us to investigate whether a larger handoff frame queue size would help. In theory at least, a longer queue could help avoid drops during short traffic spikes, or when a worker thread is temporarily busy for whatever reason.
The NAT worker handoff frame queue size is hard-coded in the NAT_FQ_NELTS macro in src/plugins/nat/nat.h, where the current value is 64. The idea is that a larger value there could help. We ran tests changing NAT_FQ_NELTS from 64 to a range of other values, each time rebuilding VPP and running an identical test, a test case that tries to mimic our real traffic to some extent, although of course it is simplified. The test runs many simultaneous iperf3 TCP tests, combined with some UDP traffic chosen to trigger VPP to create more new sessions (to make the NAT "slowpath" happen more often).

The following NAT_FQ_NELTS values were tested:

     16
     32
     64    <-- current value
    128
    256
    512
   1024
   2048    <-- best performance in our tests
   4096
   8192
  16384
  32768
  65536
 131072

In those tests, performance was very bad for the smallest NAT_FQ_NELTS values of 16 and 32, while values larger than 64 gave improved performance. The best throughput was seen for NAT_FQ_NELTS=2048; values larger than that gave reduced performance compared to the 2048 case.

The tests were done with VPP 20.05 running on an Ubuntu 18.04 server with a 12-core Intel Xeon CPU and two Mellanox mlx5 network cards. The number of NAT threads was 8 in some of the tests and 4 in others.

According to these tests, the effect of changing NAT_FQ_NELTS can be quite large. For example, in one test case chosen such that congestion drops were a significant problem, throughput increased from about 43 to 90 Gbit/s, with congestion drops per second reduced to about one third. In another kind of test, throughput increased by about 20% with congestion drops reduced to zero. Of course such results depend a lot on how the tests are constructed. But in any case, it seems clear that the choice of NAT_FQ_NELTS value can be important and that increasing it would be good, at least for the kind of usage we have tested.
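For reference, the entire change we tested amounts to a one-line edit of the macro (shown here with the value that performed best in our tests):

```c
/* src/plugins/nat/nat.h */
#define NAT_FQ_NELTS 2048	/* previously 64 */
```

Since the value is a compile-time constant, each tested value required a full rebuild of VPP.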
Based on the above, we are considering changing NAT_FQ_NELTS from 64 to a larger value and trying that in our production environment (so far we have only tried it in a test environment). Were there specific reasons for setting NAT_FQ_NELTS to 64? Are there potential drawbacks or dangers in changing it to a larger value? Would you consider changing to a larger value in the official VPP code?

Best regards,
Elias
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#18012): https://lists.fd.io/g/vpp-dev/message/18012
Mute This Topic: https://lists.fd.io/mt/78230881/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-