Thanks. So do you see reducing the number of VPP threads as an option to work 
around this issue, since that would probably increase the vector rate per thread?

Best Regards

-----Original Message-----
From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On behalf of Klement Sekera via 
lists.fd.io
Sent: Friday, 13 November 2020 14:26
To: Marcos - Mgiga <mar...@mgiga.com.br>
Cc: Elias Rudberg <elias.rudb...@bahnhof.net>; vpp-dev <vpp-dev@lists.fd.io>
Subject: Re: RES: RES: [vpp-dev] Increasing NAT worker handoff frame queue size 
NAT_FQ_NELTS to avoid congestion drops?

I used the usual

1. start traffic
2. clear run (i.e. the "clear runtime" CLI command)
3. wait n seconds (e.g. n == 10)
4. show run (i.e. "show runtime"; its vectors/call column gives the per-node vector rate)

Klement

> On 13 Nov 2020, at 18:21, Marcos - Mgiga <mar...@mgiga.com.br> wrote:
> 
> Understood. And what approach did you take to analyse and monitor vector 
> rates? Is there some specific command or log?
> 
> Thanks
> 
> Marcos
> 
> -----Original Message-----
> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On behalf of ksekera via 
> [] Sent: Friday, 13 November 2020 14:02
> To: Marcos - Mgiga <mar...@mgiga.com.br>
> Cc: Elias Rudberg <elias.rudb...@bahnhof.net>; vpp-dev@lists.fd.io
> Subject: Re: RES: [vpp-dev] Increasing NAT worker handoff frame queue size 
> NAT_FQ_NELTS to avoid congestion drops?
> 
> Not completely idle, more like medium load. The vector rates at which I saw 
> congestion drops were roughly 40 for the thread doing no work (just handoffs - 
> I hardcoded it this way for test purposes), and roughly 100 for the thread 
> picking up the packets and doing the NAT work.
> 
> What got me into investigating the infra was the fact that once I was hitting 
> vector rates around 255, I did see packet drops, but no congestion drops.
> 
> HTH,
> Klement
> 
>> On 13 Nov 2020, at 17:51, Marcos - Mgiga <mar...@mgiga.com.br> wrote:
>> 
>> So you mean that this situation (congestion drops) is more likely to occur 
>> when the system is mostly idle than when it is processing a large amount 
>> of traffic?
>> 
>> Best Regards
>> 
>> Marcos
>> 
>> -----Original Message-----
>> From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On behalf of Klement 
>> Sekera via lists.fd.io Sent: Friday, 13 November 2020 12:15
>> To: Elias Rudberg <elias.rudb...@bahnhof.net>
>> Cc: vpp-dev@lists.fd.io
>> Subject: Re: [vpp-dev] Increasing NAT worker handoff frame queue size 
>> NAT_FQ_NELTS to avoid congestion drops?
>> 
>> Hi Elias,
>> 
>> I’ve already debugged this and came to the conclusion that the infra is the 
>> weak link. I was seeing congestion drops at mild load, but not at full load. 
>> The issue is that with handoff the workload is uneven. For simplicity’s sake, 
>> just consider thread 1 handing off all of its traffic to thread 2. For 
>> thread 1 the job is much easier: it just does some ip4 parsing and then hands 
>> the packet to thread 2, which actually does the heavy lifting of hash 
>> inserts/lookups/translation etc. A 64-element queue can hold 64 frames; one 
>> extreme is 64 1-packet frames, totalling 64 packets, the other extreme is 64 
>> 255-packet frames, totalling ~16k packets. What happens is this: thread 1 is 
>> mostly idle, just picking a few packets from the NIC, and every one of these 
>> small frames creates an entry in the handoff queue. Thread 2 then picks one 
>> element from the handoff queue and deals with it completely before picking 
>> another one. If the queue holds only 3-packet or 10-packet elements, thread 2 
>> can never really get into what VPP excels at - bulk processing.
>> 
>> Q: Why doesn’t it pick as many packets as possible from the handoff queue? 
>> A: It’s not implemented.
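>> 
>> To make that concrete, here is a small self-contained C sketch (purely 
>> illustrative - this is not the actual VPP code, and the names dispatch_one / 
>> dispatch_drain are made up) comparing the current one-element-per-dispatch 
>> consumption with draining the queue until a full vector is built:
>> 
>>   #include <stdio.h>
>> 
>>   #define FQ_NELTS   64   /* handoff queue size, like NAT_FQ_NELTS      */
>>   #define VECTOR_MAX 255  /* max packets the consumer can batch at once */
>> 
>>   /* current behaviour: handle exactly one queued frame per dispatch */
>>   static int dispatch_one (const int *q, int n, int *i)
>>   {
>>     return (*i < n) ? q[(*i)++] : 0;
>>   }
>> 
>>   /* patched idea: keep pulling queued frames until the vector is full */
>>   static int dispatch_drain (const int *q, int n, int *i)
>>   {
>>     int batch = 0;
>>     while (*i < n && batch + q[*i] <= VECTOR_MAX)
>>       batch += q[(*i)++];
>>     return batch;
>>   }
>> 
>>   int main (void)
>>   {
>>     int q[FQ_NELTS], i, pos, pkts, calls;
>>     for (i = 0; i < FQ_NELTS; i++)
>>       q[i] = 3;                /* queue full of tiny 3-packet frames */
>> 
>>     for (pos = 0, calls = 0; dispatch_one (q, FQ_NELTS, &pos) > 0;)
>>       calls++;
>>     printf ("one per dispatch: %d dispatches, 3 packets each\n", calls);
>> 
>>     for (pos = 0, calls = 0; (pkts = dispatch_drain (q, FQ_NELTS, &pos)) > 0;)
>>       printf ("drained dispatch %d: %d packets\n", ++calls, pkts);
>>     return 0;
>>   }
>> 
>> With per-element consumption the toy consumer does 64 dispatches of 3 packets 
>> each; with draining it gets one 192-packet batch, which is the kind of bulk 
>> processing VPP is built around.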
>> 
>> I already wrote a patch for it, which made all the congestion drops I saw 
>> (in the above synthetic test case) disappear. The patch is sitting in 
>> gerrit: https://gerrit.fd.io/r/c/vpp/+/28980
>> 
>> Would you like to give it a try and see if it helps your issue? We 
>> shouldn’t need big queues under mild loads anyway …
>> 
>> Regards,
>> Klement
>> 
>>> On 13 Nov 2020, at 16:03, Elias Rudberg <elias.rudb...@bahnhof.net> wrote:
>>> 
>>> Hello VPP experts,
>>> 
>>> We are using VPP for NAT44 and we get some "congestion drops" in a 
>>> situation where we think VPP is far from overloaded in general. So 
>>> we started to investigate whether it would help to use a larger handoff 
>>> frame queue size. In theory at least, allowing a longer queue could 
>>> help avoid drops during short traffic spikes, or when some worker 
>>> thread is temporarily busy for whatever reason.
>>> 
>>> The NAT worker handoff frame queue size is hard-coded in the 
>>> NAT_FQ_NELTS macro in src/plugins/nat/nat.h where the current value 
>>> is 64. The idea is that putting a larger value there could help.
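>>> 
>>> For reference, the definition we are changing is just this one constant 
>>> (shown as a sketch; the exact surrounding code in nat.h may of course 
>>> differ between VPP versions):
>>> 
>>>   /* src/plugins/nat/nat.h - NAT worker handoff frame queue size */
>>>   #define NAT_FQ_NELTS 64   /* replaced with a larger value in our tests */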
>>> 
>>> We have run some tests where we changed the NAT_FQ_NELTS value from
>>> 64 to a range of other values, each time rebuilding VPP and running 
>>> an identical test: a test case that tries, to some extent, to mimic 
>>> our real traffic, although of course it is simplified. The test runs 
>>> many iperf3 tests simultaneously using TCP, combined with some UDP 
>>> traffic chosen to trigger VPP to create more new sessions (to make 
>>> the NAT "slowpath" happen more often).
>>> 
>>> The following NAT_FQ_NELTS values were tested:
>>> 16
>>> 32
>>> 64  <-- current value
>>> 128
>>> 256
>>> 512
>>> 1024
>>> 2048  <-- best performance in our tests
>>> 4096
>>> 8192
>>> 16384
>>> 32768
>>> 65536
>>> 131072
>>> 
>>> In those tests, performance was very bad for the smallest 
>>> NAT_FQ_NELTS values of 16 and 32, while values larger than 64 gave 
>>> improved performance. The best results in terms of throughput were 
>>> seen for NAT_FQ_NELTS=2048. For even larger values than that, we got 
>>> reduced performance compared to the 2048 case.
>>> 
>>> The tests were done with VPP 20.05 running on an Ubuntu 18.04 server 
>>> with a 12-core Intel Xeon CPU and two Mellanox mlx5 network cards.
>>> The number of NAT threads was 8 in some of the tests and 4 in others.
>>> 
>>> According to these tests, the effect of changing NAT_FQ_NELTS can be 
>>> quite large. For example, for one test case chosen such that 
>>> congestion drops were a significant problem, throughput 
>>> increased from about 43 to 90 Gbit/s, with the number of 
>>> congestion drops per second reduced to about one third. In another 
>>> kind of test, throughput increased by about 20% with congestion 
>>> drops reduced to zero. Of course such results depend a lot on how 
>>> the tests are constructed. But anyway, it seems clear that the 
>>> choice of NAT_FQ_NELTS value can be important and that increasing it 
>>> would be good, at least for the kind of usage we have tested now.
>>> 
>>> Based on the above, we are considering changing NAT_FQ_NELTS from 64 
>>> to a larger value and starting to try that in our production 
>>> environment (so far we have only tried it in a test environment).
>>> 
>>> Were there specific reasons for setting NAT_FQ_NELTS to 64?
>>> 
>>> Are there some potential drawbacks or dangers of changing it to a 
>>> larger value?
>>> 
>>> Would you consider changing to a larger value in the official VPP 
>>> code?
>>> 
>>> Best regards,
>>> Elias
>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> 

