Hi Elias,

Thanks for getting back with some real numbers. I only tested with two
workers and a very simple case, and there increasing the queue size didn’t
help one bit. But again, in my setup the handoff rate was 100% (every single
packet went through handoff), which is most probably why one solution looked
like the holy grail and the other looked useless.

To answer your question about why the queue length is 64: I guess nobody
knows, as the author of that code has been gone for a while. I see no reason
why this shouldn’t be configurable. When I tried simply increasing the value,
I quickly ran into an out-of-buffers situation with the default configs.
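
My rough guess at why that happens, as a back-of-the-envelope sketch (the
worker count, frame size and pool size below are assumptions for
illustration, not values pulled from the code):

  /* Rough worst-case count of buffers that can sit parked in the NAT
     handoff queues once the queue length is raised. Illustrative only. */
  #include <stdio.h>

  #define MAX_PKTS_PER_FRAME 255    /* frame sizes as discussed in the
                                       quoted mail below */

  int
  main (void)
  {
    int n_workers = 8;      /* your setup; roughly one handoff queue per
                               destination worker */
    int fq_nelts = 1024;    /* the larger NAT_FQ_NELTS value being tested */
    long worst_case = (long) n_workers * fq_nelts * MAX_PKTS_PER_FRAME;
    printf ("worst-case buffers parked in handoff queues: %ld\n",
            worst_case);
    /* ~2M buffers, against a default pool on the order of 16k buffers
       per numa node - which would explain the out-of-buffers. */
    return 0;
  }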

Would you like to submit a patch?
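
If you do, I'd expect it to mostly boil down to replacing the NAT_FQ_NELTS
define with a value read from startup.conf, roughly along these lines
(completely untested sketch; "frame-queue-nelts" and the frame_queue_nelts
member are made-up names, and in reality this would be folded into the
existing nat config handling rather than registered separately):

  #include <vlib/vlib.h>
  #include <nat/nat.h>

  /* Untested sketch: read the handoff frame queue size from the "nat"
     section of startup.conf instead of hard-coding NAT_FQ_NELTS.
     sm->frame_queue_nelts is a hypothetical new snat_main_t member. */
  static clib_error_t *
  nat_fq_config (vlib_main_t * vm, unformat_input_t * input)
  {
    snat_main_t *sm = &snat_main;
    u32 nelts = 64;  /* keep today's default */

    while (unformat_check_input (input) != UNFORMAT_END_OF_INPUT)
      {
        if (unformat (input, "frame-queue-nelts %u", &nelts))
          ;
        else
          return clib_error_return (0, "unknown input '%U'",
                                    format_unformat_error, input);
      }

    /* the frame queue ring indexing most likely wants a power of two -
       worth double-checking before relying on this */
    if (!is_pow2 (nelts))
      return clib_error_return (0, "frame-queue-nelts must be a power of 2");

    sm->frame_queue_nelts = nelts;
    return 0;
  }

  VLIB_CONFIG_FUNCTION (nat_fq_config, "nat");

The places that currently pass NAT_FQ_NELTS to vlib_frame_queue_main_init()
would then take sm->frame_queue_nelts instead.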

Thanks,
Klement

> On 16 Nov 2020, at 11:33, Elias Rudberg <elias.rudb...@bahnhof.net> wrote:
> 
> Hi Klement,
> 
> Thanks! I have now tested your patch (28980); it seems to work and it
> does give some improvement. However, according to my tests, increasing
> NAT_FQ_NELTS seems to have a bigger effect: it improves performance a
> lot. When using the original NAT_FQ_NELTS value of 64, your patch
> gives some improvement, but I still get the best performance when
> increasing NAT_FQ_NELTS.
> 
> For example, one of the tests behaves like this:
> 
> Without patch, NAT_FQ_NELTS=64  --> 129 Gbit/s and ~600k cong. drops
> With patch, NAT_FQ_NELTS=64  --> 136 Gbit/s and ~400k cong. drops
> Without patch, NAT_FQ_NELTS=1024  --> 151 Gbit/s and 0 cong. drops
> With patch, NAT_FQ_NELTS=1024  --> 151 Gbit/s and 0 cong. drops
> 
> So it still looks like increasing NAT_FQ_NELTS would be good, which
> brings me back to the same questions as before:
> 
> Were there specific reasons for setting NAT_FQ_NELTS to 64?
> 
> Are there some potential drawbacks or dangers of changing it to a
> larger value?
> 
> I suppose everyone will agree that when there is a queue with a
> maximum length, the choice of that maximum length can be important. Is
> there some particular reason to believe that 64 would be enough? In
> our case we are using 8 NAT threads. Suppose thread 8 is held up
> briefly because something takes a little longer than usual, and
> meanwhile threads 1-7 each hand off 10 frames to thread 8. That
> situation would require a queue size of at least 70, unless I have
> misunderstood how the handoff mechanism works. To me, allowing a
> longer queue seems like a good thing because it lets us also handle
> more difficult cases where threads are not always equally fast, where
> there can be traffic spikes that affect some threads more than
> others, and so on. But maybe there are strong reasons for keeping the
> queue short, reasons I don't know about; that's why I'm asking.
> 
> Best regards,
> Elias
> 
> 
> On Fri, 2020-11-13 at 15:14 +0000, Klement Sekera -X (ksekera -
> PANTHEON TECH SRO at Cisco) wrote:
>> Hi Elias,
>> 
>> I’ve already debugged this and came to the conclusion that it’s the
>> infra which is the weak link. I was seeing congestion drops at mild
>> load, but not at full load. The issue is that with handoff, the
>> workload is uneven. For simplicity’s sake, just consider thread 1
>> handing off all the traffic to thread 2. What happens is that thread
>> 1 has the much easier job: it just does some ip4 parsing and then
>> hands the packet to thread 2, which actually does the heavy lifting
>> of hash inserts/lookups/translation etc. A 64-element queue can hold
>> 64 frames: one extreme is 64 one-packet frames, totalling 64 packets;
>> the other extreme is 64 255-packet frames, totalling ~16k packets.
>> What happens is this: thread 1 is mostly idle, just picking a few
>> packets from the NIC, and every one of these small frames creates an
>> entry in the handoff queue. Thread 2 then picks one element from the
>> handoff queue and deals with it before picking another one. If the
>> queue holds only 3-packet or 10-packet elements, thread 2 can never
>> really get into what VPP excels at: bulk processing.
>> 
>> Q: Why doesn’t it pick as many packets as possible from the handoff
>> queue? 
>> A: It’s not implemented.
>> 
>> I already wrote a patch for it, which made all the congestion drops I
>> saw (in the above synthetic test case) disappear. The patch is
>> sitting in gerrit: https://gerrit.fd.io/r/c/vpp/+/28980
>> 
>> Would you like to give it a try and see if it helps your issue? We
>> shouldn’t need big queues under mild loads anyway …
>> 
>> Regards,
>> Klement
>> 
