Re infra: I also pinged the provider to see what might be going on. I think
they are still moving equipment from one DC to another, so there may be some
isolated glitches... anyway, feel free to ping me or Dave if you see anything
odd infra-wise.

--a

> On 17 Mar 2021, at 16:08, Ivan Shvedunov <ivan...@gmail.com> wrote:
> 
> 
> Great, thanks!
> 
>> On Wed, Mar 17, 2021 at 6:03 PM Florin Coras <fcoras.li...@gmail.com> wrote:
>> Infra woes solved, so finally merged!
>> 
>> Florin
>> 
>>> On Mar 17, 2021, at 6:13 AM, Ivan Shvedunov <ivan...@gmail.com> wrote:
>>> 
>>> I see CI keeps failing for my patch despite the rebase... I have run the
>>> failing tests locally with my patch applied (test_map, test_flowprobe,
>>> test_memif) and they do pass there, so it must indeed be something related
>>> to the test environment.
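>>> (For reference, this is roughly how I ran them, from a built VPP source
>>> tree; the exact "make test" invocation may vary between branches:
>>>
>>>   make test TEST=test_map
>>>   make test TEST=test_flowprobe
>>>   make test TEST=test_memif
>>> )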
>>> 
>>> On Tue, Mar 16, 2021 at 7:02 PM Florin Coras <fcoras.li...@gmail.com> wrote:
>>>> Hi Ivan, 
>>>> 
>>>> 
>>>>> On Mar 16, 2021, at 8:51 AM, Ivan Shvedunov <ivan...@gmail.com> wrote:
>>>>> 
>>>>> Thanks for clearing things up :)
>>>> 
>>>> Thanks for bringing this up ;-)
>>>> 
>>>>> Concerning our use case: it's about the PFCP protocol [1], which is used
>>>>> in the 5G / LTE CUPS setting. High throughput is usually not important
>>>>> there, as it's a control protocol, but it does sometimes produce rather
>>>>> large UDP packets. We use PFCP in our VPP-based project [2], where it's
>>>>> implemented on top of the VPP session layer.
>>>> 
>>>> Got it and cool!
>>>> 
>>>>> BTW, we've been hitting the hard-coded IP reassembly limit (3 fragments)
>>>>> with it too; mind if I post a patch that makes it configurable?
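>>>>> (A rough sketch of the idea, with illustrative names only, not the
>>>>> actual patch:
>>>>>
>>>>>   /* today the limit is a compile-time constant along the lines of */
>>>>>   #define IP_REASS_MAX_FRAGMENTS_PER_REASSEMBLY 3
>>>>>
>>>>>   /* the idea is to replace uses of the constant with a runtime value,
>>>>>      e.g. exposed through the ip_reassembly_set API and checked per
>>>>>      reassembly instead of being fixed at build time */
>>>>> )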
>>>> 
>>>> CC’ed Klement for his opinion on this. 
>>>> 
>>>> Cheers, 
>>>> Florin
>>>> 
>>>>> 
>>>>> [1] https://en.wikipedia.org/wiki/PFCP
>>>>> [2] https://github.com/travelping/upg-vpp
>>>>> 
>>>>> 
>>>>>> On Tue, Mar 16, 2021 at 6:19 PM Florin Coras <fcoras.li...@gmail.com> wrote:
>>>>>> Hi Ivan,
>>>>>> 
>>>>>> 
>>>>>>>> On Mar 16, 2021, at 6:40 AM, Ivan Shvedunov <ivan...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Hi Florin,
>>>>>>>> 
>>>>>>>> (inline)
>>>>>>>> 
>>>>>>>> On Mon, Mar 15, 2021 at 10:27 PM Florin Coras <fcoras.li...@gmail.com> wrote:
>>>>>>>>>> On Mar 15, 2021, at 9:08 AM, Ivan Shvedunov <ivan...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>   Hello,
>>>>>>>>>> 
>>>>>>>>>>   I've noticed a problem in the VPP session layer and posted a patch
>>>>>>>>>> to fix it [1].
>>>>>>>>>> Namely, UDP datagrams with length > 1908 that require buffer chaining
>>>>>>>>>> get corrupted, because session_tx_fifo_chain_tail(), which is
>>>>>>>>>> responsible for the buffer chaining, doesn't account for
>>>>>>>>>> SESSION_CONN_HDR_LEN. The bug only affects datagram transports and
>>>>>>>>>> doesn't break TCP.
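>>>>>>>>>>
>>>>>>>>>> To illustrate the nature of the bug (a simplified sketch, not the
>>>>>>>>>> actual code; variable names here are made up):
>>>>>>>>>>
>>>>>>>>>>   u32 buf_space = vlib_buffer_get_default_data_size (vm);
>>>>>>>>>>   /* buggy: assumes the whole first buffer holds payload */
>>>>>>>>>>   u32 first_payload = buf_space;
>>>>>>>>>>   /* for datagram transports the first buffer also carries the
>>>>>>>>>>      per-datagram session header, so less payload fits: */
>>>>>>>>>>   first_payload = buf_space - SESSION_CONN_HDR_LEN;
>>>>>>>>>>   /* without this adjustment the chained buffers are filled from
>>>>>>>>>>      the wrong fifo offset and the datagram gets corrupted */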
>>>>>>>>> 
>>>>>>>>> Looks good. Thanks! More below.
>>>>>>>> 
>>>>>>>> There seems to be an issue with CI, but I'm not sure if it's related 
>>>>>>>> to the patch in any way.
>>>>>>> 
>>>>>>> Most probably not. Rebased. 
>>>>>>> 
>>>>>>>
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>   There's another problem with the UDP code: "udp mtu". As far as I
>>>>>>>>> understand, the plan there [2] is to use the MTU of the output
>>>>>>>>> interface for each datagram being sent, with the "udp { mtu ... }"
>>>>>>>>> setting being only a temporary measure. Yet it seems to me that what
>>>>>>>>> happens when a datagram exceeds that MTU value is not exactly correct:
>>>>>>>>> instead of undergoing IP fragmentation as one would expect, the
>>>>>>>>> datagrams get split into multiple UDP datagrams. Most apps that use
>>>>>>>>> UDP do not handle this correctly, and it cost me some hours of
>>>>>>>>> debugging strange app behavior. Wouldn't failing to send such
>>>>>>>>> datagrams be more correct?
>>>>>>>> 
>>>>>>>> The thing to be aware of here is that the session layer offers, as a
>>>>>>>> service, the sending of datagrams written by apps to the network. If
>>>>>>>> the datagrams are larger than the mtu (an mss of sorts, actually)
>>>>>>>> configured for udp, the session layer chops the datagrams to mtu size.
>>>>>>>> The network layer can then fragment the resulting udp datagrams in
>>>>>>>> accordance with the output interface's mtu.
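>>>>>>>>
>>>>>>>> For example, with udp mtu 1500, a single 4000-byte datagram written by
>>>>>>>> the app goes out roughly as three independent udp datagrams, each with
>>>>>>>> its own udp header:
>>>>>>>>
>>>>>>>>   app write:  [ 4000 bytes                ]
>>>>>>>>   udp layer:  [ 1500 ] [ 1500 ] [ 1000 ]
>>>>>>>>
>>>>>>>> whereas with ip fragmentation the receiver's ip layer would have
>>>>>>>> reassembled the pieces back into one 4000-byte udp datagram.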
>>>>>>>> 
>>>>>>>> Using this feature is not recommended, because the original datagram
>>>>>>>> boundaries are lost. Moreover, losses will lead to even more problems.
>>>>>>>> The option is there for historical reasons, so we could consider
>>>>>>>> removing it at some point, once we're fully convinced it's not useful.
>>>>>>> 
>>>>>>> The problem is that this UDP splitting feature is on by default, with
>>>>>>> the default size limit being 1500 [1]. So if one ever needs to send UDP
>>>>>>> datagrams bigger than that, they will be split, which may be rather
>>>>>>> surprising to anyone unaware of the feature. Maybe it's worth changing
>>>>>>> the default value to some large number so that the splitting does not
>>>>>>> happen unexpectedly?
>>>>>> 
>>>>>> The goal there is to avoid exercising the ip fragmentation code, for the
>>>>>> reasons you discovered below, i.e., the default max of 3 fragments.
>>>>>> Splitting udp datagrams will always lead to some sort of problem because
>>>>>> of loss, so for now the value used is 1.5kB simply because it's the
>>>>>> common interface mtu. Those interested in using larger datagrams (8-9kB)
>>>>>> will hopefully find this or older threads instead of going through the
>>>>>> pain you had to go through!
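>>>>>>
>>>>>> For anyone who lands on this thread later: the limit should be
>>>>>> adjustable at startup via the udp stanza mentioned above, i.e.,
>>>>>> something along these lines in startup.conf (9000 is just an
>>>>>> illustrative value):
>>>>>>
>>>>>>   udp { mtu 9000 }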
>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>>>   Another thing I've noticed is that even when the UDP MTU is high
>>>>>>>>> enough, the datagrams being sent are sometimes split into smaller IP
>>>>>>>>> fragments than necessary. E.g. I have MTU 9000 on my interfaces and
>>>>>>>>> the UDP MTU is also 9000; an 8000-byte UDP datagram is sent in one
>>>>>>>>> piece, while the IP packets carrying an 11000-byte UDP datagram are
>>>>>>>>> split into ~2kB IP fragments. Any ideas why this could happen?
>>>>>>>> 
>>>>>>>> Hm, have you tried running an iperf3 test like the one that's part of
>>>>>>>> make test [1] to see if the issue is still reproducible? Also, I'd
>>>>>>>> avoid sending dgrams larger than udp's mtu from the app.
>>>>>>> 
>>>>>>> In this case the datagrams were smaller than the "udp mtu" value but
>>>>>>> larger than the interface's MTU.
>>>>>>> Looking at the ip_frag code [2], it seems it just can't create fragments
>>>>>>> bigger than the buffer size (2048 bytes by default), that is, it can't
>>>>>>> chain buffers for the fragments it creates, unless I'm somehow mistaken.
>>>>>>> With the IP reassembly limit in VPP itself being just 3 fragments, this
>>>>>>> may indeed be an issue: e.g. with an MTU of 9000 one expects a packet of
>>>>>>> length 10000 to be split into 2 fragments, not 5, and those 5 fragments
>>>>>>> will not be reassembled if received by another VPP's session layer
>>>>>>> further down the line. But perhaps one is expected to increase the
>>>>>>> buffer size when using higher interface MTU values?
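>>>>>>> If increasing the buffer size is indeed the expected approach, I
>>>>>>> suppose it would be something like this in startup.conf (my assumption,
>>>>>>> not verified against jumbo MTUs):
>>>>>>>
>>>>>>>   buffers { default data-size 9216 }
>>>>>>>
>>>>>>> so that a whole 9000-byte fragment fits in a single buffer.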
>>>>>> 
>>>>>> I've never tried that, but fair enough. Klement might be able to shed
>>>>>> some light here.
>>>>>>
>>>>>> Out of curiosity, are you trying to send such large datagrams to improve
>>>>>> efficiency/throughput? Going above 8kB in a sustained fashion and without
>>>>>> some form of pacing will probably lead to the udp/session layer
>>>>>> overwhelming the nic, unless you're using 100Gbps nics.
>>>>>> 
>>>>>> Regards, 
>>>>>> Florin
>>>>>> 
>>>>>>> 
>>>>>>> [1] https://github.com/FDio/vpp/blob/0ac5782/src/vnet/udp/udp.c#L522
>>>>>>> [2] 
>>>>>>> https://github.com/FDio/vpp/blob/0ac5782e600097b66e6b06e0b9edc79651f3a4bd/src/vnet/ip/ip_frag.c#L107-L109
>>>>>>> 
>>>>>>> -- 
>>>>>>> Ivan Shvedunov <ivan...@gmail.com>
>>>>>>> ;; My GPG fingerprint is: 2E61 0748 8E12 BB1A 5AB9  F7D0 613E C0F8 0BC5 2807
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
> 
> 