Re: [Taps] MTU / equivalent at the transport layer

Joe Touch Mon, 12 Dec 2016 10:50:47 -0800


On 12/12/2016 1:31 AM, Michael Welzl wrote:
> Hi,
>
> Just trying to understand, so we're not talking past each other. Please note 
> that I'm not trying to argue in any direction with my comments below, just 
> asking for clarification:
Sure...
>
>> On 09 Dec 2016, at 18:32, Joe Touch <to...@isi.edu> wrote:
>>
>>
>>
>> On 12/9/2016 8:12 AM, Michael Welzl wrote:
>>>> On 09 Dec 2016, at 16:18, Joe Touch <to...@isi.edu> wrote:
>>>>
>>>>
>>>>
>>>> On 12/9/2016 12:09 AM, Michael Welzl wrote:
>>>>>> On 07 Dec 2016, at 20:29, Joe Touch <to...@isi.edu> wrote:
>>>>>>
>>>>>> FYI, there are two different "largest messages" at the transport layer:
>>>>>>
>>>>>> 1) the size of the message that CAN be delivered at all
>>>>> True... I wasn't thinking of that, but yes.
>>>>>
>>>>>
>>>>>> 2) the size of the message that can be delivered without network-layer
>>>>>> fragmentation (there are no guarantees about link-layer - see ATM or the
>>>>>> recent discussion on tunnel MTUs on INTAREA)
>>>>>>
>>>>>> MTU generally refers to the *link payload*. At that point, transports
>>>>>> have to account for network headers, network options, transport headers,
>>>>>> and transport options too. See RFC6691.
>>>>>>
>>>>>> MSS refers to the transport message size AFAICT. It is *sometimes* tuned
>>>>>> to MTU-headers but not always.
>>>>>>
>>>>>> E.g., for IPv6, link MTU is required to be at least 1280, but the
>>>>>> src-dst transit MTU is required to be at least 1500. So a transport that
>>>>>> wants to match sizes and reduce fragmentation issues would pick
>>>>>> 1280-IPh-IPo-TCPh-TCPo, but a transport is supposed to be able to trust
>>>>>> that 1500-IPh-IPo-TCPh-TCPo can still get through at least some of the 
>>>>>> time.
>>>>> So I'm getting the impression that the answer to my question really is 
>>>>> that, to figure out 2)  (which I was concerned with), an application 
>>>>> programmer needs to do the calculation her/himself.
>>>> To figure out 2), the transport layer needs to know the unfragmented
>>>> link MTU, the size of all of the network headers (including options),
>>>> and the sizes of its own headers and options.
>>>>
>>>> It's also sometimes assumed that the transport can control the "DF" bit
>>>> (for IPv4).
>>> Yes - but that hardly sounds worse to me than requiring the application 
>>> programmer to do this protocol-specific calculation by hand...
>> The app programmer needs to know what the transport can support, the
>> transport needs to know what net supports, etc.
>>
>> Pushing the link MTU up the line and expecting all the other layers to
>> figure out what to do results in unnecessary complexity, never mind
>> undermining one of the key features of layering.
> Either we just agree here, or you're saying that your 2) above should not be 
> exposed? Or something else?


I'm saying that exposing 2) is a bad idea because it requires extra
information that can vary at other layers.


>>>> However, this all breaks down if the app makes the wrong choice because
>>>> the net can (will, and should) source fragment if it gets a message that
>>>> turns out to be too big for one fragment anyway.
>>>>
>>>>> Not a big deal - and maybe some systems offer a function to give you the 
>>>>> size of a message that won't be fragmented.
>>>> Remember that - at best - you're optimizing for the next layer down
>>>> only. You can't know whether that net layer message is link fragmented
>>>> (e.g., as in ATM) or tunnel fragmented (as needs to be required or this
>>>> whole MTU concept breaks down).
>>> Sure - but that's something end systems just can't see. It's information up 
>>> to and including the IP layer that should be correctly handed over up the 
>>> stack, inside the host, with all the caveats this information comes with.
>> Why does that apply at the link layer but not other layers? If transport
>> can transfer and reassemble 1MB messages, then that's the "MTU" it needs
>> to tell the app layer. The same is true for net to tell transport, etc.
>>
>> We've conflated the two between transport and net unnecessarily.
> So this sounds like you're saying that your item 2) above should not be 
> exposed by the transport layer to the application.
Right - because it's irrelevant to the app. The app needs to know the
"unit of transfer" of the next layer down. If the transport fragments and
reassembles on top of the network layer, then the network layer unit of
transmission is not relevant to the app.


>>>>> However: this calculation is transport protocol dependent, which we 
>>>>> really don't want to have in TAPS.
>>>> If you want to fix this, you need to change the API to the net layer to
>>>> provide immediate feedback. When transport hands a segment to network,
>>>> it has to get a "call failed" if the message is too big - and we really
>>>> do need transport layers to be able to pick between "too big for
>>>> non-fragmented net layer" and "too big for the net layer even with frag".
>>>>
>>>> Merely handing info to the transport layer might not be enough, esp.
>>>> when net layer option lengths change.
>>> True if you want to cleanly fix it across the RFC-specified stack, but 
>>> that's beyond the concern of TAPS - it becomes a requirement from the TAPS 
>>> WG. Does that make sense?
>> Then this is part of the API requirements that TAPS should be
>> indicating, no?
> So what does that mean: that the API should contain a "don't fragment" flag 
> from the application?

The API between transport and app should expose the "unit of transfer"
to the app.

The API between transport and network layers needs to expose the DF bit
*only* for IPv4 and only if it wants to avoid in-transit
re-fragmentation, but that's a temporary issue as it applies only to
IPv4. The real question is "what MTU does the network layer tell the
transport layer". There are two different ones (ignoring DF for now
because it's only IPv4):
  (a)  - the largest message that can transit the network layer without
source fragmentation
  (b)  - the largest message that can transit the network with source
fragmentation
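
As a rough sketch only, here is what a network-layer interface exposing
both limits, and giving the transport immediate feedback on which one a
message exceeds, might look like. Every name here (NetLayerLimits,
SendResult, classify) is hypothetical and not taken from any existing
stack. The numbers are illustrative: the 1280 and 1500 byte IPv6 figures
from earlier in this thread, minus a 40-byte IPv6 header with no
extension headers.

```python
from dataclasses import dataclass
from enum import Enum


class SendResult(Enum):
    """Hypothetical feedback the network layer could return to transport."""
    FITS = "fits without source fragmentation"
    NEEDS_SOURCE_FRAGMENTATION = "too big for non-fragmented net layer"
    TOO_BIG = "too big for the net layer even with fragmentation"


@dataclass(frozen=True)
class NetLayerLimits:
    """The two sizes (a) and (b), in bytes of network-layer payload."""
    limit_unfragmented: int  # (a): no source fragmentation needed
    limit_fragmented: int    # (b): deliverable only with source fragmentation


def classify(size: int, limits: NetLayerLimits) -> SendResult:
    """Tell the transport which limit, if any, a message of `size` exceeds."""
    if size < 0:
        raise ValueError("size must be non-negative")
    if size <= limits.limit_unfragmented:
        return SendResult.FITS
    if size <= limits.limit_fragmented:
        return SendResult.NEEDS_SOURCE_FRAGMENTATION
    return SendResult.TOO_BIG


# Illustrative values only (assumption: IPv6, no extension headers).
limits = NetLayerLimits(limit_unfragmented=1280 - 40, limit_fragmented=1500 - 40)
for size in (1000, 1400, 9000):
    print(size, classify(size, limits).value)
```

The point of returning a result per call, rather than publishing a static
number, is that the answer can change when network-layer option lengths
change.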

Right now, neither of these is part of the transport-network API; at
best, we indicate the *link* MTU, from which transport needs to subtract
IPheader and IPoptions to determine the transport MTU (which it then
reduces by TCPhdr and TCPoptions, e.g. - but that's fine because it's
inside the transport layer).
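
The subtraction described above can be sketched as follows. This is a
minimal illustration, not code from any real stack: the function names
are made up, and the header sizes are the fixed IPv6 header (40 bytes)
and the base TCP header (20 bytes); the 12 bytes of TCP options is just
an example value.

```python
IPV6_HEADER = 40      # fixed IPv6 header, bytes
TCP_BASE_HEADER = 20  # TCP header without options, bytes


def transport_mtu(link_mtu: int, ip_header: int, ip_options: int = 0) -> int:
    """Link MTU minus network-layer header and options (hypothetical helper)."""
    result = link_mtu - ip_header - ip_options
    if result <= 0:
        raise ValueError("link MTU too small for the network headers")
    return result


def tcp_payload(transport_mtu_bytes: int, tcp_options: int = 0) -> int:
    """Transport MTU minus TCP header and options; internal to the transport."""
    result = transport_mtu_bytes - TCP_BASE_HEADER - tcp_options
    if result <= 0:
        raise ValueError("transport MTU too small for the TCP headers")
    return result


# Example: IPv6 minimum link MTU of 1280, no extension headers,
# and an assumed 12 bytes of TCP options.
t_mtu = transport_mtu(1280, IPV6_HEADER)    # 1240
payload = tcp_payload(t_mtu, tcp_options=12)  # 1208
print(t_mtu, payload)
```

Only the first function needs information from the network layer; the
second uses values the transport already owns.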

Joe

_______________________________________________
Taps mailing list
Taps@ietf.org
https://www.ietf.org/mailman/listinfo/taps
