[TLS] Re: 【Reply to the comments after the presentation in Montreal】RE: Re: FW: New Version Notification for draft-wang-tls-service-affinity-00.txt

Christian Huitema Sun, 04 Jan 2026 11:28:07 -0800

This discussion of "session continuity" reminds me a lot of similar discussions about the 7-layer OSI model. It included a hypothetical "session" layer, which provided the functions required to safely recover from a break in communication. That "session" layer disappeared in modern architectures, for a variety of reasons. As we seem to be revisiting the issue, let's remind ourselves of what we learned in those years.

The main lesson is that recovery from interruptions requires a form of check-pointing, and that such check-pointing often involves several connections, not just one. The classic example of such check-pointing is the "two-phase commit" family of algorithms, whose primary goal is to ensure that transactions either succeed or fail across all systems involved. In the banking example, the "make payment" transaction typically involves the customer, the vendor and their banks, and the goal is to ensure that the payment either takes place everywhere or not at all.
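
To make the "everywhere or not at all" property concrete, here is a minimal in-memory sketch of a two-phase commit coordinator. The names `Participant` and `make_payment` are hypothetical, chosen only for illustration; a real system would also need durable logs and handling of coordinator failure, which this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical participant (e.g. a bank) holding one account balance."""
    name: str
    balance: int
    pending: int = 0          # staged change, not yet applied
    can_commit: bool = True   # set to False to simulate a failing system

    def prepare(self, delta: int) -> bool:
        # Phase 1: vote "yes" only if the change could be applied safely.
        if not self.can_commit or self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self) -> None:
        # Phase 2, success path: apply the staged change.
        self.balance += self.pending
        self.pending = 0

    def rollback(self) -> None:
        # Phase 2, failure path: discard the staged change.
        self.pending = 0


def make_payment(payer: Participant, payee: Participant, amount: int) -> bool:
    """Coordinator: the payment happens at both participants or at neither."""
    changes = [(payer, -amount), (payee, amount)]
    prepared = []
    for participant, delta in changes:
        if participant.prepare(delta):
            prepared.append(participant)
        else:
            # One "no" vote aborts the transaction for everyone.
            for p in prepared:
                p.rollback()
            return False
    for participant, _ in changes:
        participant.commit()
    return True
```

Note that the coordinator here is application code: it is the application that knows which systems take part in the transaction and what "prepared" means for each of them.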

The next lesson is that such check-pointing mostly requires visibility by the application. Ultimately, the application is in charge of ensuring that databases remain coherent. It may choose different algorithms depending on its requirements. Two-phase commit is just one example; other systems based on journals are also popular. Some applications may be satisfied with "eventually converging" instead of "always in sync".

Yet another lesson is that this synchronization is definitely not a "transport" issue. That's why previous attempts to create a "session" layer between "transport" and "application" have failed. The application decides whether to recover or roll back a transaction, for example canceling a transaction if it took too long. The transport cannot really do that, and a simple TLS layer on top of the transport cannot do that either. One could design transport extensions that deal with very simple cases, such as deciding how to restart a file transfer, but even those have limited value, and are easily superseded by application-level solutions like "get the bytes of file X starting at offset Y". In real life, such "simple" solutions will have qualifiers, such as "get the bytes of file X starting at offset Y but only if the file did not change before time T".
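
As an illustration of such a qualified application-level request, here is a minimal sketch of "resume from offset Y, but only if the file is unchanged". It assumes a hypothetical per-file change counter (`version`) in place of the timestamp T, and the names `StoredFile`, `read_from` and `resume_download` are invented for this example. HTTP expresses a similar idea with a `Range` request combined with an `If-Range` precondition.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    data: bytes
    version: int  # hypothetical change counter, bumped on every write


class FileChanged(Exception):
    """The file was modified since the client's earlier partial read."""


def read_from(files: dict[str, StoredFile], name: str, offset: int,
              expected_version: int) -> bytes:
    # "Get the bytes of file X starting at offset Y, but only if unchanged."
    stored = files[name]
    if stored.version != expected_version:
        raise FileChanged(name)
    return stored.data[offset:]


def resume_download(files: dict[str, StoredFile], name: str,
                    partial: bytes, version: int) -> tuple[bytes, int]:
    """Return (complete bytes, version), restarting if the file changed."""
    try:
        rest = read_from(files, name, len(partial), version)
        return partial + rest, version
    except FileChanged:
        # The partial data is stale: the application decides to start over.
        current = files[name]
        return current.data, current.version
```

The decision to discard the stale partial data and restart is made by the application, which is the only party that knows what "the same file" means.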

All that to say that I am not convinced at all by a proposal to insert a "session data continuity" mechanism at the TLS layer. Such functionality is application dependent; it belongs to the application, not to the transport. The current design of the session resumption mechanism correctly focuses on just one piece of the puzzle: spend less CPU restarting a TLS session if the peer remembers the keying material of a previous session. That's an optional mechanism, and it narrowly focuses on TLS-specific data. Adding complexity there would be counter-productive.


-- Christian Huitema


On 1/4/2026 6:38 AM, Eric Rescorla wrote:

On Sun, Jan 4, 2026 at 1:27 AM Aijun Wang <[email protected]> wrote:


                           Server

           <------------------ First TCP/TLS Connection ---------------->
           POST /make-payment (1/2) ---\ /---------------- Switch servers
                           X
            <---------------------------/ \------------------------------>
            [Buffer /make-payment (2/2)]
    <--------------------------------------------------------  ACK

            <-------------------- New TCP/TLS Connection ---------------->

    [At the application layer, if the client side doesn’t receive the
    response to make-payment, it needs to send make-payment(1) again,
    followed by make-payment(2)]

    /make-payment (1/2)----------------------------------------->

You had said earlier that the TLS stack should also be buffering and re-sending `make-payment(1/2)`. Is that still your view?


-Ekr


    ==========================================================

    *From:*[email protected]
    [mailto:[email protected]] *On Behalf Of *Eric Rescorla
    *Sent:* Sunday, January 4, 2026 10:12 AM
    *To:* Aijun Wang <[email protected]>
    *Cc:* [email protected]
    *Subject:* [TLS] Re: 【Reply to the comments after the presentation
    in Montreal】RE: Re: FW: New Version Notification for
    draft-wang-tls-service-affinity-00.txt

    On Sat, Jan 3, 2026 at 5:42 PM Aijun Wang
    <[email protected]> wrote:

        Hi, Eric:

        What we want is similar to the “Resumption and Pre-Shared
        Key (PSK)” mechanism that is described in
        https://datatracker.ietf.org/doc/html/rfc8446#section-2.2

        From this section, we understand that the application layer is
        not aware of such session resumption; the TLS layer handles the
        whole procedure. Right?

    Not necessarily. The TLS specification takes no position on when
    (1) clients should attempt resumption and (2) servers should allow it.

        What you described in the previous examples can all happen in
        the resumption process, and the application layer should have
        its own additional confirmation/retry logic.

    I'm not sure that's in fact true. The purpose of the examples was
    to explore that, which is why I asked you to provide your own
    ladder diagrams showing how you thought this worked. Again, can
    you please do that?

        For the mentioned draft
        (https://datatracker.ietf.org/doc/draft-wang-tls-service-affinity/),
        the additional signaling exchanges are there to transfer the new
        server address securely after the initial connection.

        What the client and server need to do is to correlate the
        corresponding cryptographic context with the new underlying TCP
        connection.

        Do you have any suggestions for making the above intention
        clearer in
        https://datatracker.ietf.org/doc/draft-wang-tls-service-affinity/?

    As I said, I think this is the wrong design, so my suggestion is
    you don't do it.

    To the extent to which you are trying to make the case otherwise,
    you really need to show your work, which this message does not do.

    -Ekr

        *From:*[email protected]
        [mailto:[email protected]] *On Behalf Of
        *[External account] Eric Rescorla
        *Sent:* Tuesday, December 30, 2025 10:41 PM
        *To:* Aijun Wang <[email protected]>
        *Cc:* [email protected]; [email protected];
        Mohit Sahni <[email protected]>; Aijun Wang
        <[email protected]>
        *Subject:* Re: [TLS] Re: 【Reply to the comments after the
        presentation in Montreal】RE: Re: FW: New Version Notification
        for draft-wang-tls-service-affinity-00.txt

        On Tue, Dec 30, 2025 at 2:10 AM Aijun Wang
        <[email protected]> wrote:

            Hi, Eric:

            Contrary to your conclusions, I think the application
            layer and the TLS/TCP layer should have (and already have)
            their own mechanisms to ensure data integrity,

        Yes, which might or might not work correctly, because they are
        rarely tested.

            so there is no need to consider them again at the
            protocol layer; we just need some guidance for the
            implementation of the client and server sides themselves.

            If data arrives during the switchover, the internal
            implementation logic is that the application layer will
            call the TLS/TCP API to send the data, with the same
            session identifier.

        I don't know what you mean by "The same session identifier".
        There is no concept in TLS that two different TCP connections
        are somehow the same conceptual flow of data. PSK identifiers
        solely identify keys.


            In this case, the client doesn't know what has happened. You
            need mechanisms either at the HTTP layer--or more typically
            at the REST API layer--to do the right thing, which might be
            an idempotency layer combined with client-side retransmit.
            This is all just a straightforward application of the
            end-to-end argument, and there's no real way around it as
            long as systems might asynchronously fail, but it's also a
            source of defects (think about how many times sites tell you
            not to press the submit button twice) because these
            mechanisms may not have been exercised or tested. For
            instance, if the server is high reliability and the client
            just assumes that anything it sent works, that will be good
            enough a very large fraction of the time, but not if the
            server has a high failure rate.
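
The "idempotency layer combined with client-side retransmit" mentioned above could look roughly like the following minimal sketch. It assumes a hypothetical in-memory `PaymentServer` that deduplicates requests by a client-chosen key; the class, the `drop_next_response` switch and `pay_with_retry` are invented for illustration and do not correspond to any real API.

```python
import uuid


class ConnectionLost(Exception):
    """Simulated failure: the request may or may not have been processed."""


class PaymentServer:
    """Hypothetical server that deduplicates requests by idempotency key."""

    def __init__(self) -> None:
        self.results: dict[str, str] = {}  # idempotency key -> stored response
        self.payments_made = 0
        self.drop_next_response = False    # simulate losing one response

    def make_payment(self, key: str, amount: int) -> str:
        if key not in self.results:
            # First time this key is seen: perform the side effect once.
            self.payments_made += 1
            self.results[key] = f"paid {amount}"
        if self.drop_next_response:
            # The payment was recorded, but the client never hears about it.
            self.drop_next_response = False
            raise ConnectionLost()
        return self.results[key]


def pay_with_retry(server: PaymentServer, amount: int, attempts: int = 3) -> str:
    # One key per logical payment, reused across every retransmission.
    key = str(uuid.uuid4())
    for _ in range(attempts):
        try:
            return server.make_payment(key, amount)
        except ConnectionLost:
            continue  # safe to resend: the server deduplicates by key
    raise ConnectionLost()
```

The retry is safe only because the request carries application-level meaning (the key); a layer that sees only opaque bytes has no equivalent way to tell a duplicate from a new request.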

            [WAJ] From the example, we can see that each application has
            its own confirmation mechanism, because most of them are
            asynchronous.

            The application knows that the server may crash or that the
            underlying connection may break.

        Yes. I said exactly this, but again, they're not always going
        to be implemented correctly, and that's largely OK because most
        connections don't fail. You're talking about making an
        exceptional condition routine.

            Unfortunately, these transaction semantics only exist at the
            HTTP layer, not the TLS layer, so the TLS layer has no way
            of knowing to wait for the 200 OK, it just knows that the
            client sent some data, but not whether that reflects an
            outstanding request or something else; recall that TLS
            doesn't even know about the HTTP request/response semantics,
            because it's just a dumb pipe.

            [WAJ] TLS needn’t be aware of the 200 OK signal; that is the
            job of the application layer.

            TLS/TCP only needs to transmit the data from the application
            layer correctly to the other side.

        So you're saying that in the example above, the TLS layer
        ought to inform the HTTP layer that the connection has failed
        and trust the HTTP layer to retry in a safe fashion?

            In your email, you suggest that the client ought to:

            1. Wait for the server's TCP ACK of all transmitted data,
            with the implied semantics being that once the message is
            ACKed it will be reliably delivered to the server, not just
            to the TCP stack.

            [WAJ] No. I emphasize only the TCP ACK and the TCP stack,
            not the application stack. That is to say, receiving the
            TCP ACK doesn’t represent an application-layer ACK.


            2. Buffer any data it receives from the client while
            waiting for the ACK and retransmit it on the new
            connections.

            [WAJ] Buffer any data that it receives but can’t transmit
            immediately during the switchover process, without waiting
            for the application ACK.

        I don't understand what you're saying here. Can you please
        provide:

        1. A concrete description of the rules that you believe the
        TLS stack should be following.

        2. New versions of my ladder diagrams that show what you
        believe the correct behavior is.

        -Ekr


_______________________________________________
TLS mailing list -- [email protected]
To unsubscribe send an email to [email protected]

