This discussion of "session continuity" reminds me a lot of similar discussions around the 7-layer OSI model. That model included a hypothetical "session" layer, which provided the functions required to safely recover from a break in communication. That "session" layer disappeared in modern architectures, for a variety of reasons. As we seem to be revisiting the issue, let's remind ourselves of what we learned in those years.

The main lesson is that recovery from interruptions requires a form of check-pointing, and that such check-pointing often involves several connections, not just one. The classic example of such check-pointing is the "two-phase commit" algorithm, whose primary goal is to ensure that transactions either succeed or fail across all systems involved. In the banking example, the "make payment" transaction typically involves the customer, the vendor and their banks, and the goal is to ensure that the payment either takes place everywhere or not at all.
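
To make the "everywhere or not at all" property concrete, here is a minimal in-memory sketch of two-phase commit. The Participant class, its method names and the balance rule are assumptions made up for illustration; a real system would also have to persist the prepared state and cope with a coordinator crash between the two phases, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical participant (for example one bank's ledger)."""
    name: str
    balance: int
    pending: dict = field(default_factory=dict)

    def prepare(self, txid: str, delta: int) -> bool:
        # Phase 1: vote yes only if the change could still be applied,
        # taking already-prepared changes into account.
        projected = self.balance + sum(self.pending.values()) + delta
        if projected < 0:
            return False
        self.pending[txid] = delta
        return True

    def commit(self, txid: str) -> None:
        # Phase 2 (success): apply the change recorded during prepare.
        self.balance += self.pending.pop(txid)

    def abort(self, txid: str) -> None:
        # Phase 2 (failure): discard the recorded change, if any.
        self.pending.pop(txid, None)


def two_phase_commit(txid: str, changes: list[tuple[Participant, int]]) -> bool:
    """Apply every change or none of them. Returns True on commit."""
    prepared = []
    for participant, delta in changes:
        if participant.prepare(txid, delta):
            prepared.append(participant)
        else:
            # One "no" vote aborts everyone who already voted "yes".
            for p in prepared:
                p.abort(txid)
            return False
    for participant, _ in changes:
        participant.commit(txid)
    return True


# Usage: a payment of 60 succeeds, a second one fails and changes nothing.
customer = Participant("customer", 100)
vendor = Participant("vendor", 0)
first = two_phase_commit("t1", [(customer, -60), (vendor, 60)])
second = two_phase_commit("t2", [(vendor, 60), (customer, -60)])
```

Note that the coordinator here talks to several participants, which is the point made above: the checkpoint spans more than one connection.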

The next lesson is that such check-pointing mostly requires visibility by the application. Ultimately, the application is in charge of ensuring that databases remain coherent. It may choose different algorithms depending on its requirements. Two-phase commit is just one example; other systems based on journals are also popular. Some applications may be satisfied with "eventually converging" instead of "always in sync".

Yet another lesson is that this synchronization is definitely not a "transport" issue. That's why previous attempts to create a "session" layer between "transport" and "application" have failed. The application decides whether to recover or roll back a transaction, for example canceling a transaction if it took too long. The transport cannot really do that, and a simple TLS layer on top of the transport cannot do that either. One could design transport extensions that deal with very simple cases, such as deciding how to restart a file transfer, but even those have limited value, and are easily superseded by application-level solutions like "get the bytes of file X starting at offset Y". In real life, such "simple" solutions will have qualifiers, such as "get the bytes of file X starting at offset Y but only if the file did not change before time T".
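
As an illustration of how small such an application-level solution can be, here is a sketch of the qualified request, reading the qualifier as "the file has not been modified after time T". The function name, the exception and the use of the file modification time are assumptions for the example, not a description of any particular protocol.

```python
import os
import tempfile


class FileChanged(Exception):
    """Raised when the file was modified after the caller's reference time."""


def read_from_offset(path: str, offset: int, unchanged_since: float) -> bytes:
    """Return the bytes of `path` from `offset` onward, but only if the
    file has not been modified after `unchanged_since` (seconds since epoch)."""
    with open(path, "rb") as f:
        # Check the modification time on the open descriptor, so the check
        # and the read refer to the same file object.
        if os.fstat(f.fileno()).st_mtime > unchanged_since:
            raise FileChanged(path)
        f.seek(offset)
        return f.read()


# Usage with a throwaway file (contents and timestamps are illustrative).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
os.utime(tmp.name, (1000, 1000))  # pretend the last modification was at T=1000
tail = read_from_offset(tmp.name, 6, unchanged_since=1000)
os.unlink(tmp.name)
```

The decision about what to do when FileChanged is raised (restart from zero, give up, ask the user) stays with the application, which is the only party that knows what the transfer is for.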

All that to say that I am not convinced at all by the proposal to insert a "session data continuity" mechanism at the TLS layer. Such functionality is application dependent; it belongs to the application, not to the transport. The current design of the session resumption mechanism correctly focuses on just one piece of the puzzle: spending less CPU restarting a TLS session if the peer remembers the keying material of a previous session. That's an optional mechanism, and it narrowly focuses on TLS-specific data. Adding complexity there would be counter-productive.

-- Christian Huitema


On 1/4/2026 6:38 AM, Eric Rescorla wrote:

On Sun, Jan 4, 2026 at 1:27 AM Aijun Wang <[email protected]> wrote:

                           Server

           <------------------ First TCP/TLS Connection ---------------->
           POST /make-payment (1/2) ---\ /---------------- Switch servers
                           X
            <---------------------------/ \------------------------------>
            [Buffer /make-payment (2/2)]
    <--------------------------------------------------------  ACK

            <-------------------- New TCP/TLS Connection ---------------->

    [If in application layer, the client side doesn’t receive the
    response from make-payment, it needs to send again make-payment(1)
    and also the make-payment(2)]

    /make-payment (1/2)----------------------------------------->


You had said earlier that the TLS stack should also be buffering and re-sending `make-payment(1/2)`. Is that still your view?

-Ekr


    ==========================================================

    *From:*[email protected]
    [mailto:[email protected]] *On Behalf Of *Eric Rescorla
    *Sent:* Sunday, January 4, 2026 10:12 AM
    *To:* Aijun Wang <[email protected]>
    *Cc:* [email protected]
    *Subject:* [TLS] Re: 【Reply to the comments after the presentation
    in Montreal】RE: Re: FW: New Version Notification for
    draft-wang-tls-service-affinity-00.txt

    On Sat, Jan 3, 2026 at 5:42 PM Aijun Wang
    <[email protected]> wrote:

        Hi, Eric:

        What we want is similar to “Resumption and Pre-Shared
        Key (PSK)” as described in
        https://datatracker.ietf.org/doc/html/rfc8446#section-2.2

        From this section, we can see that the application layer is
        not aware of such session resumption; the TLS layer handles
        the whole procedure. Right?

    Not necessarily. The TLS specification takes no position on when
    (1) clients should attempt resumption and (2) servers should allow it.

        What you described in previous examples can all happen in the
        resumption process, and the application layer should have
        their own additional confirmation/retry logic.

    I'm not sure that's in fact true. The purpose of the examples was
    to explore that, which is why I asked you to provide your own
    ladder diagrams showing how you thought this worked. Again, can
    you please do that?

        For the mentioned draft
        (https://datatracker.ietf.org/doc/draft-wang-tls-service-affinity/),
        the additional exchange signals are to transfer the new server
        address securely after the initial connection.

        What the client and server need to do is to correlate the
        corresponding cryptographic context to the new underlying TCP
        connection.

        Do you have any suggestions to make the above intention
        clearer in
        https://datatracker.ietf.org/doc/draft-wang-tls-service-affinity/?

    As I said, I think this is the wrong design, so my suggestion is
    you don't do it.

    To the extent to which you are trying to make the case otherwise,
    you really need to show your work, which this message does not do.

    -Ekr

        *From:*[email protected]
        [mailto:[email protected]] *On Behalf Of
        *【External account】Eric Rescorla
        *Sent:* Tuesday, December 30, 2025 10:41 PM
        *To:* Aijun Wang <[email protected]>
        *Cc:* [email protected]; [email protected];
        Mohit Sahni <[email protected]>; Aijun Wang
        <[email protected]>
        *Subject:* Re: [TLS] Re: 【Reply to the comments after the
        presentation in Montreal】RE: Re: FW: New Version Notification
        for draft-wang-tls-service-affinity-00.txt

        On Tue, Dec 30, 2025 at 2:10 AM Aijun Wang
        <[email protected]> wrote:

            Hi, Eric:

            Contrary to your conclusions, I think the application
            layer and TLS/TCP layer should (already) have their own
            mechanisms to assure the data integrity,

        Yes, which might or might not work correctly, because they are
        rarely tested.

            there is no need to consider them again at the
            protocol layer; we need just some guidance for the
            implementation of the client and server sides themselves.

            If data arrives during the switchover, the internal
            implementation logic is that the application layer will
            call the TLS/TCP API to send some data, with the same
            session identifier.

        I don't know what you mean by "The same session identifier".
        There is no concept in TLS that two different TCP connections
        are somehow the same conceptual flow of data. PSK identifiers
        solely identify keys.


            In this case, the client doesn't know what has happened.
            You need mechanisms either at the HTTP layer--or more
            typically at the REST API layer--to do the right thing,
            which might be an idempotency layer combined with
            client-side retransmit.  This is all just a
            straightforward application of the end-to-end argument,
            and there's no real way around it as long as systems might
            asynchronously fail, but it's also a source of defects
            (think about how many times sites tell you not to press
            the submit button twice) because these mechanisms may not
            have been exercised or tested. For instance, if the server
            is high reliability and the client just assumes that
            anything it sent works, that will be good enough a very
            large fraction of the time, but not if the server has a
            high failure rate.

            [WAJ] From the example, we can see that each application
            has its own confirmation mechanism, because most of them
            are asynchronous.

            The application knows that the server may crash, or that
            the underlying connection may break.

        Yes. I said exactly this, but again, they're not always going
        to be implemented correctly, and that's largely OK because most
        connections don't fail. You're talking about making an
        exceptional condition routine.

            Unfortunately, these transaction semantics only exist at
            the HTTP layer, not the TLS layer, so the TLS layer has no
            way of knowing to wait for the 200 OK, it just knows that
            the client sent some data, but not whether that reflects
            an outstanding request or something else; recall that TLS
            doesn't even know about the HTTP request/response
            semantics, because it's just a dumb pipe.

            [WAJ] TLS needn’t be aware of the 200 OK signal; that is
            the job of the application layer.

            TLS/TCP needs only to transmit the data from the
            application layer correctly to the other side.

        So you're saying that in the example above, the TLS layer
        ought to inform the HTTP layer that the connection has failed
        and trust the HTTP layer to retry in a safe fashion?

            In your email, you suggest that the client ought to:

            1. Wait for the server's TCP ACK of all transmitted data,
            with the implied semantics being that once the message is
            ACKed it will be reliably delivered to the server, not
            just to the TCP stack.

            [WAJ] No. I emphasize only the TCP ACK and the TCP stack.
            Not the application stack. That is to say, receiving the
            TCP ACK doesn’t represent the application layer ACK.


            2. Buffer any data it receives from the client while
            waiting for the ACK and retransmit it on the new
            connections.

            [WAJ] Buffer any data it receives, but can’t transmit
            immediately during the switchover process, not waiting for
            the application ACK.

        I don't understand what you're saying here. Can you please
        provide:

        1. A concrete description of the rules that you believe the
        TLS stack should be following.

        2. New versions of my ladder diagrams that show what you
        believe the correct behavior is.

        -Ekr


_______________________________________________
TLS mailing list -- [email protected]
To unsubscribe send an email to [email protected]
