This discussion of "session continuity" reminds me a lot of similar
discussions around the 7-layer OSI model. It included a hypothetical
"session" layer, which included the functions required to safely recover
from a break in communication. That "session" layer disappeared in
modern architectures, for a variety of reasons. As we seem to be
revisiting the issue, let's remind ourselves of what we learned in those
years.
The main lesson is that recovery from interruptions requires a form of
check-pointing, and that such check-pointing often involves several
connections, not just one. The classic example of such check-pointing is
the "two-phase commit" algorithm, whose primary goal is to ensure that
transactions either succeed or fail across all systems involved. In the
banking example, the "make payment" transaction typically involves the
customer, the vendor and their banks, and the goal is to ensure that the
payment either takes place everywhere or not at all.
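
To make the "everywhere or not at all" property concrete, here is a minimal sketch of a two-phase commit coordinator. The names (`Participant`, `two_phase_commit`) are made up for illustration, and the sketch leaves out what makes real implementations hard: durable logging of the prepared state, timeouts, and recovery of a crashed coordinator.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """One system in the transaction (hypothetical: a bank, a vendor ledger)."""
    name: str
    can_commit: bool = True   # the vote this participant returns in phase 1
    state: str = "idle"       # idle -> prepared -> committed, or aborted

    def prepare(self) -> bool:
        # Phase 1: checkpoint the pending change, then vote yes or no.
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    """Return True if the transaction committed everywhere, False if aborted."""
    # Phase 1: ask every participant to prepare and collect all the votes.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2: apply the same decision to every participant.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

A single "no" vote in phase 1 aborts the payment at every participant, including those that had already prepared.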
The next lesson is that such check-pointing mostly requires visibility
by the application. Ultimately, the application is in charge of ensuring
that databases remain coherent. It may choose different algorithms
depending on its requirements. Two-phase commit is just one example;
other systems based on journals are also popular. Some applications may
be satisfied with "eventually converging" instead of "always in sync".
Yet another lesson is that this synchronization is definitely not a
"transport" issue. That's why previous attempts to create a "session"
layer between "transport" and "application" have failed. The application
decides whether to recover or roll back a transaction, for example
canceling a transaction if it took too long. The transport cannot really
do that, and a simple TLS layer on top of the transport cannot do that
either. One could design transport extensions that deal with very simple
cases, such as deciding how to restart a file transfer, but even those
have limited value, and are easily superseded by application-level
solutions like "get the bytes of file X starting at offset Y". In real
life, such "simple" solutions will have qualifiers, such as "get the
bytes of file X starting at offset Y but only if the file did not change
before time T".
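
As a rough sketch of why even this "simple" case belongs to the application: the resume request carries a condition that only the application can evaluate. The names below (`StoredFile`, `read_from_offset`) are invented for the example, and the qualifier is interpreted here as "the file has not been modified after the time T that the client recorded", which is similar in spirit to HTTP conditional range requests.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredFile:
    """Hypothetical server-side view of file X."""
    data: bytes
    modified_at: float   # last-modification time, seconds since the epoch


def read_from_offset(f: StoredFile, offset: int,
                     unchanged_since: float) -> Optional[bytes]:
    """Return the bytes of the file from `offset` onward.

    Returns None if the file was modified after `unchanged_since`, in which
    case the bytes the client already holds may be stale and the client has
    to restart the transfer from the beginning.
    """
    if f.modified_at > unchanged_since:
        return None
    return f.data[offset:]
```

The decision to resume or restart depends on what the file means to the application, which a transport or TLS layer has no way of knowing.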
All that to say that I am not convinced at all by a proposal to
insert a "session data continuity" mechanism at the TLS layer. Such
functionality is application dependent; it belongs to the application,
not to the transport. The current design of the session resumption
mechanism correctly focuses on just one piece of the puzzle: spend less
CPU restarting a TLS session if the peer remembers the keying material of
a previous session. That's an optional mechanism, and it narrowly focuses
on TLS-specific data. Adding complexity there would be counter-productive.
-- Christian Huitema
On 1/4/2026 6:38 AM, Eric Rescorla wrote:
On Sun, Jan 4, 2026 at 1:27 AM Aijun Wang <[email protected]>
wrote:
Server
<------------------ First TCP/TLS Connection ---------------->
POST /make-payment (1/2) ---\ /---------------- Switch servers
X
<---------------------------/ \------------------------------>
[Buffer /make-payment (2/2)]
<-------------------------------------------------------- ACK
<-------------------- New TCP/TLS Connection ---------------->
[If, at the application layer, the client side doesn’t receive the
response to make-payment, it needs to send make-payment(1) again,
and also make-payment(2)]
/make-payment (1/2)----------------------------------------->
You had said earlier that the TLS stack should also be buffering and
re-sending `make-payment(1/2)`. Is that still your view?
-Ekr
==========================================================
*From:*[email protected]
[mailto:[email protected]] *On Behalf Of *Eric Rescorla
*Sent:* Sunday, January 4, 2026 10:12 AM
*To:* Aijun Wang <[email protected]>
*Cc:* [email protected]
*Subject:* [TLS] Re: 【Reply to the comments after the presentation
in Montreal】RE: Re: FW: New Version Notification for
draft-wang-tls-service-affinity-00.txt
On Sat, Jan 3, 2026 at 5:42 PM Aijun Wang
<[email protected]> wrote:
Hi, Eric:
What we want is similar to the “Resumption and Pre-Shared
Key (PSK)” mechanism that is described in
https://datatracker.ietf.org/doc/html/rfc8446#section-2.2
From this section, we can see that the application layer is not
aware of such session resumption; the TLS layer handles the whole
procedure. Right?
Not necessarily. The TLS specification takes no position on when
(1) clients should attempt resumption and (2) servers should allow it.
What you described in previous examples can all happen in the
resumption process, and the application layer should have
its own additional confirmation/retry logic.
I'm not sure that's in fact true. The purpose of the examples was
to explore that, which is why I asked you to provide your own
ladder diagrams showing how you thought this worked. Again, can
you please do that?
For the mentioned
draft (https://datatracker.ietf.org/doc/draft-wang-tls-service-affinity/),
the additional exchanged signals are there to transfer the new server
address securely after the initial connection.
What the client and server need to do is correlate the
corresponding cryptographic context with the new underlying TCP
connection.
Do you have any suggestions for making the above intention
clearer in
https://datatracker.ietf.org/doc/draft-wang-tls-service-affinity/?
As I said, I think this is the wrong design, so my suggestion is
you don't do it.
To the extent to which you are trying to make the case otherwise,
you really need to show your work, which this message does not do.
-Ekr
*From:*[email protected]
[mailto:[email protected]] *On Behalf Of
*[External account] Eric Rescorla
*Sent:* Tuesday, December 30, 2025 10:41 PM
*To:* Aijun Wang <[email protected]>
*Cc:* [email protected]; [email protected];
Mohit Sahni <[email protected]>; Aijun Wang
<[email protected]>
*Subject:* Re: [TLS] Re: 【Reply to the comments after the
presentation in Montreal】RE: Re: FW: New Version Notification
for draft-wang-tls-service-affinity-00.txt
On Tue, Dec 30, 2025 at 2:10 AM Aijun Wang
<[email protected]> wrote:
Hi, Eric:
Contrary to your conclusions, I think the application
layer and the TLS/TCP layer should (and already do) have their
own mechanisms to ensure data integrity,
Yes, which might or might not work correctly, because they are
rarely tested.
there is no need to consider them again at the
protocol layer; we just need some guidance for the
client-side and server-side implementations themselves.
If data arrives during the switchover, the
internal implementation logic is that the application layer
calls the TLS/TCP API to send the data, with the
same session identifier.
I don't know what you mean by "The same session identifier".
There is no concept in TLS that two different TCP connections
are somehow the same conceptual flow of data. PSK identifiers
solely identify keys.
In this case, the client doesn't know what has happened. You need
mechanisms either at the HTTP layer--or more typically at the REST
API layer--to do the right thing, which might be an idempotency
layer combined with client-side retransmit. This is all just a
straightforward application of the end-to-end argument, and there's
no real way around it as long as systems might asynchronously fail,
but it's also a source of defects (think about how many times sites
tell you not to press the submit button twice) because these
mechanisms may not have been exercised or tested. For instance, if
the server is high reliability and the client just assumes that
anything it sent works, that will be good enough a very large
fraction of the time, but not if the server has a high failure rate.
[WAJ] From the example, we can see that each application has
its own confirmation mechanism, because most of them are
asynchronous.
The application knows that the server may crash or the
underlying connection may break.
Yes. I said exactly this, but again, they're not always going to be
implemented correctly, and that's largely OK because most
connections don't fail. You're talking about making an exceptional
condition routine.
Unfortunately, these transaction semantics only exist at the HTTP
layer, not the TLS layer, so the TLS layer has no way of knowing to
wait for the 200 OK, it just knows that the client sent some data,
but not whether that reflects an outstanding request or something
else; recall that TLS doesn't even know about the HTTP
request/response semantics, because it's just a dumb pipe.
[WAJ] TLS needn’t be aware of the 200 OK signal; that is the job
of the application layer.
TLS/TCP only needs to transmit the data from the application
layer correctly to the other side.
So you're saying that in the example above, the TLS layer ought to
inform the HTTP layer that the connection has failed and trust the
HTTP layer to retry in a safe fashion?
In your email, you suggest that the client ought to:
1. Wait for the server's TCP ACK of all transmitted data, with the
implied semantics being that once the message is ACKed it will be
reliably delivered to the server, not just to the TCP stack.
[WAJ] No. I emphasize only the TCP ACK and the TCP stack,
not the application stack. That is to say, receiving the
TCP ACK doesn’t represent an application-layer ACK.
2. Buffer any data it receives from the client while waiting for the
ACK and retransmit it on the new connections.
[WAJ] Buffer any data it receives, but can’t transmit
immediately during the switchover process, not waiting for
the application ACK.
I don't understand what you're saying here. Can you please provide:
1. A concrete description of what you believe the rules are that the
TLS stack should be following.
2. New versions of my ladder diagrams that show what you believe the
correct behavior is.
-Ekr
_______________________________________________
TLS mailing list -- [email protected]
To unsubscribe send an email to [email protected]