[TLS] Resumption and Forward Secrecy, 0-RTT and Safety

Colm MacCárthaigh Mon, 28 Mar 2016 11:56:50 -0700

Over the past week or so I've sent a few messages here on 0RTT but I've
been muddling together some separate concerns, and I'd like to split them
apart and treat them separately, with some concrete suggestions.


*Resumption and Forward Secrecy*
One of the reasons I'm excited about TLS1.3 is the prospect of greater
deployment of forward secrecy. Compromise of a credential - like an RSA or
ECDSA key - shouldn't crack open a trove of collected data. But a pitfall
here has been session tickets and session caches.

If an attacker compromises a session ticket encryption key, they can
decrypt any sessions that used that key. If an attacker compromises a
session cache, they can decrypt any sessions contained in the cache. In
the real world, the former is much worse than the latter. A cache is
bounded in size and capacity and has a real cost associated with storing
entries; it's also more likely to be sharded, with "local" caches relying
on routing affinity. On the other hand, it is cheap and convenient to use
a single session ticket encryption key. Widely deployed software (e.g.
Apache, nginx ...) has poor support for key rotation: for example, no
schedule for rotating keys, no tracking of how many times each key is
used, and no support for multiple keys overlapping in time. It is not
surprising to see configurations where the same ticket encryption key has
been in use for years, outlasting RSA key lifetimes.
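The missing pieces (a rotation schedule, per-key use counts, overlapping keys) don't need much machinery. Below is a minimal sketch. It assumes a single process holds the ticket keys (a real fleet also has to distribute them) and it omits the actual ticket encryption. TicketKeyRing and its parameters are hypothetical names, not any server's configuration directives:

```python
import os
from dataclasses import dataclass, field


@dataclass
class TicketKey:
    key: bytes
    created: float
    uses: int = 0  # number of tickets encrypted under this key


@dataclass
class TicketKeyRing:
    rotate_after_s: float  # start issuing tickets under a fresh key this often
    accept_for_s: float    # keep old keys for decryption this long (the overlap)
    max_uses: int          # rotate early once a key has encrypted this many tickets
    keys: list = field(default_factory=list)  # newest first

    def _prune(self, now: float) -> None:
        # Forget keys past their acceptance window so a later compromise of
        # this process can't expose tickets issued under them.
        # (Python can't guarantee the bytes are wiped; real code should zeroize.)
        self.keys = [k for k in self.keys if now - k.created < self.accept_for_s]

    def encryption_key(self, now: float) -> TicketKey:
        self._prune(now)
        current = self.keys[0] if self.keys else None
        if (current is None
                or now - current.created >= self.rotate_after_s
                or current.uses >= self.max_uses):
            current = TicketKey(os.urandom(32), now)
            self.keys.insert(0, current)
        current.uses += 1
        return current

    def decryption_keys(self, now: float) -> list:
        # Every key still inside its window may decrypt incoming tickets.
        self._prune(now)
        return list(self.keys)
```

With rotate_after_s=3600 and accept_for_s=7200, new tickets are issued under a fresh key every hour, and each key is forgotten two hours after it was created.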

Put bluntly: session tickets as deployed in the real world defeat the point
of forward secrecy. A compromise of a single credential leads to a
catastrophic loss in security for previously collected sessions. Worst of
all, users have no way to audit how these keys are being managed. At least
it's possible to observe how long an RSA/ECDSA key is in use.

An alternative way to go is to restructure session resumption as single-use
session resumption IDs. Here's how it would work:

* TLS client asks server for 1 ... N session resumption IDs.  Both the
client and server iterate their PRF/KDFs N times ... deriving the same keys
(without exchanging them). The server then nominates N small (e.g. 8-byte)
IDs, one for each set of resumption state.

* Each ID is valid for use on a future connection. TLS clients are advised
that they MUST use each ID just once, discarding and erasing it upon use
during the resumption handshake.
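The derivation and single-use bookkeeping described above could look something like the sketch below. It assumes HMAC-SHA256 with a counter as a stand-in for the TLS PRF/KDF. The function names and the label are illustrative, not from any specification, and the 8-byte ID size comes from the example above. The resumption master secret should be erased once the N secrets are derived:

```python
import hashlib
import hmac
import os


def derive_resumption_secrets(resumption_master: bytes, n: int) -> list:
    """Both client and server run this locally; no secret crosses the wire.
    HMAC-SHA256 with a counter stands in for the TLS PRF/KDF (an assumption)."""
    return [
        hmac.new(resumption_master,
                 b"single-use resumption" + i.to_bytes(4, "big"),
                 hashlib.sha256).digest()
        for i in range(n)
    ]


def server_issue_ids(resumption_master: bytes, n: int) -> dict:
    """Server nominates one random 8-byte ID per derived secret, in order."""
    return {os.urandom(8): s for s in derive_resumption_secrets(resumption_master, n)}


class ClientResumptionIds:
    """Hypothetical client-side store pairing server IDs with local secrets."""

    def __init__(self, resumption_master: bytes, ids: list):
        secrets = derive_resumption_secrets(resumption_master, len(ids))
        self._entries = list(zip(ids, secrets))

    def take(self):
        """Remove and return one (id, secret) pair; it MUST NOT be used again.
        Returns None when exhausted, meaning: fall back to a full handshake."""
        return self._entries.pop(0) if self._entries else None
```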

Now this multiplies the already-significant cost of session caches by a
factor of N. Strangely, that's the point: it structures things so that the
cache implementor is incentivized to evict and replace entries. With
eviction in place, the regular security model of TLS is restored. A
compromise of the cache only puts future connections at risk, and future
connections are generally at risk from a server compromise anyway.
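On the server, two properties matter. Lookup and removal must be a single operation, and capacity pressure must force eviction. Below is a minimal sketch of such a cache. It assumes a single process with no sharding, and SingleUseSessionCache is a hypothetical name:

```python
from collections import OrderedDict


class SingleUseSessionCache:
    """Bounded server-side store of single-use resumption IDs (sketch)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()  # resumption ID -> resumption secret

    def add(self, rid: bytes, secret: bytes) -> None:
        self._entries[rid] = secret
        self._entries.move_to_end(rid)
        while len(self._entries) > self.capacity:
            # Evict the oldest entry; once it's gone, a later compromise of
            # the cache can't recover that secret.
            self._entries.popitem(last=False)

    def take(self, rid: bytes):
        """Look up and delete in one step, so an ID resumes at most one connection."""
        return self._entries.pop(rid, None)
```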

I've built and operated a large CDN, and we use tickets (though we rotate
the keys!); implementing a session cache would increase our costs, though
I'd guess it's manageable. But that does suck. My argument here is squarely
user-security-centric: TLS tickets are a dangerous sharp edge that
implementors keep screwing up. I think they should be blunted, even at the
expense of increasing costs for people like me.

One might reasonably say that this cost is too much, that forward secrecy
isn't worth losing tickets (which are cheap) over. But then, if we are
willing to sacrifice FS, why not go back to just encrypting using the
server key? That's a lot simpler. And that's how 0RTT is defined, which
brings me to ...

*0RTT and Safety*
0-RTT has the potential to lower the latency for a lot of web users, which
is an awesome goal. Though I'd like to point out that the benefits don't
look that great to me compared to keeping connections alive for very long
periods of time. Despite the name, 0RTT still requires 1RTT (the TCP SYN ->
SYN|ACK exchange) before any data can be sent. A long-lived connection
doesn't have this problem, and in response to web sockets, IoT, and other
shifts, the technology to keep millions of connections open for long
periods of time on the server side (and even move live connections between
machines) is improving, along with battery-conserving improvements for
long-lived connections on mobile. A connection that you keep open is
"really" 0RTT; the socket is primed for immediate I/O.
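To put rough numbers on it, here is a back-of-the-envelope sketch. It assumes a TLS 1.3-style handshake over a fresh TCP connection, with no TCP Fast Open, no loss, and zero processing time:

```python
def time_to_first_response_ms(rtt_ms: float, setup_round_trips: int) -> float:
    # Round trips spent before the request can be sent, plus one round trip
    # for the request and its response. Ignores processing time, loss, and
    # TCP Fast Open (assumptions).
    return (setup_round_trips + 1) * rtt_ms


SETUP_ROUND_TRIPS = {
    "new TCP connection + full TLS 1.3 handshake": 2,
    "new TCP connection + TLS 1.3 0-RTT": 1,  # the SYN -> SYN|ACK exchange remains
    "already-open connection": 0,
}

for name, trips in SETUP_ROUND_TRIPS.items():
    print(f"{name}: {time_to_first_response_ms(100, trips):.0f} ms")
```

With a 100 ms RTT this prints 300 ms, 200 ms and 100 ms. 0RTT saves one round trip over a full handshake, while an already-open connection saves two.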

I see at least three different challenges with 0RTT as defined. The first
is a general, high-level one: we seem willing to accept a "lower" level of
security for 0RTT data (e.g. no FS, even if the rest of the session has
it). Why? What is it we think is special about this data that makes it
"less" worth protecting? Surely there are very sensitive things in URLs;
surely there are potential oracles and other things in there too? It just
seems super strange to me.

The second challenge is that the replayability of 0RTT data poses a
cryptographic safety challenge. Take Lucky13, a brilliant attack that is
stunningly effective against DTLS because the attacker can replay over and
over, barely needing to change any parameters, and let the server do the
work. 0RTT looks very similar. It doesn't seem wise to let ciphertext
manipulators take as many cracks at the whip as they'd like.

The third challenge is that the 0RTT plaintext data itself may not be safe
to replay; that is, it might trigger some kind of non-idempotent action.
Idempotence is really, really hard; it isn't safe to simply plug a
replayable section into existing protocols. There's also a huge difference
between being tolerant to a small number of replays and to a large,
unbounded number. For example, a large unbounded number of replays may be
used to mount DoS attacks against throttles and quotas.
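To make the quota point concrete, here is a toy simulation. It assumes a per-user quota that counts every request the server accepts, and an attacker who has captured a single 0RTT flight. RequestQuota and replay_early_data are hypothetical names:

```python
class RequestQuota:
    """Hypothetical per-user quota: every request the server accepts counts."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = {}

    def allow(self, user: str) -> bool:
        count = self.used.get(user, 0)
        if count >= self.limit:
            return False
        self.used[user] = count + 1
        return True


def replay_early_data(quota: RequestQuota, victim: str, replays: int) -> None:
    # Each replay of the victim's captured 0RTT flight looks, to the server,
    # like an authentic request from the victim, so it spends the victim's quota.
    for _ in range(replays):
        quota.allow(victim)
```

Three replays leave the victim with quota to spare. A thousand exhaust a 100-request quota and lock the victim out.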

*Tying things together*
Short of some kind of transactional locking protocol during TLS handshakes,
I don't think there is a scheme that can perfectly prevent replay. Bill
Cox's analysis is a really good one here. But I'd like to observe that the
sort of single-use-session-id cache outlined above has a nice property: it
makes for a sort of strike register. Since the server-side implementor is
incentivized to evict entries, or at least mark them as used, so that the
slot is available for re-use, that eviction can double as a "we've seen
this already" signal. This reduces the replay window to the time it takes
for that signal to propagate (e.g. for an eviction to happen from the
cache).
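The replay-window argument can be modelled with a toy sharded strike register. The sketch below assumes each shard records a used ID immediately, but other shards only learn of it after a fixed propagation delay. The class and its parameters are hypothetical:

```python
class ShardedStrikeRegister:
    """Each shard records a used ID at once; other shards only learn of it
    after propagation_delay_s (a hypothetical model of cache sync/eviction)."""

    def __init__(self, shards: int, propagation_delay_s: float):
        self.delay = propagation_delay_s
        self.seen = [set() for _ in range(shards)]
        self.in_flight = []  # (time the use becomes visible everywhere, rid)

    def _propagate(self, now: float) -> None:
        remaining = []
        for visible_at, rid in self.in_flight:
            if visible_at <= now:
                for shard_seen in self.seen:
                    shard_seen.add(rid)
            else:
                remaining.append((visible_at, rid))
        self.in_flight = remaining

    def try_use(self, shard: int, rid: bytes, now: float) -> bool:
        """True if `shard` accepts rid, i.e. it hasn't yet seen rid used."""
        self._propagate(now)
        if rid in self.seen[shard]:
            return False
        self.seen[shard].add(rid)
        self.in_flight.append((now + self.delay, rid))
        return True
```

A replay sent to the same shard is rejected immediately. A replay sent to another shard succeeds only within propagation_delay_s of the original use.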

So 0RTT data could be encrypted under the resumption session ID. That
creates the challenge that the session might not be there any more, so the
server may not be able to decrypt the 0RTT data. I actually think this is a
plus, and it lines up with a separate important change I think is
necessary: the 0RTT data shouldn't be application data. It should be a
separate, optional stream. I find it helpful to think of it as a hint, so
it could be called "replayable_hint". Instead of breaking apart an existing
protocol and transparently putting some of it in the early data and some
in the application data (a disaster waiting to happen), the client and
server would have to formally agree on the kind of data that could be in a
"replayable_hint". This goes a long way toward mitigating many
protocol-level idempotency concerns, and has no impact on the kind of
pre-fetching people want to do for HTTP and other protocols. At a bare
minimum, I think we should make this change.
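Server-side handling of such a hint might look like the sketch below. It assumes the hint's kind travels alongside the ciphertext, and that decrypt stands in for an AEAD open keyed by the single-use resumption secret. ReplayableHint, accept_hint, and kinds like "http_prefetch" are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReplayableHint:
    kind: str          # a hint type both endpoints agreed on ahead of time
    ciphertext: bytes  # encrypted under the single-use resumption secret


def accept_hint(hint: ReplayableHint,
                agreed_kinds: set,
                resumption_secret: Optional[bytes],
                decrypt: Callable[[bytes, bytes], bytes]) -> Optional[bytes]:
    """Return the hint's plaintext, or None to ignore it.

    resumption_secret is None when the server no longer holds the resumption
    ID (evicted or already used). Ignoring the hint never fails the handshake;
    the connection simply proceeds without the optimization."""
    if hint.kind not in agreed_kinds:
        return None
    if resumption_secret is None:
        return None
    return decrypt(resumption_secret, hint.ciphertext)
```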

Lastly, and this is a little crazy but I haven't let that stop me before
... to guard against the smaller replay window and idempotency problems at
the application level, clients should occasionally send duplicate and
unrelated hints, just opportunistically. This keeps the server-side
application "on notice" that that kind of craziness can occur; better to
have it happen a little, all of the time, in a controlled way than rarely
at the hands of attackers.
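On the client side, the idea might be sketched as follows. The sketch assumes a repeated hint is re-encrypted under the new connection's resumption ID like any other hint. HintSender and its default probabilities are hypothetical:

```python
import random
from collections import deque


class HintSender:
    """Now and then, repeat an earlier hint or add an unrelated decoy so
    servers routinely see replay-like traffic (hypothetical sketch)."""

    def __init__(self, rng: random.Random, p_duplicate: float = 0.01,
                 p_decoy: float = 0.01, decoys=(), history: int = 16):
        self.rng = rng
        self.p_duplicate = p_duplicate
        self.p_decoy = p_decoy
        self.decoys = list(decoys)
        self.sent = deque(maxlen=history)  # recently sent real hints

    def hints_for_connection(self, real_hint: bytes) -> list:
        hints = [real_hint]
        if self.sent and self.rng.random() < self.p_duplicate:
            hints.append(self.rng.choice(self.sent))  # re-send an earlier hint
        if self.decoys and self.rng.random() < self.p_decoy:
            hints.append(self.rng.choice(self.decoys))  # unrelated hint
        self.sent.append(real_hint)
        return hints
```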

*Summary*
A common theme in the above is that it makes things more expensive for
server-side implementors, and that sucks, but I don't see another way to
avoid some of the pitfalls here, and I'm unhappy with the state of tickets
today. If I'm on my own on that, I'd be interested in what kinds of data
people might find convincing. My own impressions come from being an Apache
httpd developer and assisting people with configurations and running
workshops at conferences. It's not scientific, but the prevalence of
non-rotation is so severe in my sample set that I'm convinced it's the
norm.

-- 
Colm

_______________________________________________
TLS mailing list
TLS@ietf.org
https://www.ietf.org/mailman/listinfo/tls
