On Tue, Nov 24, 2015 at 1:25 PM, Michael Hamburg <m...@shiftleft.org> wrote:

> I agree for new protocols, but the proposal for TLS isn’t all-DH.  It’s
> allowing both all-DH and DHE+sign.  That’s more complex than just allowing
> DHE+sign.  But I suppose the difference in TLS as proposed is really just
> putting a DH+MAC in CertificateVerify instead of a signature, which isn’t a
> complicated difference.
>
> Sorry to be negative.  I really do like all-DH for simplicity, compactness
> and speed, especially if IP-encumbered algorithms are available.  I’m not
> against its inclusion in TLS if others think it’s worth the complexity of
> adding another option.  But I’m grumpy because this thread started with an
> insecure proposal justified using incorrect numbers.
>

I need to remember not to skip details when writing to this group :)  Yes,
encrypting a handshake digest is the right thing to do, not the client
random.  I normally leave out this detail in discussions to simplify making
my point.  I'll see if I can correct some numbers below.  Also, that was an
awesome attack in the paper linked to above.  Clearly, this is tricky
territory to make changes securely!

I think latency is more important than CPU overhead, or than whether the
latency happens at the client or the server.  Since servers have well-managed
computation capability, it may be best to favor server-side latency, where
it can be managed and minimized, vs client latency, which can vary much more.

Just looking at the TLS 1.3 1-RTT handshake (0-RTT does not use
CertificateVerify), the latency looks like this:

- First flight from client to server: 0 overhead, since ephemeral keyshare
can be precomputed
- First flight from server to client: 1 shared key derivation, in parallel
with signing
- Second flight from client to server: 1 shared key derivation in parallel
with verification

At least for Ed25519 and ECDSA-P256, signing is faster than shared key
derivation, which is faster than signature verification.  So, the latency
is 1 shared key derivation + 1 signature verification.  The median values
for Skylake using curve25519/Ed25519 are, in cycles:

keyshare computation: 150690
shared secret derivation: 143338
signing: 48990
verify signature: 161488

If we take full advantage of parallelism, the latency is 304,826 cycles.
If instead we used ECC DH everywhere, it would be 2 shared key derivations =
286676 cycles.  There would only be about a 6% improvement, which is hard
to get excited about.  Also, on some other curves it could be slower,
rather than faster.
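
As a rough check on the parallel case, here is a minimal sketch that adds up
the critical path from the cycle counts quoted above.  The function names are
made up for illustration, and the all-DH model (each flight costing one shared
key derivation, with the proof-of-possession derivation running in parallel)
is my assumption about how the arithmetic works out, not a measurement.

```python
# Median Skylake cycle counts quoted above (curve25519/Ed25519).
CYCLES = {
    "keyshare": 150690,  # ephemeral keyshare computation
    "derive": 143338,    # shared secret derivation
    "sign": 48990,
    "verify": 161488,
}


def parallel_latency_signed(c):
    """Critical path for 1-RTT with signatures, assuming full parallelism."""
    # The client's keyshare is assumed precomputed, so the first flight adds 0.
    server_flight = max(c["derive"], c["sign"])    # derivation || signing
    client_flight = max(c["derive"], c["verify"])  # derivation || verification
    return server_flight + client_flight


def parallel_latency_all_dh(c):
    """Critical path if the signature is replaced by a DH-based proof.

    Assumption: the extra derivation on each side runs in parallel with the
    handshake derivation, so each flight costs one derivation.
    """
    return 2 * c["derive"]


signed = parallel_latency_signed(CYCLES)
all_dh = parallel_latency_all_dh(CYCLES)
improvement = (signed - all_dh) / signed

print(f"sign/verify: {signed} cycles")
print(f"all-DH:      {all_dh} cycles")
print(f"improvement: {improvement:.1%}")
```

Swapping in the numbers for another curve only requires changing the CYCLES
table, which makes it easy to see where all-DH would come out slower.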

AFAIK, OpenSSL and most other TLS libraries do not use multiple threads,
and will not take advantage of the available parallelism to reduce
latency.  At first glance, it looks like there is significant opportunity
here, but I know the coders involved in this code base are among the best
in the world.  There is zero chance they decided to avoid pthreads without
some pretty good reasons (cross-platform compatibility?).  I bet the
ephemeral keyshares are also not precomputed.

Without multiple threads, the 1-RTT handshake takes 2 keyshare computations
+ 2 shared key derivations + 1 sign + 1 verify = 798534 cycles.  With ECC
DH based proof-of-possession, it would be 2 keyshare computations + 4
shared key derivations = 874732 cycles, worse than with the current signing
scheme.
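
The single-threaded totals can be sketched the same way.  This assumes nothing
is precomputed and every operation runs back to back across both endpoints;
the function names are hypothetical.

```python
# Median Skylake cycle counts quoted above (curve25519/Ed25519).
CYCLES = {
    "keyshare": 150690,  # ephemeral keyshare computation
    "derive": 143338,    # shared secret derivation
    "sign": 48990,
    "verify": 161488,
}


def serial_cost_signed(c):
    """Total work for 1-RTT with signatures, no threads, no precomputation."""
    return 2 * c["keyshare"] + 2 * c["derive"] + c["sign"] + c["verify"]


def serial_cost_all_dh(c):
    """Total work with DH-based proof-of-possession instead of sign/verify.

    The sign and verify steps are each replaced by one extra derivation per
    side, giving 4 derivations in total.
    """
    return 2 * c["keyshare"] + 4 * c["derive"]


serial_signed = serial_cost_signed(CYCLES)
serial_all_dh = serial_cost_all_dh(CYCLES)

print(f"sign/verify: {serial_signed} cycles")
print(f"all-DH:      {serial_all_dh} cycles")
print(f"extra cost:  {serial_all_dh - serial_signed} cycles")
```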

From the paper, it sounds like using delegated keys currently has some
unanticipated security problems, at least in the near term while we
continue to accept incorrectly padded RSA based certs.  Would Hugo's
suggestions for extending certificates address weaknesses due to delegated
keys, and allow DH keyshares to be used for proof-of-possession, and
possibly MQV?  If so, it sounds like a valuable upgrade.

Thanks,
Bill