Dan Bernstein wrote:

>We're talking about ML-KEM for TLS 1.3. Recall from RFC 8446 that TLS 1.3 mandates ECC. So, no, TLS implementors _can't_ remove ECC and just have "standalone ML-KEM".

This is incorrect. RFC 8446 only mandates ECC in the absence of an application profile standard specifying otherwise. Such application profiles are common, and TLS implementations often remove algorithms, especially in private network deployments. Constrained devices, for example, often support only one of secp256r1 or X25519. Likewise, CNSA 1.0 endpoints support only secp384r1, while CNSA 2.0 endpoints support only ML-KEM-1024. For TLS 1.2 (RFC 5246), the 3GPP TLS profile in TS 33.210 forbids support of SHA-1, non-PFS key exchange, and non-AEAD cipher suites. As a result, supporting the "mandatory-to-implement" TLS_RSA_WITH_AES_128_CBC_SHA is prohibited for all three reasons. Contrary to what EKR wrote, I would argue that removing ECC through an application profile is fully compliant with RFC 8446.

>The code-size difference simply doesn't exist: ECC is there anyway.

This is incorrect. For constrained devices, code-size differences do exist, and based on past experience the differences in both code size and cycle count between X25519MLKEM768 and ML-KEM-512 are likely to be significant in some systems. If (D)TLS is intended to remain viable on constrained devices, I believe the TLS WG should standardize standalone ML-KEM-512.

>The cycle-count difference is swamped by the PQ communication costs.

As Benjamin Kaduk has already noted, a quad-core 3 GHz Skylake with a 300 Mbps Internet connection is the opposite of constrained. While it is true that energy consumption would be dominated by communication, this does not make code size, CPU cycles, wattage, or memory usage irrelevant. Believing that CAPEX, OPEX, and real-time constraints for all constrained IoT systems can be reduced to a single number is somewhat naive. In truly constrained systems, every resource is a trade-off: additional code size, latency, or memory requirements are likely to prevent an algorithm from being used at all or cause it to be used less frequently.

>Furthermore, job #1 for the post-quantum rollout is to _try_ to deal with the current security disaster of user data being exposed to future quantum attacks.

I disagree; I would even go as far as saying that focusing solely on encryption is dangerous. I agree with CNSA 2.0 and the EU Roadmap that protecting long-lived devices is equally important. A device is typically a far higher-value target than a single decades-old connection: compromising it gives an attacker access to more data and fresher data, enables active attacks, and eliminates the need to harvest and store traffic for decades. Migrating long-lived credentials in deployed devices affects a large part of the IETF, including LAMPS, SSHM, TLS, IPSEC, LAKE, and others.

https://media.defense.gov/2025/May/30/2003728741/-1/-1/0/CSA_CNSA_2.0_ALGORITHMS.PDF
https://ec.europa.eu/newsroom/dae/redirection/document/117507

Cheers,
John Preuß Mattsson

On 2025-11-26, 19:59, "D. J. Bernstein" <[email protected]> wrote:

John Mattsson writes:
> In these environments, standalone ML-KEM certainly reduces code size

We're talking about ML-KEM for TLS 1.3. Recall from RFC 8446 that TLS 1.3 mandates ECC. So, no, TLS implementors _can't_ remove ECC and just have "standalone ML-KEM". Obviously the draft at hand doesn't change this; i.e., you're crediting the draft with a savings that the draft does not in fact achieve.

Furthermore, job #1 for the post-quantum rollout is to _try_ to deal with the current security disaster of user data being exposed to future quantum attacks. Given that X25519MLKEM768 has by far the biggest head start on deployment, we want all TLS implementations supporting X25519MLKEM768, to maximize the chance of successfully establishing post-quantum connections---but of course that's contrary to any proposal to allow ECC to be removed from TLS.

> You argued that "ECC+PQ has roughly the same performance properties as non-hybrid PQ," and I pointed out that this is incorrect when you consider cycle counts and code size.

The code-size difference simply doesn't exist: ECC is there anyway. The cycle-count difference is swamped by the PQ communication costs.

You aren't arguing against the statement I made. You're arguing against a strawman modification that replaces an analysis of system cost with microbenchmarks that range from deceptive to irrelevant. This is Benchmarking Crime B1; see https://arxiv.org/abs/1801.02381.

> ML-KEM-768 is more than twice as fast as X25519.

No, it's far more expensive than X25519. It's bottlenecked by the communication costs for the 1184-byte key and the 1088-byte ciphertext.

To try to tilt this comparison towards ML-KEM-768, let's take a CPU that's far out of date compared to the network connection. Concretely, imagine a user at home with a poor little quad-core 3GHz Skylake from 2015 attached to a much newer 300Mbps Internet connection.

Even if ML-KEM-768 were taking _zero_ CPU time, it would still be sending and receiving 8*(1184+1088) = 18176 bits overall, which can be done only 16505 times per second before swamping the 300Mbps. (This is imagining the 300Mbps being split about evenly between the upload and download; in the more common situation of, say, 300Mbps download and 20Mbps upload, the upload is saturated at ~2000 operations per second. Of course there are also framing costs etc.)

Meanwhile X25519 is sending and receiving 8*(32+32) = 512 bits, i.e., 35x less than ML-KEM-768. This time the bottleneck is instead the CPU: the user is paying 27780 cycles for keygen plus 83503 cycles for DH (see https://bench.cr.yp.to/results-dh/amd64-samba.html), which can be done 26958 times per second per core, or 107833 times per second overall.

Does 1/107833 of a decade-old home CPU sound like a bigger cost than 1/16505 of a much newer network connection? (Also, isn't this cost _obviously_ so close to 0 that we can just focus on security?)

The way https://cr.yp.to/papers.html#pppqefs puts communication and computation on the same scale is by using dollar costs, for example looking at the purchase price of a specific new 32-core machine (assumed to die after 5 years) and of the electricity to run that machine. The same numbers are used in https://blog.cr.yp.to/20240102-hybrid.html to conclude that X25519 costs roughly 7% as much as ML-KEM-512. The main reason the numbers are rough is variations in network costs: the paper cites purchase costs ranging from 4x cheaper to 64x more expensive.

---D. J. Bernstein

===== NOTICES =====

This document may not be modified, and derivative works of it may not be created, and it may not be published except as an Internet-Draft. (That sentence is the official language from IETF's "Legend Instructions" for the situation that "the Contributor does not wish to allow modifications nor to allow publication as an RFC". I'm fine with redistribution of copies of this document; the issue is with modification. Legend language also appears in, e.g., RFC 5831. For further background on the relevant IETF rules, see https://cr.yp.to/2025/20251024-rules.pdf.)
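
For readers who want to check the throughput figures in the quoted message, the arithmetic can be reproduced with a short sketch. All input numbers are copied from that message (300 Mbps link, quad-core 3 GHz CPU, ML-KEM-768 sizes, X25519 cycle counts); the variable names are illustrative only. Like the message, the sketch assigns ML-KEM-768 zero CPU time and ignores framing overhead, so it illustrates the stated model rather than measuring any real system.

```python
# Inputs copied from the quoted message; names are illustrative, not from any library.
LINK_BPS = 300_000_000           # 300 Mbps, treated as one shared budget
CPU_HZ = 3_000_000_000           # 3 GHz per core
CORES = 4

MLKEM768_KEY_BYTES = 1184        # encapsulation key
MLKEM768_CT_BYTES = 1088         # ciphertext
X25519_SHARE_BYTES = 32          # one key share in each direction
X25519_KEYGEN_CYCLES = 27_780
X25519_DH_CYCLES = 83_503

# Bits on the wire per key exchange.
mlkem_bits = 8 * (MLKEM768_KEY_BYTES + MLKEM768_CT_BYTES)
x25519_bits = 8 * (X25519_SHARE_BYTES + X25519_SHARE_BYTES)

# ML-KEM-768: network-bound rate, assuming zero CPU time as in the message.
mlkem_ops_network = LINK_BPS // mlkem_bits

# X25519: CPU-bound rate, per core and across all cores.
x25519_cycles = X25519_KEYGEN_CYCLES + X25519_DH_CYCLES
x25519_ops_per_core = CPU_HZ // x25519_cycles
x25519_ops_cpu = (CORES * CPU_HZ) // x25519_cycles

print("ML-KEM-768 bits per exchange:", mlkem_bits)
print("ML-KEM-768 exchanges/s (network-bound):", mlkem_ops_network)
print("X25519 bits per exchange:", x25519_bits)
print("Traffic ratio ML-KEM-768 / X25519:", mlkem_bits / x25519_bits)
print("X25519 exchanges/s per core:", x25519_ops_per_core)
print("X25519 exchanges/s, all cores:", x25519_ops_cpu)
```

The sketch only restates the desktop-class scenario from the message; it says nothing about the constrained-device case raised in the reply, where code size and memory are separate constraints.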

_______________________________________________
TLS mailing list -- [email protected]
To unsubscribe send an email to [email protected]
