I had planned to stay out of this, but...

What follows comes with the understanding that I largely agree with
Viktor too. If the WG does not, or considers the requirements and
restrictions of IDNA2008 unacceptable or too inconsistent with
current practice, read on.

--On Thursday, February 2, 2023 13:36 +0000 Corey Bonnell
<corey.bonn...@digicert.com> wrote:

> Hi Peter,
>
>> it is not fully clear to me how the P-label construct differs
>> from the A-label construct in RFC 5890
>
> My understanding is that both U-labels and A-labels must
> contain IDNA(2008)-valid strings.

That is correct. Skipping your useful review of the details...

>...
> Given this, any label that is valid under UTS-46 but not
> IDNA2008 cannot be called a "U-label" or "A-label".

Also correct.

> The lowest
> common denominator between IDNA2003, 2008, and UTS-46 for ACE
> labels is that they all contain valid output of the Punycode
> algorithm,

This may be just miscommunication, because I don't quite understand
what it means to describe those specifications in arithmetic terms
rather than set-theoretic ones, but I don't think there is any such
thing as a "lowest common denominator" for those specs. The
difficulty is that the three specs define not only what they consider
valid but also what they consider invalid. For a very large number of
code points and possible labels, all three agree about the valid
ones. They also mostly agree about the invalid ones. (In case it
isn't clear, that comment is consistent with at least one of
Patrik's.)

The Punycode algorithm does not discriminate and can encode all sorts
of things, including strings that are prohibited under all three of
those specs and, fwiw, by UAX#31. Many, although not all, of the
strings that are prohibited by all four, but that can be pushed into
or out of the Punycode algorithm, are things about which the lead
authors of all four of those specs might well be able to join hands
and say "a string like that, for use in a domain name label or
identifier more generally, would be dangerous, errant nonsense, or
both".

>...
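
As a small illustration of the remark above that the Punycode algorithm does not discriminate, here is a minimal sketch using the `punycode` codec from Python's standard library, which performs only the RFC 3492 transformation and applies no IDNA validity rules. The helper names `to_ace` and `from_ace` are hypothetical, and the sample strings (a symbol, an uppercase letter, a zero-width space) were chosen as things that IDNA2008, at least, disallows; nothing here checks them against any of the specs.

```python
# Sketch only: to_ace / from_ace are hypothetical helper names.
# Python's "punycode" codec does the RFC 3492 encoding and nothing else;
# it performs no IDNA2003, IDNA2008, or UTS#46 validity checking.

ACE_PREFIX = "xn--"


def to_ace(label: str) -> str:
    """Punycode-encode a label and prepend the ACE prefix, unchecked."""
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace(ace_label: str) -> str:
    """Strip the ACE prefix and Punycode-decode the remainder, unchecked."""
    return ace_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


# A symbol (U+2603), an uppercase letter (U+00C4), and a string that
# contains a zero-width space (U+200B).
samples = ["\u2603", "\u00c4", "a\u200bb"]

for s in samples:
    ace = to_ace(s)
    # The encoding is mechanical and reversible for every input,
    # whether or not any IDNA specification would accept the label.
    assert from_ace(ace) == s
    print(ascii(s), "->", ace)
```

The point of the sketch is only that a label starting with "xn--" and containing well-formed Punycode says nothing about whether the decoded string is one any of the specs would permit.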
> CAs that comply with the CA/Browser Forum Baseline
> Requirements are permitted to encode only NR-LDH or P-labels
> in SAN dNSNames, so there is at least assurance that valid
> output of the Punycode algorithm is contained in labels that
> start with "xn--" in the SAN of WebPKI certificates issued
> today. This is certainly sub-optimal from the standpoint of
> adherence to the IDNA2008 specification, but unfortunately is
> the most interoperable given the wide variation in domain
> registration practices and client handling of IDNs today.

But, without having completely sorted out the applicability and use
cases of the CA/Browser Forum Baseline Requirements or, for that
matter, this spec, let me make a few observations...

** Because of the intersections among the three specs (or four,
including UAX#31, which UTS#46 sort of incorporates by reference)
about what they prohibit as well as the union of what they allow, one
of the ways to write a spec involves applying a liberal dose of
"someone else's problem" ointment. I have not studied the CA/Browser
Forum spec carefully and have not looked at any of their documents
that provide context to it, but my impression is that the "NR-LDH or
P-labels" requirement comes fairly close to "we will accept anything
that people give us as long as Unicode code points outside the ASCII
range do not appear directly in the labels (e.g., as UTF-8 strings).
If someone tries to register a certificate using a domain name that
doesn't conform to any sensible spec, it is Not Our Problem". From a
business standpoint, possibly a great plan. From the standpoint of
promoting interoperability (not conformance to particular specs,
which is a different matter when those specs are not consistent),
that is not a good move.

** Two of the most serious problems, probably the two most serious,
with the differences between IDNA2008 and UTS#46 are:

(i) Under IDNA2008, the conversion between U-labels and A-labels is
symmetric and reversible.
For UTS#46 (and IDNA2003), because of the mapping built into the
protocols themselves, if a source (native character) string is
successfully pushed through the various checks and conversion from
Unicode form to a Punycode-encoded form and then the process is
reversed, there is no guarantee that the original string can be
recovered. Mapping from Unicode to ASCII is potentially many-one;
from ASCII to Unicode, one-many. The former might be ok; the latter
is a big problem in several contexts, the most important being
presentation to the user.

(ii) IDNA2008 requires some lookup-time validity checking to reduce
the risk of various edge-case attacks, including ones related to
irreversible mappings. UTS#46 is almost always interpreted as
rejecting the need for lookup-time checking (i.e., it makes such
attacks and other issues Someone Else's Problem).

** There is, possibly, a version of (my impression of) the CA/Browser
model of shifting responsibility that might work for this document,
although I would not recommend it. You concentrate strictly on the
forms that can validly, lexically, appear in the DNS under the
"preferred syntax" rules of RFC 1034. That gets you NR-LDH labels and
Punycode-encoded strings, as in the CA/Browser specs. You assume that
you are only interested in strings conforming to those rules that
appear in certificates and the domain part of RFC 3986-conforming
URIs (presumably without percent-encoding). You tear out every bit of
text about U-labels and any other native character ("Unicode") form
or encoding. And you include, as a Security Consideration, an
explicit discussion of the "Not Our Problem" principle, including at
least a general discussion of where responsibility is expected to lie
and/or what happens if no one actually takes responsibility. That, of
course, matches the DNS resolution model: only those types of labels
are looked up.
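
A purely lexical check of the kind described in the last observation above might look roughly like the sketch below. It assumes the RFC 5890 terminology (an LDH label is letters, digits, and hyphens with no leading or trailing hyphen and at most 63 octets; a reserved LDH label has "--" in the third and fourth positions) and treats "Punycode-encoded" loosely as "the part after the prefix decodes without error" under Python's standard `punycode` codec. The function name `classify` and the category strings are made up for illustration, and the code deliberately performs no IDNA2003, IDNA2008, or UTS#46 validity checking.

```python
import re

# LDH label: ASCII letters, digits, hyphen; no leading or trailing
# hyphen; 1 to 63 characters.
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify(label: str) -> str:
    """Lexically classify one DNS label (hypothetical category names)."""
    if not _LDH.fullmatch(label):
        # Includes anything with non-ASCII characters, e.g. a U-label.
        return "not-ldh"
    if label[2:4] != "--":
        # No "--" in positions 3 and 4: a non-reserved LDH label.
        return "nr-ldh"
    if label[:2].lower() != "xn":
        # Reserved, but not the IDNA prefix.
        return "other-reserved"
    try:
        # Loose check: does the remainder decode as Punycode at all?
        # What it decodes to is not examined.
        label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        return "xn-invalid-punycode"
    return "xn-punycode"


for name in ["example", "xn--n3h", "xn--9", "ab--cd", "-abc"]:
    print(name, "->", classify(name))
```

Note what the sketch does not do: it never looks at the decoded string, so every question about which code points are acceptable is left to whoever handles the label next.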

I don't know if that would get through IETF Last Call -- saying
"security is not our problem" has not been popular in the IETF in
recent years -- but it would allow you to escape debates about
conflicting standards, conflicting claims about current usage, the
same "Unicode string" mapping to different Punycode-encoded labels
depending on which spec is being followed, and all sorts of validity
questions. It would also mean that any problem report shown to a
user that included the offending domain name or label would need to
show it in Punycode form. That would be wildly unpopular and might be
considered in some quarters an IDN-killer. But that would be Someone
Else's Problem too.

Good luck.
   john

_______________________________________________
Uta mailing list
Uta@ietf.org
https://www.ietf.org/mailman/listinfo/uta