I had planned to stay out of this, but...

With the understanding that I largely agree with Viktor too.  If
the WG does not, or considers the requirements and restrictions
of IDNA2008 unacceptable or too inconsistent with current
practice, read on.

--On Thursday, February 2, 2023 13:36 +0000 Corey Bonnell
<corey.bonn...@digicert.com> wrote:

> Hi Peter,
> 
>> it is not fully clear to me how the P-label construct differs
>> from the A-label construct in RFC 5890
> 
> My understanding is that both U-labels and A-labels must
> contain IDNA(2008)-valid strings. 

That is correct.  Skipping your useful review of the details...

>...
> Given this, any label that is valid under UTS-46 but not
> IDNA2008 cannot be called a "U-label" or "A-label".

Also correct.

> The lowest
> common denominator between IDNA2003, 2008, and UTS-46 for ACE
> labels is that they all contain valid output of the Punycode
> algorithm,

This may be just miscommunication because I don't understand
quite what it means to describe those specifications in
arithmetic terms rather than set theory ones, but I don't think
there is any such thing as a "lowest common denominator" for
those specs.  The difficulty is that the three specs define not
only what they consider valid but also what they treat as invalid.
For a very large number of code points and possible labels, all
three agree about the valid ones.  They also mostly agree about
the invalid ones.  (In case it isn't clear, that comment is
consistent with at least one of Patrik's.) The Punycode
algorithm does not discriminate and can encode all sorts of
things including strings that are prohibited under all three of
those specs and, fwiw, by UAX#31.  Many, although not all, of
the strings that are prohibited by all four, but that can still be
pushed into or out of the Punycode algorithm, would be things
about which the lead authors of all four of those specs might well be able
to join hands and say "a string like that, for use in a domain
name label or identifier more generally, would be dangerous,
errant nonsense, or both".
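The point that Punycode does not discriminate is easy to check. The sketch below assumes Python 3 and its standard library: the `punycode` codec is plain RFC 3492 Punycode, while the built-in `idna` codec implements IDNA2003 (RFC 3490, with nameprep), not IDNA2008 or UTS#46, neither of which ships with the standard library. U+2028 LINE SEPARATOR stands in for "errant nonsense"; the helper names are hypothetical.

```python
def punycode_roundtrips(label: str) -> bool:
    """Raw Punycode (RFC 3492) encodes any string and decodes it back exactly."""
    encoded = label.encode("punycode")  # no validity checks of any kind
    return encoded.decode("punycode") == label


def idna2003_accepts(label: str) -> bool:
    """True if Python's built-in 'idna' codec (IDNA2003, not IDNA2008) accepts it."""
    try:
        label.encode("idna")
    except UnicodeError:  # e.g. a nameprep-prohibited code point
        return False
    return True


for label in ["bücher", "abc\u2028"]:
    print(repr(label),
          "punycode:", b"xn--" + label.encode("punycode"),
          "round-trips:", punycode_roundtrips(label),
          "IDNA2003 accepts:", idna2003_accepts(label))
```

Both labels survive a trip through raw Punycode; only the IDNA2003 codec refuses the one containing U+2028.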

>...

> CAs that comply with the CA/Browser Forum Baseline
> Requirements are permitted to encode only NR-LDH or P-labels
> in SAN dNSNames, so there is at least assurance that valid
> output of the Punycode algorithm is contained in labels that
> start with "xn--" in the SAN of WebPKI certificates issued
> today. This is certainly sub-optimal from the standpoint of
> adherence to the IDNA2008 specification, but unfortunately is
> the most interoperable given the wide variation in domain
> registration practices and client handling of IDNs today.
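One way to see how little the "NR-LDH or P-label" rule asks for is a rough label classifier. This is a sketch under stated assumptions, not the Baseline Requirements' own definition: it uses the RFC 5890 terms (LDH, R-LDH, NR-LDH), treats a P-label as an "xn--" label whose remainder is valid Punycode output, and approximates "valid output" as "decodes with Python's stdlib `punycode` codec and re-encodes to the same text, ignoring case". `classify_label` is a hypothetical name.

```python
import re

# One LDH label: ASCII letters, digits, hyphen; 1-63 characters; no leading
# or trailing hyphen (a leading digit is allowed, per RFC 1123).
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify_label(label: str) -> str:
    """Return 'NR-LDH', 'P-label', or 'rejected' (hypothetical classifier)."""
    if not _LDH.fullmatch(label):
        return "rejected"               # not an LDH label at all
    if label[2:4] != "--":
        return "NR-LDH"                 # no "--" in positions 3-4
    if label[:4].lower() != "xn--":
        return "rejected"               # R-LDH, but not an XN-label
    tail = label[4:]
    try:
        decoded = tail.encode("ascii").decode("punycode")
    except UnicodeError:
        return "rejected"               # not decodable as Punycode
    # "Valid output": re-encoding must reproduce the tail (case-insensitive).
    if decoded.encode("punycode").decode("ascii").lower() != tail.lower():
        return "rejected"
    # Nothing here asks whether `decoded` is valid under ANY IDNA spec.
    return "P-label"


for example in ["example", "xn--bcher-kva", "xn--a", "ab--cd", "-bad"]:
    print(f"{example!r:16} {classify_label(example)}")
```

`xn--bcher-kva` is classified as a P-label, but so is `xn--a`, whose remainder decodes to U+0080, a C1 control character that IDNA2003, IDNA2008, and UTS#46 all disallow in a label.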

But, without having completely sorted out the applicability and
use cases of the CA/Browser Forum Baseline Requirements or, for
that matter, this spec, let me make a few observations...

** Because of the intersections among the three specs (or four,
including UAX#31, which UTS#46 sort of incorporates by
reference) about what they prohibit as well as the union of what
they allow, one of the ways to write a spec involves applying a
liberal dose of "someone else's problem" ointment.  I have not
studied the CA/Browser Forum spec carefully and have not looked
at any of their documents that provide context to it, but my
impression is that the "NR-LDH or P-labels" requirement comes
fairly close to "we will accept anything that people give us as
long as Unicode code points outside the ASCII range do not
appear directly in the labels (e.g., as UTF-8 strings).  If
someone tries to register a certificate using a domain name that
doesn't conform to any sensible spec, it is Not Our Problem".
From a business standpoint, possibly a great plan.  From the
standpoint of promoting interoperability (not conformance to
particular specs, which is a different matter when those specs
are not consistent), that is not a good move.

** Two of the most serious problems (probably the two most
serious) with the differences between IDNA2008 and UTS#46 are:
(i) Under IDNA2008, the conversion between U-labels and A-labels is
symmetric and reversible.  For UTS#46 (and IDNA2003), because of
the mapping built into the protocols themselves, if a source
(native character) string is successfully pushed through the
various checks and conversion from Unicode form to a
Punycode-encoded form and then the process is reversed, there is
no guarantee that the original string can be recovered.  Mapping
from Unicode to ASCII is potentially many-one; from ASCII to
Unicode, one-many.  The former might be ok; the latter is a big
problem in several contexts, most important being presentation
to the user. (ii) IDNA2008 requires some lookup-time validity
checking to reduce the risk of various edge-case attacks,
including ones related to irreversible mappings.  UTS#46 is
almost always interpreted as rejecting the need for lookup-time
checking (i.e., it makes such attacks and other issues Someone
Else's Problem).
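The many-to-one mapping in (i) can be demonstrated with Python's built-in `idna` codec, assuming (as the Python documentation says) that it implements IDNA2003 (RFC 3490), whose nameprep step case-folds and maps before Punycode encoding. `Bücher` and `bücher` collapse to one ACE label, and `straße` becomes `strasse`, so decoding cannot recover the originals. The last line applies bare Punycode to the unmapped string, which is roughly what an IDNA2008 implementation would emit after "xn--" for the U-label `straße` (ß is PVALID there); no IDNA2008 library is used, so treat that line as illustrative only. `roundtrip_idna2003` is a hypothetical helper, and the annotations assume Python 3.9 or later.

```python
def roundtrip_idna2003(name: str) -> tuple[bytes, str]:
    """Encode with Python's IDNA2003 codec (nameprep mapping, then Punycode)
    and decode the result back to Unicode."""
    ace = name.encode("idna")
    return ace, ace.decode("idna")


for original in ["bücher", "Bücher", "straße"]:
    ace, recovered = roundtrip_idna2003(original)
    print(f"{original!r:10} -> {ace!r:18} -> {recovered!r:10} "
          f"recovered={recovered == original}")

# Bare Punycode applied to the unmapped string keeps "ß"; this is roughly
# what IDNA2008 would put after "xn--" for the U-label "straße".
print(b"xn--" + "straße".encode("punycode"))
```

Only `bücher` comes back unchanged; for `Bücher` and `straße` the ASCII-to-Unicode direction yields a different string than the one the user started with, which is the presentation problem described in (i).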

** There is, possibly, a version of (my impression of) the
CA/Browser model of shifting responsibility that might work for
this document, although I would not recommend it.  You
concentrate strictly on the forms that can validly, lexically,
appear in the DNS under the "preferred syntax" rules of RFC
1034.  That gets you NR-LDH labels and Punycode-encoded strings
like the CA/Browser specs.  You assume that you are only
interested in strings conforming to those rules that appear in
certificates and the domain part of RFC 3986-conforming URIs
(presumably without percent-encoding).  You tear out every bit
of text about U-labels and any other native character
("Unicode") form or encoding.  And you include, as a Security
Consideration, an explicit discussion of the "Not Our Problem"
principle, including at least a general discussion of where
responsibility is expected to lie and/or what happens if no one
actually takes responsibility.  That, of course, matches the DNS
resolution model: only those types of labels are looked up.  I
don't know if that would get through IETF Last Call -- saying
"security is not our problem" has not been popular in the IETF
in recent years -- but it would allow you to escape debates
about conflicting standards, conflicting claims about current
usage, the same "Unicode string" mapping to different
Punycode-encoded labels depending on which spec is being
followed, and all sorts of validity questions.  This would mean
that any problem report shown to a user that displayed the offending
domain name or label would need to show it in Punycode form.  That
would be wildly unpopular and might be considered in some
quarters an IDN-killer.  But that would be Someone Else's
Problem too.
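A minimal sketch of that lexical-only model, under stated assumptions: a name is accepted only if every label is an LDH label under the RFC 1034 preferred syntax (with the RFC 1123 allowance for a leading digit) and the textual name, without a trailing dot, is at most 253 characters. "xn--" labels are never decoded, so whatever they encode is, by design, Someone Else's Problem, and every message quotes the name exactly as received, i.e., in Punycode form. `check_name_lexically` is a hypothetical name, and the annotations assume Python 3.9 or later.

```python
import re

# One LDH label: ASCII letters, digits, hyphen; 1-63 chars; no edge hyphens.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")
_MAX_NAME_LEN = 253  # textual form without a trailing dot (assumption)


def check_name_lexically(name: str) -> list[str]:
    """Return the problems found; an empty list means the name is acceptable.

    'xn--' labels are deliberately not decoded or validated: only their
    LDH shape is checked. Messages show the name exactly as received.
    """
    bare = name[:-1] if name.endswith(".") else name
    if not bare:
        return [f"empty name: {name!r}"]
    problems = []
    if len(bare) > _MAX_NAME_LEN:
        problems.append(f"name too long ({len(bare)} chars): {name!r}")
    for label in bare.split("."):
        if not _LABEL.fullmatch(label):
            problems.append(f"label {label!r} is not LDH in {name!r}")
    return problems


for name in ["www.example.com", "xn--bcher-kva.example.", "bücher.example"]:
    print(name, check_name_lexically(name))
```

Note that `xn--a.example` passes this check even though its first label decodes to U+0080, a control character; under this model, catching that is left to someone else.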

Good luck.
   john

_______________________________________________
Uta mailing list
Uta@ietf.org
https://www.ietf.org/mailman/listinfo/uta
