Hi Patrik,

Thanks for taking the time to provide such a detailed message, and my apologies for the delayed reply. Comments inline.

On 2/2/23 6:59 AM, Patrik Fältström wrote:
On 2 Feb 2023, at 9:58, Peter Saint-Andre wrote:

On 2/1/23 6:17 AM, Corey Bonnell wrote:

I think it would be unfortunate if the usage of terms that are defined in
RFC 5890 is not aligned with their definitions.

If we are not opposed to introducing new terminology to the document, then I
suggest the following:

1.      Replace all instances of "A-label" with the term "P-label" from the
CABF Baseline Requirements [1]: "P-Label: A XN-Label that contains valid
output of the Punycode algorithm (as defined in RFC 3492, Section 6.3) from
the fifth and subsequent positions."
2.      For U-label:
        a. Punt and call it "Unicode representation" instead (this is what
the CABF Baseline Requirements does, although that may not be appropriate
for this document).
        b. Create a new term that is defined as "A non-LDH label that
contains valid output of the decoding algorithm for Punycode (as defined in
RFC 3492, Section 6.2)." and use this new term instead of "U-label".

I'd be happy to work on concrete text to this effect if there's agreement
this is a good path to resolve the issue.

I would very much like to hear what John Klensin and Patrik Fältström (cc'd) 
think about this proposal.

As noted in my other message 
<https://mailarchive.ietf.org/arch/msg/uta/92tKoHT3Kjll1o_mCYQYQT8xON4/> I'm 
not immediately comfortable with referencing a CA/Browser Forum document instead of 
RFC 5890.

Having looked at Corey's proposal more closely, I'm doubly unsure because (a) 
it is not fully clear to me how the P-label construct differs from the A-label 
construct in RFC 5890 and (b) coming up with new DNS-related terminology in a 
late-stage document about certificate validation just seems like a bad idea 
(e.g., I'm not sure how to get proper review) even if it were necessary (which 
I'm not sure it is).

Thanks for bringing me into this discussion, Peter.

I had a read of the document and have these direct comments:

    delegated domain:  A domain name or host name that is explicitly
       configured for communicating with the source domain, either by the
       human user controlling the client or by a trusted administrator.
       For example, an IMAP server at mail.example.net could be a
       delegated domain for a source domain of example.net associated
       with an email address of u...@example.net.

This might be confusing, as it uses the term "delegated" and indeed gives an example where 
"mail.example.net" might (or might not) be delegated from "example.net", while the 
IMAP server at a specific domain name might bear no resemblance at all to the MX 
record of the domain to which email is sent in order to end up on the named IMAP server.

So I think it would be better either to use the term "delegated" only when it 
really refers to DNS delegation, or to use a different term with an example where 
you can have:

- IMAP server: imap.example.se.
- MX target: mx.example.net.
- Email domain: example.com.

Although you might be right that "delegated domain" is less than ideal, it's the term we used in RFC 6125. As a result, a number of specifications that cite RFC 6125 also use the term, so it seems inadvisable to change terminology now.

The original idea was not DNS delegation at the nameserver level, but service delegation at the application level such as one finds in this document (e.g., in order to retrieve email for addresses at example.net, one configures one's email client to connect to the server at imap.example.net).

At the least, it seems reasonable for us to explain this in more detail so that the reader doesn't confuse this perhaps bespoke notion of service delegation with the perhaps more established notion of DNS delegation.

    derived domain:  A domain name or host name that a client has derived
       from the source domain in an automated fashion (e.g., by means of
       a [DNS-SRV] lookup).

Also MX?

I don't see why not. If DNS SRV records had existed from the beginning of time, it seems that email protocols would have used SRV rather than MX, right?

What is then the difference or similarity between an MX related derivation of 
one domain name from another and an SRV related derivation?

It seems to me that they are functionally equivalent. But I am not a DNS expert or email expert, so (leaving aside various nuances) I might be missing some essential difference.

Can a delegated domain also be derived?

Not really. The idea is that a delegated domain is explicitly configured client-side whereas a derived domain is obtained in an automated fashion via DNS. So they are two different constructs that play two different roles in protocols.

    source domain:  The FQDN that a client expects an application service
       to present in the certificate.  This is typically input by a human
       user, configured into a client, or provided by reference such as a
       URL.  The combination of a source domain and, optionally, an
       application service type enables a client to construct one or more
       reference identifiers.

I presume you also include domain names that are created one at a time using a search list construction in a DNS stub resolver?

If I understand you correctly, I would say that we have not had a theory about how domain names are created (e.g., using a suffix search list). And it's not clear to me that we need to have such a theory here.

I.e. what you talk about is really a FQDN?

That is the intent - no bare hostnames or, more generally, no domain names that do not include all labels.

I think this is a good thing, but I hope people understand what this implies.

I hate search lists and relative domain names.

    The DNS name conforms to one of the following forms:

    1.  A "traditional domain name", i.e., a FQDN that conforms to
        "preferred name syntax" as described in Section 3.5 of
        [DNS-CONCEPTS] and for which all of its labels are "LDH labels"
        as described in [IDNA-DEFS].  Informally, such labels are
        constrained to [US-ASCII] letters, digits, and the hyphen, with
        the hyphen prohibited in the first character position.
        Additional qualifications apply (refer to the above-referenced
        specifications for details), but they are not relevant here.

    2.  An "internationalized domain name", i.e., a DNS domain name that
        includes at least one label containing appropriately encoded
        Unicode code points outside the traditional US-ASCII range and
        conforming to the processing and validity checks specified for
        "IDNA2008" in [IDNA-DEFS] and the associated documents.  In
        particular, it contains at least one U-label or A-label, but
        otherwise may contain any mixture of NR-LDH labels, A-labels, or
        U-labels.

This is confusing

What specifically do you think is confusing? We tried to get it right, but clearly didn't succeed...

and it seems people misunderstand the big changes we went through in the IETF 
from IDNA2003 to IDNA2008.

In IDNA2008 we have:

- Got rid of mapping, i.e., mapping such as case folding is something that happens in 
the application layer and has nothing to do with "domain names".
- Have a 1:1 mapping between A-label and U-label.
- In theory, because of this, can have A-labels and U-labels for domain names that 
include Unicode code points not allowed by IDNA2008 (or code points not allowed 
by other policy rules, for example the ones a registry has).

I strongly recommend you have similar rules here. Separate potential mapping 
from comparison of domain names, which in turn must be separated from policy for 
what code points are allowed.

When you say "have similar rules here", are you suggesting that we define such rules outside the context of IDNA2008 (e.g., in a way that would be valid for both IDNA2008 and IDNA2003 + UTS-46)? I think it would be a challenge to get that right and I'm not confident that a document about certificate matching is the correct place to do so.
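
As a rough illustration of the 1:1 relationship between A-labels and U-labels mentioned above, the following Python sketch checks whether an "xn--" label survives a decode and re-encode round trip. The function name is hypothetical, and the sketch relies only on Python's raw `punycode` codec: it performs no mapping and none of the IDNA2008 code point validity checks, so passing it does not make a label a valid A-label.

```python
def round_trips(label: str) -> bool:
    """Return True if an 'xn--' label decodes to non-ASCII text that re-encodes to itself.

    Assumption: only the Punycode round trip is checked; IDNA2008 validity
    (DISALLOWED code points, contextual rules, and so on) is not examined.
    """
    lower = label.lower()
    if not lower.startswith("xn--"):
        return False
    try:
        # Decode the part after the ACE prefix with the raw Punycode codec.
        unicode_form = lower[4:].encode("ascii").decode("punycode")
        re_encoded = "xn--" + unicode_form.encode("punycode").decode("ascii")
    except UnicodeError:
        return False
    # A label that decodes to pure ASCII has no Unicode counterpart to map to.
    if unicode_form.isascii():
        return False
    return re_encoded == lower
```

For example, `round_trips("xn--bcher-kva")` is true, while a plain LDH label such as `"example"` is rejected because it lacks the prefix.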

Ok, onwards...

    If the DNS domain name portion of a reference identifier is a
    traditional domain name, then matching of the reference identifier
    against the presented identifier MUST be performed by comparing the
    set of domain name labels using a case-insensitive ASCII comparison,
    as clarified by [DNS-CASE].  For example, WWW.Example.Com would be
    lower-cased to www.example.com for comparison purposes.  Each label
    MUST match in order for the names to be considered to match, except
    as supplemented by the rule about checking of wildcard labels given
    below.

    If the DNS domain name portion of a reference identifier is an
    internationalized domain name, then the client MUST convert any
    U-labels [IDNA-DEFS] in the domain name to A-labels before checking
    the domain name or comparing it with others.  In accordance with
    [IDNA-PROTO], A-labels MUST be compared as case-insensitive ASCII.
    Each label MUST match in order for the domain names to be considered
    to match, except as supplemented by the rule about checking of
    wildcard labels given below.

All of the above can be replaced by just saying that "A domain name is to be 
compared using case-insensitive matching according to what DNS uses, and because of 
this, this includes domain names that have A-labels in them" and referencing IDNA2008.

It seems that we should at least say that U-labels need to be converted to A-labels first, no? Or do you think that is implied by referencing the DNS rules (which don't allow U-labels natively)?
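
To make the order of operations concrete, here is a minimal Python sketch of the comparison as described in the quoted text: convert any non-ASCII label to an ASCII form first, then compare label by label as case-insensitive ASCII. The function names are hypothetical. The conversion uses only Python's raw `punycode` codec plus the "xn--" prefix; it applies no mapping and none of the IDNA2008 validity checks, and wildcard handling is left out, so this is an illustration rather than a conforming implementation.

```python
def to_a_label(label: str) -> str:
    """Return a lower-cased ASCII form of a single label.

    Assumption: non-ASCII input is already a valid U-label; this sketch
    only applies Punycode (RFC 3492) and prepends the ACE prefix.
    """
    if label.isascii():
        return label.lower()
    encoded = label.encode("punycode").decode("ascii")
    return ("xn--" + encoded).lower()


def names_match(reference: str, presented: str) -> bool:
    """Compare two domain names label by label as case-insensitive ASCII."""
    ref_labels = [to_a_label(part) for part in reference.split(".")]
    pres_labels = [to_a_label(part) for part in presented.split(".")]
    # Every label must match, and the label counts must agree.
    return ref_labels == pres_labels
```

With this sketch, `names_match("WWW.Example.Com", "www.example.com")` is true, and a name containing the label "bücher" matches one containing "xn--bcher-kva".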

It *might* also include wording about:

- If a domain name includes Unicode characters, and case folding equivalent 
approximate matching is expected by the client, mapping from one Unicode 
character to another must take place before the A-label is created from the 
U-label. And reference Section 4.2 of RFC 5894.

Thanks for the reminder about that section.

Do not come up with your own words please!

Agreed.

- If a domain name includes code points that are DISALLOWED according to IDNA2008 or any 
other policy, for example that of a registry, it MUST be defined in this document whether it 
SHOULD be allowed to do a comparison of the domain names or not. If a label includes 0x00 
bytes, for example (which is normally never allowed in any protocol), should such a label 
be able to get a "match" when the domain name is compared?

It seems like a bad idea to match on DISALLOWED code points! But see below.

Please be specific in the general case!

    A wildcard in a presented identifier can only match exactly one label
    in a reference identifier.  Note that this is not the same as DNS
    wildcard matching, where the "*" label always matches at least one
    whole label and sometimes more.  See [DNS-CONCEPTS], Section 4.3.3
    and [DNS-WILDCARDS].

Wow, wildcards in DNS are hairy. I know some people know this, but be careful, as 
wildcards in DNS are very different from (so far) wildcards in certificates.

I believe we included that text only to note that the wildcard matching for certificates is more constrained than for DNS. Do you think that further clarifications are needed?
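
The "exactly one label" rule quoted above can be sketched in Python roughly as follows. The function name is hypothetical, and the sketch assumes that both names are already in ASCII form and that the wildcard, if present, is the complete left-most label. Any further restrictions on where wildcards may appear are not covered here.

```python
def wildcard_match(presented: str, reference: str) -> bool:
    """Match a presented identifier, possibly starting with '*', against a reference.

    Assumption: both names are ASCII (A-labels or LDH labels) and a wildcard
    can only be the entire left-most label of the presented identifier.
    """
    p_labels = presented.lower().split(".")
    r_labels = reference.lower().split(".")
    # A wildcard stands for exactly one label, so the label counts must agree.
    if len(p_labels) != len(r_labels):
        return False
    if p_labels[0] == "*" and len(p_labels) > 1:
        # Skip the wildcard label; every remaining label must match.
        return p_labels[1:] == r_labels[1:]
    return p_labels == r_labels
```

Under these assumptions "*.example.com" matches "www.example.com" but neither "a.b.example.com" nor "example.com", which is the contrast with DNS wildcard behavior.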

    An IP-ID matches based on an octet-for-octet comparison of the bytes
    of the reference identity with the bytes contained in the iPAddress
    subjectAltName.  Because the iPAddress field does not include the IP
    version, a helpful heuristic for implementors is to distinguish IPv4
    addresses from IPv6 addresses by their length.

Why "octet by octet"?

Do you suggest some other text? Specifically do you have in mind "bit by bit" perhaps?

The field contains either a 32-bit or a 128-bit value. If the values being compared have 
different lengths, the match is False. If the length is the same, the values are 
compared. If they are the same, the match is True, otherwise False.

We were trying to be more precise about what "the same" means, but as we know it can be a challenge to get that right.
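
For what it's worth, the comparison described above could be sketched in Python as follows. The function name and the choice of a textual reference identity are assumptions made for illustration; the `ipaddress` module is used only to obtain the 4-byte or 16-byte packed form.

```python
import ipaddress


def ip_id_matches(reference: str, san_ip_bytes: bytes) -> bool:
    """Compare a reference IP address with the raw bytes of an iPAddress SAN entry.

    Assumption: the reference identity is given in textual form and the
    certificate value is the raw 4-byte (IPv4) or 16-byte (IPv6) field.
    """
    reference_bytes = ipaddress.ip_address(reference).packed
    # Different lengths mean different IP versions, so no match is possible.
    if len(reference_bytes) != len(san_ip_bytes):
        return False
    return reference_bytes == san_ip_bytes
```

Note that in this sketch an IPv4 reference does not match an IPv4-mapped IPv6 value, since the lengths differ.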

    If the identifier is an SRV-ID, then the application service name
    MUST be matched in a case-insensitive manner, in accordance with
    [DNS-SRV].  Note that the _ character is prepended to the service
    identifier in DNS SRV records and in SRV-IDs (per [SRVNAME]), and
    thus does not need to be included in any comparison.

Please reference one place in this document where case sensitivity is 
explained. Do not repeat text.

Noted.
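
As a small illustration of the SRV-ID rule quoted above, the service name comparison might look like this in Python. The function name is hypothetical, and the sketch assumes the SRV-ID has the form "_service.name"; it checks only the service portion, not the domain name that follows.

```python
def srv_service_matches(srv_id: str, expected_service: str) -> bool:
    """Compare the service label of an SRV-ID with an expected service name.

    Assumption: the SRV-ID looks like '_service.name'; the leading
    underscore is dropped from both sides before comparing.
    """
    service_label = srv_id.split(".", 1)[0]
    if service_label.startswith("_"):
        service_label = service_label[1:]
    if expected_service.startswith("_"):
        expected_service = expected_service[1:]
    # Service names are compared case-insensitively.
    return service_label.lower() == expected_service.lower()
```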

7.3.  Internationalized Domain Names

    As specified under Section 6, matching of internationalized domain
    names is performed on A-labels, not U-labels.  As a result, potential
    confusion caused by the use of visually similar characters in domain
    names is likely mitigated in certificate matching as described in
    this document.

    As with URIs and URLs, there are in practice at least two primary
    approaches to internationalized domain names: "IDNA2008" (see
    [IDNA-DEFS] and the associated documents) and an alternative approach
    specified by the Unicode Consortium in [UTS-46].  (At this point the
    transition from the older "IDNA2003" technology is mostly complete.)

Not really...it is neither one nor the other.

The basis for all domain names is what is defined in DNS, and that is IDNA2008.

The differences from UTS-46 are specifically two things:

- UTS-46 also includes rules for mapping that IDNA2008 does not include. The mapping that 
might be performed according to UTS-46 is "out of scope" for IDNA2008.

- What code points are allowed in the ultimate domain name is slightly 
different.

But, we have people using domain names (i.e. in the wild) which are allowed in 
neither UTS-46 nor IDNA2008.

And, then there are people using the algorithm in IDNA2008 applied to versions 
of Unicode that the IETF has not approved yet.

So, once again, not "either or". It is "a little bit of everything".

I see what you mean. However, that makes it more difficult to specify recommended behavior.

As one example, it seems possible that these differences could lead to someone using domain names in the wild that include DISALLOWED code points (e.g., because the definition of which code points are DISALLOWED can vary across Unicode versions). Thus if we say that applications MUST NOT match on DISALLOWED code points, behavior could be inconsistent.

    Differences in specification, interpretation, and deployment of these
    technologies can be relevant to Internet services that are secured
    through certificates (e.g., some top-level domains might allow
    registration of names containing Unicode code points that typically
    are discouraged, either formally or otherwise).  Although there is
    little that can be done by certificate matching software itself to
    mitigate these differences (aside from matching exclusively on
    A-labels), the reader needs to be aware that the handling of
    internationalized domain names is inherently complex and can lead to
    significant security vulnerabilities if not properly implemented.

    Relevant security considerations for handling of internationalized
    domain names can be found in [IDNA-DEFS], Section 4.4, [UTS-36], and
    [UTS-39].

Does that text seem correct or appropriate?

Do you have opinions on Corey's suggestion to use P-labels instead of A-labels and to reference the CA/Browser Forum specifications?

https://mailarchive.ietf.org/arch/msg/uta/r5uJRGUzCC55XH4XSnwtMB2YWPA/

Again, many thanks for the thorough review.

Peter


