On 5/12/23 23:09, Viktor Dukhovni wrote:
Repost of my belated comments in the thread, apologies about not doing
it right the first time...

Inspired by Viktor's comments, I spent some time to give the document a 
thorough review.

I'd like to support Viktor's comments on the dependent RRset TTL cap described 
in Section 9.

I feel that the recommendation there is potentially harmful while its benefit 
is unclear. As for the harm, it makes DS updates less flexible because it 
effectively pushes their TTL towards higher values (so that caches remain 
effective). While always-low DS TTLs are problematic, too, it doesn't seem like 
a sound concept for an auth's load to be essentially inversely proportional to 
the DS TTL when it is set to a low value temporarily.

As for the benefit, the objective appears to be "preponing" the removal of cached RRsets 
from their scheduled expiry to "as soon as they potentially would no longer validate", 
as indicated by upstream TTLs related to the trust chain. However, there's no need to do this based 
on TTLs alone: if one wants to pursue this (optional) objective, it is sufficient to revalidate 
once an *actual* change in the DNSKEY or DS set is detected. But even in the face of a sudden 
change in the trust relationship, it's not clear whether ignoring a signed (!) long TTL is 
beneficial, as that might harm stability and resilience during time periods of configuration 
errors, which the cache would otherwise help survive.
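
To illustrate the change-triggered alternative, here is a minimal sketch. It is not taken from the draft or from any resolver implementation; `rrset_fingerprint` and `TrustChainWatcher` are hypothetical names, and RDATA is assumed to be available as raw bytes. Dependent cache entries are revalidated only when the observed DNSKEY or DS set actually differs from the one last seen, rather than whenever an upstream TTL runs out.

```python
import hashlib


def rrset_fingerprint(rdatas):
    """Order-independent digest over the RDATA values (bytes) of one RRset."""
    h = hashlib.sha256()
    for rdata in sorted(rdatas):
        # Length prefix keeps record boundaries unambiguous.
        h.update(len(rdata).to_bytes(2, "big"))
        h.update(rdata)
    return h.hexdigest()


class TrustChainWatcher:
    """Remembers the last DNSKEY/DS set seen per zone (hypothetical helper)."""

    def __init__(self):
        self._seen = {}

    def needs_revalidation(self, zone, rrtype, rdatas):
        """Return True only if the set changed since it was last observed."""
        key = (zone.lower(), rrtype)
        new = rrset_fingerprint(rdatas)
        old = self._seen.get(key)
        self._seen[key] = new
        return old is not None and old != new
```

A refreshed DNSKEY RRset with identical content (in any record order) leaves cached data alone; only a real change would trigger revalidation of what depends on it.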


Second, I'm confused about the normative language in this informational 
document. (There are about 20 occurrences of MUST and about 40 of 
SHOULD/RECOMMENDED.)


Third, the document contains several inaccurate or contradictory statements. 
One example is related to Section 7.1.4, which says:
   *  DNS resolver MUST validate the TA before starting the DNSSEC
      resolver, and a failure of TA validity check MUST prevent the
      DNSSEC resolver to be started.  Validation of the TA includes
      coherence between out-out band values, values stored in the DNS as
      well as corresponding DS RRsets.

The recommendation says that a resolver must not be started if its trust 
anchors are incoherent with values obtained from the DNS.

My understanding is that the purpose of a trust anchor is to pin a trusted key 
for a name, in a self-contained fashion, without relying on its confirmation 
through some other channel (e.g. corresponding DS records). If a trust anchor 
is required to be coherent with values stored in the DNS, then the trust anchor 
doesn't appear to be needed in the first place.

It is also left open how the DRO should check "coherence between out-out band values, values 
stored in the DNS as well as corresponding DS RRsets" for their root trust anchors. There are 
no DS records, so you can check ... DNSKEY? Hm. Then, what exactly to check? Also, what about 
IANA's root-anchors.xml file (RFC 7958)? -- The problem here is that "values stored in the 
DNS" is underspecified, although one MUST comply with it.

What's more, Section 7.1.2.1 has:
   Besides deployments in
   networks other than the global public Internet (hence a different
   root), operators may want to configure other trust points.

Now, how would the above recommendation (enforce trust anchor coherence with 
DNS) be enforced in such a setting?


That said, I wrote up some of my (pencil-on-printout) comments from the 
remainder of the document; you can find them below.

Looking at my scribblings, large parts of the document seem to lack clarity (at 
least to me). The parts which are not unclear to me (few scribblings) are

- Sections 1-4 (intro material and boilerplate)
- Section 6 (importance of correct wall time)
- Section 7.1 until 7.1.2 included (trust anchor intro)
- Section 13 (transport considerations)
- Section 14 (IANA considerations)

... with the missing sections containing the meat of the recommendations. As I 
find many of them unclear, I'm not sure I support the document as is, 
simply because I have a hard time following what it says.

I'd like to emphasize that I appreciate the work and effort that went into the 
document. I just think that for it to be helpful guidance (and for the actual 
recommendations and arguments to be discussed), a lot of work on clarity is 
needed. My review is intended as constructive feedback, not as harsh criticism.

Best,
Peter


Section 5:
   A DRO needs to be able to enable DNSSEC validation with sufficient
   confidence they will not be held responsible in case their resolver
   does not validate the DNSSEC response.  The minimization of these
   risks

This sounds like a managerial document from a business risk department. -- As 
the opening paragraph of the section laying out the justification for the 
different recommendation types, I wonder if this is a sufficiently stringent 
argument for justifying technical guidance.

In the same section, there are some occurrences of rather obscure language:
   The
   recommendations do not come with the same level of recommendations

or

   Some recommendations may simply not be
   provided by the operated software

I'm not sure what these things mean.

Similarly, in Section 6:
   For all recommendations, it is strongly RECOMMENDED that
   recommendations are supported by automated processes.

Section 6 also has:
   *  While operating, a DRO MUST closely monitor time derivations of
      the resolvers and maintain the time synchronized.

s/derivations/deviations/

A point that's missing here is (how) to take into account the effects of time 
adjustments on stored TTLs.
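
One possible way to handle this, sketched under the assumption of a single-process in-memory cache: count TTLs down against a monotonic clock, so that stepping the wall clock neither shortens nor extends cached lifetimes. Signature inception and expiration would still have to be checked against wall time. `TtlCache` is a hypothetical name, not something from the draft.

```python
import time


class TtlCache:
    """TTL cache whose expiry does not depend on wall-clock time."""

    def __init__(self, clock=time.monotonic):
        # time.monotonic() is unaffected by system clock updates.
        self._clock = clock
        self._entries = {}

    def put(self, key, value, ttl):
        """Store value for ttl seconds, measured on the monotonic clock."""
        self._entries[key] = (value, self._clock() + ttl)

    def get(self, key):
        """Return the cached value, or None if absent or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, deadline = entry
        if self._clock() >= deadline:
            del self._entries[key]
            return None
        return value
```

A monotonic reading is only meaningful within one run of the process, so a cache that is persisted across restarts would need a different approach.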

Section 7 explains three types of trust-anchor-related recommendations, namely 
initial provisioning, updates, and reporting. It then says:
   Note that TA update and TA reporting only concerns running resolvers.

It's unclear to me why this is written down here. It's perfectly clear that 
when nothing's running, then no validation is going on, so there's no reporting 
or updating of validation trust anchors.

This kind of "requirement fencing" is familiar to me from risk management 
documents, where the requirements author attempts to prevent a manager potentially not 
familiar with the topic from enforcing certain requirements in contexts where they are not 
applicable.

I have no idea whether this is the case here, but I remain unconvinced of the need to say 
that validation-related things only apply to running resolvers. In fact, I find such 
statements distracting (and as such an anti-pattern), as they make me think "what's 
this, did I miss something? is there an edge case?".

Section 7.1.2 has:
   Although some bootstrapping mechanisms to securely retrieve publish
   [RFC7958] and retrieve [UNBOUND-ANCHOR] the Root Zone Trust Anchor
   have been defined, it is believed these mechanisms should be extended
   to other KSKs or Trust Anchors.

believed by whom?

Another example of fuzzy language is in 7.1.2.1, which says:
   For validators that may be used on the global public Internet (with
   "may be" referring to general purpose, general release code),
   handling the IANA managed root zone KSK trust anchor is a
   consideration.

It's a thing to get right (and not a consideration).

Section 7.1.3:
   The generation of a configuration file associated to the TA is
   expected to be implementation independent.  The necessity of tweaking
   the data [...]

In general, TA configuration does not require generation of a configuration 
file. (An implementation might just as well take them from something like 
/etc/resolver/trust-anchors.d/, with each file therein containing DS-type 
records, and the domain somehow encoded in the filename.)
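
A minimal sketch of such a directory-based loader follows. The layout is an assumption made for illustration only: each file name is taken to be the owner name (with "root" standing for the root zone), and each non-comment line holds DS RDATA in presentation format. `DS`, `parse_ds` and `load_trust_anchors` are hypothetical names.

```python
from pathlib import Path
from typing import NamedTuple


class DS(NamedTuple):
    key_tag: int
    algorithm: int
    digest_type: int
    digest: bytes


def parse_ds(line):
    """Parse DS RDATA: key tag, algorithm, digest type, hex digest."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"malformed DS record: {line!r}")
    key_tag, algorithm, digest_type = (int(f) for f in fields[:3])
    # The hex digest may be split into several whitespace-separated chunks.
    return DS(key_tag, algorithm, digest_type, bytes.fromhex("".join(fields[3:])))


def load_trust_anchors(directory):
    """Map each owner name to the DS records found in its file.

    Assumption: the file name is the owner name ("example.org"),
    and a file called "root" holds the root zone anchors.
    """
    anchors = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        owner = "." if path.name == "root" else path.name.rstrip(".").lower() + "."
        records = []
        for line in path.read_text().splitlines():
            line = line.split(";", 1)[0].strip()  # drop ";" comments
            if line:
                records.append(parse_ds(line))
        anchors[owner] = records
    return anchors
```

Nothing here generates a configuration file; the resolver would read the anchors directly from whatever files are present.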

It's not clear what "tweaking the data" means (neither which data, nor in what 
way they are tweaked).

This suggests that the author of this text has a specific context in mind, from 
which the line of argument descends (similarly to the managerial framing in 
other sections).

Section 7.2:
   This includes for a DRO the ability to
   check which TA are in used as well as to resolve in collaboration of
   authoritative servers and report the used TAs.

I am not sure what this means. Resolve what -- DNS queries, or trust anchor 
issues? Something else?

Section 7.2.1:
   Trust is inherently a matter of an operations policy.  As such, a DRO
   will need to be able to update the list of Trust Anchors.  TA updates
   are not expected to be handled manually.  This introduces a
   potentially huge vector for configuration errors

Probably the opposite is meant? (No manual handling --> fewer configuration 
errors?)

   Instead DRO will rely on "Automated Updates to DNSSEC
   Trust Anchors" [RFC5011]

Well, perhaps; implementation is not mandatory.

The two SHOULD recommendations (check TA publisher commitment to RFC 5011, and 
enable RFC 5011 automatic updates) in that section are phrased as independent, 
but they are not. (There's no point to the first recommendation when the second 
is not conditional on the outcome of the first.)

The first paragraph of Section 7.2.2 says:
   A DRO SHOULD regularly check the trust anchor used by the DNSSEC
   resolver is up-to-date and that values used by the resolvers are
   conform to the ones in the configuration

I find this quite fuzzy. Does this mean that the software should detect 
configuration changes and reload the trust anchor?

In any case, this section is about "regular checks", but its first explicit 
recommendation is for "STARTUP", which seems inconsistent.

   In the case of a key roll
   over, the resolver is moving from an old value to an up-to date
   value.  This up-to-date value does not need to survive reboot, and
   there is no need to update the configuration file of the running
   instances - configuration is updated by a separate process.  To put
   it in other words, the updated value of the TA is only expected to be
   stored in the resolver's memory.  Avoiding the configuration file to
   be updated prevents old configuration file to survive to writing
   error on read-only file systems.

I'm not convinced. Rollover from a very old trust anchor to a new one may not 
be possible indefinitely, like when you reboot three years later.

Also, booting with a trust anchor that was broken long ago is insecure, as an 
attacker might exploit that by subsequently forging the rollover. It seems more 
prudent to write rollovers to permanent storage, at least when the algorithm or 
key size is changed. Not doing so is effectively trusting the old key 
indefinitely, against better knowledge.

The recommendation there says:
   *  DRO SHOULD enable "Signaling Trust Anchor Knowledge in DNS
      Security Extensions (DNSSEC)" [RFC8145] to provide visibility to
      the TA used by the resolver.  The TA can be queried using a DNSKEY
      query.

This is not about querying the TA.

   Note also that [RFC8145] does not only concern Trust Anchor but is
   instead generic to DNSKEY RRsets.  As a result, unless for the root
   zone, it is not possible to determine if the KSK/ZSK or DS is a Trust
   Anchor or a KSK/ZSK obtained from regular DNSSEC resolutions.

DROs (who are the subject of the document) can easily determine whether a key 
or DS has been obtained from a trust anchor or from regular resolution: 
just look at whether a trust anchor is configured for the 
name, or whether a DS query was issued.

Transferring the note from my print-out, I realize that perhaps what was meant 
is that the recipient of an RFC 8145 signal cannot tell whether the signaled 
key is a trust anchor. I did not realize that the first few times I read it.

   A failed key roll over or any other abnormal situation MUST trigger
   an alarm.

What does "alarm" mean in this context? (It's underspecified but mandatory.)

   If the mismatch is due to a failed key roll-over, this SHOULD be
   considered as a bug by the DRO.  The DRO MUST restart the resolver
   with updated TA.

Why should it be considered a bug? It may just be a misconfiguration.

The situation here is after a failed rollover. Restart with what updated TA? Is 
the intention here that the DRO handles this manually? (That is discouraged in 
other parts of the document.)

   *  A DRO SHOULD be able to check the status of a TA as defined in
      Section 3 of [RFC7583].

I can't find anything like that in this section. (It deals with key rollover 
timing, not with trust anchor checks.)

Section 8:
   The intent of this section is to position these
   guidelines toward the operational recommendations provided in this
   document.

This is not technical advice. It sounds like an internal compliance document. 
Who is the audience of this document?

   *  DRO SHOULD set automated procedures to determine the NTA of DNSSEC
      resolvers.

What does that mean?

   A failure in signaling validation is associated to a mismatch between
   the key and the signature.

What signaling?

A validation mismatch is not necessarily between key and signature, it may also 
be between data and signature.

   In addition, DRO are likely to
   have specific communication channel with TA maintainer which eases
   trouble shooting.

Why should that be so / what's the basis for the likelihood statement?

   A signature validation failure is either an attack or a failure in
   the signing operation on the authoritative servers.

Or something else, like a misconfiguration of a DS record, or a validation bug, 
or ...

The last recommendation in this section is MAY (which admits either way), although it is 
labeled a "recommendation" (which implies a preference for doing it).

Section 9:
   the DNSSEC validator performs a DNSSEC query to
   the authoritative server that returns the RRset signed with the new
   KSK / ZSK.  The DNSSEC validator may not be able to retrieve the new
   KSK / ZSK

Why should it be the case that the resolver can query some RRset, but not the 
DNSKEY RRset?

   This either results in a bogus resolution or in an
   invalid signature check.

What's the difference?

   Note that by comparing the Key Tag Fields,
   the DNSSEC validator is able to notice the new KSK / ZSK used for
   signing differs from the one used to generate the received generated
   signature.

The key tags may be the same even when the key differs.
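
A small sketch makes this concrete. The key tag defined in RFC 4034, Appendix B is a 16-bit checksum over the DNSKEY RDATA, so swapping two bytes that sit at positions of the same parity changes the key but not the tag. The key material below is arbitrary bytes chosen for illustration, not a valid public key, and `dnskey_rdata` is a hypothetical helper.

```python
def key_tag(rdata: bytes) -> int:
    """Key tag per RFC 4034, Appendix B (not valid for algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        # Even-indexed bytes form the high octet, odd-indexed the low octet.
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


def dnskey_rdata(flags: int, protocol: int, algorithm: int, public_key: bytes) -> bytes:
    """Assemble DNSKEY RDATA in wire format."""
    return flags.to_bytes(2, "big") + bytes([protocol, algorithm]) + public_key


# Arbitrary 64 bytes standing in for a public key (illustration only).
material = bytearray(range(64))
key_a = dnskey_rdata(257, 3, 13, bytes(material))

# Swap two key bytes that both land on even RDATA offsets (4 and 6).
material[0], material[2] = material[2], material[0]
key_b = dnskey_rdata(257, 3, 13, bytes(material))

same_tag = key_tag(key_a) == key_tag(key_b)
different_key = key_a != key_b
```

Both `same_tag` and `different_key` are true here, so a key tag comparison alone cannot establish that the signing key has changed or stayed the same.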

   However, the DNSSEC validator is not expected to retrieve
   the new ZSK / KSK, as such behavior could be used by an attacker.

I am confused about what this could mean.

   Note also that even though the data may
   not be associated to the KSK / ZSK that has been used to validate the
   data, the link between the KSK / ZSK and the data is still stored in
   the cache using the RRSIG.

This seems highly implementation-dependent.

All of the comments so far on Section 9 relate to two paragraphs, which I don't 
think are necessary for what follows. Instead of fixing the inaccuracies, it 
may be better to just drop them.

Further down in the section, the text mentions "TTL associated with FQDNs", 
which is not accurate as a name can have several RRsets with different TTLs.

Apart from that, I disagree with the recommendation in this section (see 
beginning of this message).

Section 10.1:
   A DRO MAY regularly report the Trust Anchor used to the authoritative
   server.  This would at least provide insight to the authoritative
   server and provide some context before moving a key roll over
   further.

The question is what the authoritative should do with this information, if lots 
of trust anchors are reported that have not been updated.

That's probably out of scope for this document, but nevertheless an immediate 
question: Should the rollover process be stalled? That would open up a 
trivial path to block the rollover. If not, what then? -- Perhaps it's better 
to not get into this and drop the last half sentence.

Section 10.2:
   Similarly, a DRO may be informed by other channel a rogue
   or unwilling DNSKEY has been emitted.

What's an unwilling DNSKEY?

   *  A DRO MUST be able to flush the cached data subtree associated to
      a DNSKEY

It seems to me that at the MUST level, it's sufficient to flush the cache as a 
whole.

Section 11:
   *  A DRO SHOULD regularly request and monitor the signature scheme
      supported by an authoritative server.

What does that mean?

   *  A DRO SHOULD report a "Unsupported DNSKEY Algorithm" as defined in
      [RFC8914] when a deprecated algorithm is used for validation.

Is this meant for rcode 0 responses?

   One inconvenient to such strategy i sthat it does not let one DRO to
   take advantage of more recent cryptographic.

Why?

Section 12:
   12. Invalid Reporting Recommendations

This section title seems confusing.

   An invalid response may be the result of an attack or a
   misconfiguration, and the DNSSEC validator may play an important role
   in sharing this information with the authoritative server or domain
   name owner.

I'm not sure I agree with this. It's probably not a good idea if all validating 
resolvers start contacting a specific domain owner.

Section 13:
   RUNTIME: * DRO SHOULD regularly discover MTU

I'm no expert here; does this really need regular checks, or is there a value 
that's generally considered safe? If regular checking is done, how frequently 
would be reasonable?

Section 15:
   Providing inappropriate information can lead to misconfiguring the
   DNSSEC validator, and thus disrupting the DNSSEC resolution service.

Not sure what "providing inappropriate information" means here.

   RRSet that were
   cached require a DNSSEC resolution over the Internet

when queried.

   An attacker may ask the DNSSEC validator to consider a rogue KSK/ZSK,
   thus hijacking the DNS zone.

How so?

   An attacker (cf.  Section 7) can advertise a "known insecure" KSK or
   ZSK is "back to secure"

How so?

--
https://desec.io/
