I've reviewed this draft and have a number of comments:

At a high level, I think this draft is a very important piece of the sidr 
landscape, so I certainly applaud Randy for writing it.

- The second sentence in the abstract is a fragment, without a direct object.

Section 1 Intro:
- 1st paragraph: if we are going to predicate advice on terms like "widespread 
deployment," I think we need to define them.  I know we all roughly know things 
like, "widespread means `a lot' of deployment," but the assertion that 
RPKI-based origin validation has a dependency on some level of penetration 
either needs to be qualified (or better yet, quantified) or removed, imho.

- 2nd para: "... the next year to five years." As a living document, this work 
might want to stay away from relative time references.  Will this statement 
need to be updated every year?
- 2nd para: "... eventually there will be a single root..." Is this assumed in 
order for this document's advice to matter?  It seems like operational advice 
ought to be more topical than this.  The advice in this draft is intended to be 
relevant before the single root, so this reference seems quite out of place.

- 3rd para: s/AS's/AS'/

Section 3:

- 2nd para: Some intuition behind _why_ hierarchy affects performance would be 
quite useful.

- 3rd para: As this is an operational document, and there is nothing else 
available, shouldn't this text explain that there is currently no choice of 
what to use?

- 4th para: The edict that operators should make use of the cache-chaining 
facility seems like it should have some operational 
explanation/justification/intuition/something.  It may be good advice, but 
shouldn't there be some rationale that allows for the evaluation of tradeoffs?
- 4th para: "Of course, the recipient relying parties SHOULD re-validate the 
data."  This makes it seem (imho) like the benefits of the immediately 
preceding advice might be lessened... Without an explanation of the rationale, 
the reader is forced to ask whether doing this re-validation wouldn't cause the 
same scaling worries as before the chaining.  Without a more detailed 
explanation, the reader really can't tell.

- 5th para: This seems quite inappropriate to me.  We need to know how this 
design will scale and function in an operational setting.  If there's any place 
that this should be discussed, it would seem to be here.  The operational 
behavior, configuration, and dynamics of the system seem like they must be 
described in the operational guidance documents.  How else should an operator 
know what configurations result in which behaviors, and scaling properties, 
etc?  I don't think this draft can simply punt by saying these operational 
concerns are beyond the scope of its operational guidance.

- 6th para: What if the objects in a network are multi-mastered?  I wasn't able 
to tell what this paragraph's guidance was, but it seemed to be a little narrow 
in its application.  Maybe it is superfluous?

- 7th para: I think this advice seems a little dilute, and may not be as useful 
as it could be.  Clearly, network configurations can be quite varied, right?  
Since this is the case, I think it would be much more helpful for the author to 
build a strawman here, and overlay advice on it.  In fact, perhaps a set of 
running strawmen throughout the document would allow various pieces of advice 
to be hung on specific examples that operators could then adapt to their own 
configurations.  At the very least, it would provide some context, and possibly 
some additional substance to the document.

- 8th para: This seems like good advice, but it could use an example (i.e., the 
above comment).

- 9th para: This paragraph seems to give direction without intuition of the 
cost/benefits tradeoffs of this decision or the deployment decisions (like how 
many).  More generally, it seems like there is important advice to give here, 
but maybe the experts should frame the advice in terms of something like: what 
(specifically) does an operator pay for $n$ peers and what (specifically) does 
she gain?

- 10th para: I don't think the text is clear: it seems like upstreams carrying 
traffic and the trust one has in attestation objects are quite different.  I 
don't think this is an apt analogy, and it really confuses me (as a reader).  
As a result, it's hard for me to understand what the point of that paragraph is 
(given the inapt analogy).
- 10th para: With the above caveat that I might not be understanding what is 
being said, the final sentence raises additional concerns for me.  If we 
recommend that operators use each others' caches, and then force them to 
revalidate, we are either introducing a new attack vector (cache poisoning of 
non-authoritative caches), or we are increasing the attack surface of an 
existing attack vector (more caches must be validated because they can lie to me).  
Either way, I don't see the benefit gained here, just the drawback.

- 11th para: How does trusting caches relate to mandatory revalidation?  As I 
read the text (at least, as written), it seems to me that this is a conflation 
of very different concepts.

- 12th para: Should we define the term ``super-block'' before using it here?  
I'm more used to seeing it in the context of filesystems, but that doesn't mean 
we can't overload its definition here... I just think we need to do so before 
using it.

- 13th para: I think we need to add some context in this paragraph.  
Specifically, I suggest adding the following text to the penultimate sentence, 
``, but only for those external routers that have also deployed RPKI-based 
origin validation.''  And adding the following text to the final sentence, `` 
for just those RPs.''

- 15th para: I think it is important to be specific, and after claiming that 
something is ``more likely to be noticed,'' I think we ought to describe _how_ 
one might/should do so, in an operational setting.  As before, I'd suggest some 
advice be given through an example.

- 17th para: I think this paragraph makes good sense, but can we get a more 
quantitative discussion here?  I was just thinking that since this is an 
operational/engineering document, it might be good to shift this part from the 
qualitative end of the spectrum over more to the quantitative side.

- 20th para: This paragraph felt a little prescriptive from the 
policy/provisioning side, to me.  I caught myself wondering if this kind of 
advice really belongs here?  If it does, then maybe it would be more 
appropriate to just mention that proxy registration of this kind of data is an 
option, and cite how well that has worked elsewhere (like with IRRs and stuff)?

- 21st para: s/^While //
- 21st para: This paragraph suggests a period of ``four to six hours.''  I think 
we need some kind of explanation for these numbers.  As an operations document, 
it seems to me that we should be discussing tradeoffs and the relative value of 
different settings, etc.
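
To make the kind of tradeoff discussion I am asking for a bit more concrete, 
here is a rough sketch of the arithmetic behind a polling interval.  It is only 
an illustration under my own assumptions, not anything taken from the draft: 
every relying party polls on a fixed interval, a change published just after a 
poll is not seen until the next poll, and the population of 1000 relying 
parties is a made-up number.

```python
def polling_tradeoff(interval_hours: float, relying_parties: int) -> dict:
    """Return daily fetch load and worst-case staleness for a polling interval.

    Assumptions (the reviewer's, not the draft's): each relying party polls
    on a fixed interval, and an object changed right after a poll stays
    unseen until the next poll.
    """
    if interval_hours <= 0:
        raise ValueError("interval_hours must be positive")
    fetches_per_rp_per_day = 24 / interval_hours
    return {
        "interval_hours": interval_hours,
        # Aggregate load the publication point sees per day.
        "fetches_per_day": fetches_per_rp_per_day * relying_parties,
        # Longest time a relying party can act on outdated data.
        "worst_case_staleness_hours": interval_hours,
    }


if __name__ == "__main__":
    # 1000 relying parties is a hypothetical population size.
    for hours in (1, 4, 6):
        print(polling_tradeoff(hours, relying_parties=1000))
```

Even a table this simple would let an operator see what moving from one 
setting to another costs in load and buys in freshness.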

Section 4:

- 2nd para: I worry that this advice is a little dilute, and (as a result) kind 
of falls a little limp (i.e., I was not able to clearly see what it was trying 
to explain).  I think if we had been carrying a strawman (or some strawmen) 
through the document, it would help bring the point of this paragraph into 
focus.

- 3rd para: In the same vein as the above, it seems like some examples/strawmen 
would be quite apropos.

Section 5: 

- 3rd para: ``10.0.666.0/24'' ?  maybe 10.0.6.0/24 ?

- 4th para: This paragraph seems to be offering tractable guidance.  Is there 
any thinking around the tradeoffs for when to change policy?

- 5th para: s/AS-path/AS_PATH/g

- 6th para: s/it's/its/

- 7th para: Should ``Local Pref'' be normalized to match earlier discussions of 
``Local-Preference''?

- 10th para: I think we need to add a little bit of text.  Perhaps add to the 
last sentence, `` for the same prefix''?
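
On the 3rd-paragraph comment above: a quick way to check example prefixes 
before they go into the text is Python's standard ipaddress module.  This is 
just a small sketch; the helper name is my own.

```python
import ipaddress


def is_valid_ipv4_prefix(text: str) -> bool:
    """Return True if text parses as an IPv4 prefix with no host bits set."""
    try:
        # strict=True also rejects prefixes with host bits set.
        ipaddress.IPv4Network(text, strict=True)
    except ValueError:
        # AddressValueError and NetmaskValueError both subclass ValueError.
        return False
    return True


if __name__ == "__main__":
    for prefix in ("10.0.666.0/24", "10.0.6.0/24"):
        print(prefix, is_valid_ipv4_prefix(prefix))
```

The first prefix fails because 666 does not fit in an octet; the second 
parses cleanly.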

Section 6:

- 1st para: The comment/implication that incoherency is a quality of all 
distributed caching systems is untrue.  In fact, there are many cache 
protocols, with different specific consistency models, that do achieve coherency.  To 
claim something is not being attempted with RPKI is one thing.  To claim that 
no system is able to accomplish this is quite different.  Moreover, why is this 
(clearly a design issue with RPKI) being discussed in this draft (an 
operational guidance draft)?  This seems like it is definitely the wrong place 
to talk about this, but regardless, the text is quite wrong.

- 2nd para: I think this paragraph brings up an important point, but doesn't 
mention a very important operational side effect of that point.  I suggest 
adding one more sentence to the end, ``Alternately, since no consistency model 
is attempted, it is possible that routing may not be able to converge in those 
networks deploying this approach without manual intervention.''

- 3rd para + 4th para: I don't understand how these paragraphs convey 
helpful operational advice.

- 10th para: Why was 1 hour chosen?  What are the tradeoffs, etc?

Section 7:

s/AS-Path/AS_PATH/g
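
For what it's worth, the AS_PATH substitutions I suggest for Sections 5 and 7 
can be applied in one case-insensitive pass.  A minimal sketch, assuming the 
draft source is available as plain text; the function name and the sample 
sentence are made up for illustration.

```python
import re

# Matches "AS-path", "AS-Path", and other capitalizations as a whole word.
_AS_PATH_VARIANT = re.compile(r"\bAS-path\b", flags=re.IGNORECASE)


def normalize_as_path(text: str) -> str:
    """Rewrite hyphenated spellings of the attribute name to AS_PATH."""
    return _AS_PATH_VARIANT.sub("AS_PATH", text)


if __name__ == "__main__":
    sample = "Prepending lengthens the AS-path; the AS-Path is then compared."
    print(normalize_as_path(sample))
```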

Thanks,

Eric

On Aug 17, 2012, at 11:03 AM, Christopher Morrow wrote:

> Hello WG folk,
> This draft has undergone 9 revisions since the last WGLC, which seemed
> to end with requests for changes by the authors.
> Can we now have a final-final-please-let's-progress WGLC for this
> draft now? Let's end the call: 08/31/2012 (Aug 31 2012).
> 
> Htmlized version available at:
> http://tools.ietf.org/html/draft-ietf-sidr-origin-ops-19
> 
> Abstract:
> "Deployment of RPKI-based BGP origin validation has many operational
>   considerations.  This document attempts to collect and present the
>   most critical.  It is expected to evolve as RPKI-based origin
>   validation is deployed and the dynamics are better understood."
> 
> Thanks!
> -Chris
> <co-chair-2-of-3>
> _______________________________________________
> sidr mailing list
> sidr@ietf.org
> https://www.ietf.org/mailman/listinfo/sidr
