Hi,

> On 25 Jun 2016, at 09:32, Randy Bush <ra...@psg.com> wrote:
> 
>> Look, if no one can summon the energy to respond, Tim has no way to
>> decide on a change.
> 
> i believe that rob laid this out clearly many months ago.  and no, i
> will not look it up for folk; the epicycles have become too painful.

I remember the comments, and I have been in contact with Rob and the other RRDP 
authors about this. I think we can move forward.

It is useful to separate two issues here: 1) HTTPS certificate 
verification, and 2) MFT/CRL re-issuance.

On 1) HTTPS certificate verification

In short, we seem to be converging on using HTTPS certificate validation to 
alert about possible problems, but since RPKI objects are signed and can be 
validated even if the source is untrusted, let's always try to get the latest 
data regardless. We are still working on the text, but the gist of what we 
would like to put in a -05 document before the cut-off date is:

4.  HTTPS considerations

   It is RECOMMENDED that Relying Parties and Publication Servers
   follow the Best Current Practices outlined in [RFC7525] on the use
   of HTTP over TLS (HTTPS).

   Note that a Man-in-the-Middle (MITM) cannot produce validly signed
   RPKI data, but it can perform withholding or replay attacks targeting
   an RP, and keep the RP from learning about changes in the RPKI.
   Because of this, RPs SHOULD do TLS certificate and host name
   validation when they fetch from an RRDP Publication Server.

   However, such validation issues are often due to configuration
   errors or a lack of a common TLS trust anchor.  In these cases it is
   better that the RP retrieves the signed RPKI data regardless, and
   performs validation on it.

   Therefore, RPs SHOULD log any TLS certificate or host name
   validation issues they find, so that an operator can investigate the
   cause, but the RP SHOULD continue to retrieve the data.  The RP MAY
   choose to log this issue only when fetching the notification update
   file, but not when it subsequently fetches snapshot or delta files
   from the same host.  Furthermore, the RP MAY provide a way for
   operators to accept untrusted connections for a given host, after
   the cause has been identified.
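
To make the intended RP behaviour concrete, here is a rough sketch in Python 
(using the 'requests' library); the URL handling, logger name and timeout are 
just illustrative and not part of the proposed text:

import logging
import requests

log = logging.getLogger("rrdp")

def fetch(url):
    try:
        # Normal case: full TLS certificate and host name validation.
        resp = requests.get(url, verify=True, timeout=30)
    except requests.exceptions.SSLError as exc:
        # Validation failed: log it so an operator can investigate the
        # cause, but still retrieve the (separately signed) RPKI data.
        log.warning("TLS validation failed for %s: %s", url, exc)
        resp = requests.get(url, verify=False, timeout=30)
    resp.raise_for_status()
    return resp.content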


On 2) CRL/MFT re-issuance

First of all, a TL;DR of the below: there are operational considerations that 
I am happy to share with the group, but if they need to be documented, it 
should not be in the RRDP document.

I brought this up because there is some mention of using nextUpdate in CRLs 
and MFTs as a protection against replays, and I believe the above (under 1) 
provides a better way to detect this. The 24 hours that is now frequently used 
is probably way too long anyway to be useful against a MITM, so I don't think 
that changing it to 7 days, or even 1 month, makes much difference in this 
regard.

The discussion that we may want to have is whether nextUpdate should be used as 
an indication of when to fetch data again. This keeps coming up. Steve Kent 
also suggested having a long default nextUpdate time, and a shorter one when we 
know that there are changes.

The problem is that the common case for change is unpredictable: a change in 
routing requires ROAs or BGPSec certificates to be re-issued, and there is a 
desire that RPs learn about this fast. There was a lot of discussion a few 
years ago about how fast; Danny McPherson in particular was vocal on this. 
There is no clear indication of what is fast enough, though. My impression is 
that if changes in the RPKI can propagate to RPs (and connected routers) in 
10-30 minutes, we are in a good spot.

Both rcynic and the RIPE NCC RPKI Validator* will re-fetch at regular 
intervals regardless of the nextUpdate time. With RRDP I believe we have the 
scalability to support re-fetching every 5-10 minutes by any RP that wants it. 
But if we do that, we need to reduce the churn resulting from MFT/CRL updates.
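
To illustrate why I think that scales, here is a minimal sketch of such a 
poll loop in Python; the notification URL and the 10 minute interval are made 
up, and a real RP would of course also check the session_id and the hashes:

import time
import xml.etree.ElementTree as ET
import requests

NOTIFY_URL = "https://rrdp.example.net/notification.xml"  # illustrative

last_serial = None
while True:
    # Cheap: fetch only the small update notification file.
    root = ET.fromstring(requests.get(NOTIFY_URL, timeout=30).content)
    serial = int(root.attrib["serial"])
    if serial != last_serial:
        # The expensive part happens only when something actually changed:
        # fetch the snapshot or the deltas listed in the notification.
        print("serial changed to %d, fetching deltas" % serial)
        last_serial = serial
    time.sleep(600)  # re-fetch the notification file every 10 minutes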

So, I believe that it is safe to lower the re-issuance frequency, so that the 
nextUpdate time is on the order of a week, or even a month. Chris Morrow 
brought up a concern about keeping the cogs in the machine well greased. I 
hear you, but in our case we have over 3000 hosted CAs; if we re-issue 
MFTs/CRLs every month we still smear the cogs with about 100 CAs every day.

Finally, I had another thought on how we can improve the signal-to-noise 
ratio of ROA changes vs MFT/CRL updates in our operations. Currently we 
re-issue CRLs/MFTs for our 3000+ CAs every X hours, or whenever there is a 
change in ROAs (no BGPSec yet). We optimised this to spread the load on our 
CPU and HSM. But if we want to optimise for RP fetching instead, we can 
change our implementation to do the background CRL/MFT updates in large 
batches (say once per X days), and do the ROA-related updates as soon as they 
happen. This would allow RPs to aggressively do a cheap fetch of our update 
notification file every X minutes, and they would only find that they need to 
do an expensive fetch when there are important changes (and, okay, once per X 
days when the nextUpdate values are renewed).
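
A sketch of what I mean (this is not our actual implementation; the helper 
functions, numbers and CA names are made up):

REISSUE_DAYS = 30                                  # e.g. one-month nextUpdate
HOSTED_CAS = ["CA-%04d" % i for i in range(3000)]  # 3000+ hosted CAs

def reissue_mft_and_crl(ca):
    # Placeholder for the real (HSM-backed) MFT/CRL re-issuance.
    print("re-issuing MFT/CRL for", ca)

def publish_roa_change(ca, roa):
    # ROA changes are published right away, together with a fresh
    # MFT/CRL, so polling RPs pick them up fast.
    print("publishing", roa, "for", ca)
    reissue_mft_and_crl(ca)

def daily_background_batch(day):
    # Smear the periodic re-issuance over the whole window: with 3000
    # CAs and a 30-day window that is roughly 100 CAs per day.
    for ca in HOSTED_CAS[day % REISSUE_DAYS::REISSUE_DAYS]:
        reissue_mft_and_crl(ca)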

Anyway, I believe that all of the above is in the space of local operational 
considerations. There may be merit in discussing it, and we may find there is 
merit in documenting it as Informational or a BCP, but in my opinion not in 
the current RRDP document.


Cheers
Tim





*: off-topic.. yes, we need a cool name, ideas welcome ;)




> 
> randy
> 

_______________________________________________
sidr mailing list
sidr@ietf.org
https://www.ietf.org/mailman/listinfo/sidr
