Geoff,
I'm happy to accept that the new wording is poor, but I'm pretty sure
the old wording was also bad, and I think this discussion is important.
The old wording could easily be interpreted to suggest that once per day
was the correct frequency for pulling from a repository. (That is, I
believe the previous version was making a de facto recommendation for a
default behavior of one pull every 24 hours ... there wasn't a RECOMMEND
in the text, but we all know that examples tend to be normative in this
type of document.)
1) So the first implicit question is: Should the working group be making
a recommendation as to the frequency with which a relying party pulls
from the repository?
Or equivalently: Is there a "wrong" frequency that people might use if
we didn't give them any guidance?
It seems that retrieving updates "too frequently" (e.g., every 5
minutes) strains the repository system and that retrieving updates "too
infrequently" (e.g., monthly) means that when I inject a new ROA into
the system, it will take "unacceptably long" for this information to
propagate to the relying parties that make use of it.
Therefore, we should have text in the document that articulates some
middle ground that we believe is reasonable for the Internet. (I make no
claims that the current text in the document achieves this goal.)
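As a crude illustration of this tradeoff, here is a throwaway Python
sketch (the 30,000 relying-party figure is the one I discuss below, and
the candidate intervals are arbitrary assumptions):

    # Worst-case propagation delay for a new ROA is roughly one polling
    # interval; aggregate query load scales inversely with the interval.
    RELYING_PARTIES = 30000
    for interval_hours in (5.0 / 60, 3, 24, 24 * 30):  # 5 min, 3 h, daily, monthly
        interval_s = interval_hours * 3600.0
        print("poll every %7.2f h: worst-case ROA delay ~%7.2f h, ~%6.2f queries/sec system-wide"
              % (interval_hours, interval_hours, RELYING_PARTIES / interval_s))

The "middle ground" is whatever combination of delay and aggregate query
rate we collectively decide is tolerable.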
2) The second question is: If we make a recommendation regarding the
frequency with which relying parties should pull updates, what frequency
should we recommend?
Here, I understand that "everyone hitting the repository system at once"
is a bad outcome regardless of the frequency that we recommend. That is,
regardless of whether we recommend "once per day", "once per month", or
"eight times daily" we will likely see problems with too much server
load at midnight. If anyone can recommend text to avoid this phenomenon
(i.e., to encourage people to spread out their queries to the repository
system), please send text.
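For concreteness, the sort of thing I have in mind is randomized
scheduling on the relying party side, roughly like the following Python
sketch (the 3-hour interval and 25% jitter are placeholder assumptions,
and fetch_and_validate is a hypothetical stand-in for the actual rsync
sweep and validation pass):

    import random
    import time

    NOMINAL_INTERVAL = 3 * 3600   # assumed nominal refresh interval (seconds)
    JITTER_FRACTION = 0.25        # assumed: spread fetches over +/- 25% of the interval

    def fetch_and_validate():
        # Hypothetical placeholder for the rsync sweep and validation pass.
        pass

    def next_fetch_delay():
        # Randomize each relying party's wait so fetches don't all land on
        # the same clock chime at the repository servers.
        return NOMINAL_INTERVAL * (1.0 + random.uniform(-JITTER_FRACTION, JITTER_FRACTION))

    while True:
        time.sleep(next_fetch_delay())
        fetch_and_validate()

Even a sentence in the draft recommending something that informal might
be enough to keep the offered load reasonably flat.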
I agree that there are roughly 30,000 AS numbers visible in BGP, so it's
reasonable to assume on the order of 30,000 relying parties who will be
routinely querying the repository system. We might also assume that
30,000 is a reasonable order of magnitude for the number of CAs in the
RPKI (we might easily average 2 CAs per AS, but surely not 10 CAs per AS).
However, one thing that wasn't clear from reading your analysis was how
many CAs a given repository server would be hosting. If a server run by
a large ISP or an RIR was providing a cache of all RPKI data, then
clients would have longer connections to this server (as they could
retrieve much of the data they need in one place), but they would be
unlikely to receive requests from all 30,000 relying parties (e.g. an
ISP might provide a complete cache for their customers but for
non-customers they would typically only serve data for which they are
authoritative). Alternatively, if a server is only serving data for a
small subset of the CAs in the RPKI, then it might receive requests from
all relying parties, but those sessions would tend to be short
(especially when nothing has changed).
In any case, I believe the way forward (with regards to server load) is
to answer the question, "How many simultaneous connections are
reasonable for a server that hosts publication points for X CAs?" and
then work backwards from there to determine if a given interval of
relying party requests is reasonable from the server standpoint. I admit
that I haven't completely thought through re-key, but I'll try to dig up
some rough connection-time numbers based on our relying party software,
and do a few back-of-the-envelope computations.
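In the meantime, the trivial arithmetic I would start from looks like
this (a Python sketch; the session durations are borrowed from Geoff's
assumptions below and are placeholders until I have measured numbers):

    # Average concurrent rsync sessions at a repository server
    #   = (relying parties / refresh interval) * session duration.
    # All inputs are assumptions for illustration.
    RELYING_PARTIES   = 30000
    REFRESH_INTERVAL  = 3 * 3600.0   # seconds between fetches per relying party
    NO_CHANGE_SESSION = 30.0         # seconds for a sweep when nothing has changed
    REKEY_SESSION     = 3 * 60.0     # seconds to re-fetch all products after a re-key

    arrival_rate = RELYING_PARTIES / REFRESH_INTERVAL   # sessions per second
    print("arrival rate:      %.1f sessions/sec" % arrival_rate)                                # ~2.8
    print("steady-state load: %.0f concurrent sessions" % (arrival_rate * NO_CHANGE_SESSION))   # ~83
    print("post-re-key load:  %.0f concurrent sessions" % (arrival_rate * REKEY_SESSION))       # ~500

Those are roughly Geoff's 90 and 540 figures; the open question is what
session durations and what number of CAs per server are actually
realistic.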
With regards to client load, I'm not convinced that there's any problem
with frequent queries to the repository system. If the relying party
queries a publication point and rsync determines that nothing has
changed, then no changes are required to the relying party's local
cache and no cryptographic calculations are required. If something has
changed, then the relying party has to perform validation (which
includes cryptographic signature verification) on the manifest and any
new objects that have been added. (Additionally, there may be resulting
changes to the client's local cache ... e.g., if a new CRL revokes a
previously valid certificate ... but such changes don't require new
cryptographic computations, and so I believe the bottleneck is going to
be the one or two signature verifications per object changed [1]). Now
the point from the relying party side is that if 5,000 manifests change
and 10,000 signed objects are added to the repository system on a given
day, then the relying party needs to do roughly 30,000 signature
verifications regardless of whether it learns of all these changes at
once, or whether it learns of them in small batches throughout the
course of the day. Therefore, I don't see how making frequent checks for
new data has a significant impact on the relying party's processing load.
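To put crude numbers on that last point (again, just a sketch; the
two-verifications-per-object and 1 ms per verification figures are
assumptions, not measurements):

    # Relying party crypto load depends on how much changed, not on how
    # often we poll. Numbers match the example in the paragraph above.
    CHANGED_MANIFESTS        = 5000
    NEW_SIGNED_OBJECTS       = 10000
    VERIFICATIONS_PER_OBJECT = 2       # assumed: EE certificate + object signature
    SECONDS_PER_VERIFICATION = 0.001   # assumed: roughly 1 ms per RSA verification

    total = (CHANGED_MANIFESTS + NEW_SIGNED_OBJECTS) * VERIFICATIONS_PER_OBJECT
    print("signature verifications per day: %d" % total)                          # 30,000
    print("CPU time per day: %.0f seconds" % (total * SECONDS_PER_VERIFICATION))  # ~30

That total is the same whether the relying party learns of the changes
in one daily batch or in eight smaller ones.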
Finally, in addition to server and relying party processing loads, one
must also look at the benefit of frequent repository fetches. Keep in
mind, that a relying party has no way of distinguishing the following
two events: (A) a route advertisement is originated by an AS that is
authorized to advertise the route, but the relying party hasn't fetched
recently enough to obtain the new ROA; and (B) a route advertisement is
originated by an unauthorized entity that is attempting to hijack
address space. In this discussion, it is also important to note that
manifests can guarantee that the relying party received all signed
objects that existed at the moment that the manifest was published
(i.e., a manifest can detect malicious deletion of data from a
repository or corruption of data in transit) but the manifest says
nothing about data that may have been added since the manifest was
issued. This is why there is benefit in a relying party going back to
the publication point periodically to see whether a new manifest has
been issued.
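In code terms, the check a relying party performs on each revisit is
roughly the following (a hypothetical sketch; manifest_number and
this_update correspond to the manifestNumber and thisUpdate fields of
the manifest profile, and the Manifest class and timestamps are invented
for illustration):

    class Manifest(object):
        def __init__(self, manifest_number, this_update):
            self.manifest_number = manifest_number   # monotonically increasing counter
            self.this_update = this_update           # GeneralizedTime string, e.g. "20091102120000Z"

    def manifest_is_newer(cached, fetched):
        # Without this periodic check, a relying party can detect deletions
        # from the publication point (objects listed on its cached manifest
        # that have vanished) but never learns that newer data exists.
        if cached is None:
            return True
        if fetched.manifest_number != cached.manifest_number:
            return fetched.manifest_number > cached.manifest_number
        return fetched.this_update > cached.this_update

    cached  = Manifest(41, "20091101120000Z")
    fetched = Manifest(42, "20091102120000Z")
    assert manifest_is_newer(cached, fetched)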
In any case, it's good to know that we'll have plenty to talk about in
Hiroshima.
- Matt Lepinski
Geoff Huston wrote:
WG Co-Chair Hat OFF
Hi Matt,
entities who are actually using RPKI data for routing SHOULD be
fetching fresh data from the repositories at least once every three
hours.
3 hours?
At a first pass that seems very frequent.
From a server's perspective, if there are 30,000 ASes out there and
each is running a local cache and each is a distinct relying party of
the RPKI system, then the local hit rate at the server would be 3 per
second, assuming that all the relying parties evenly spread their
load (which is a pretty wild assumption - the worst case is that all
30,000 attempt to resync at the 3 hour clock chime point). Assuming
that a repository sweep with no updates takes 30 seconds to complete
then the server would have an average load of some 90 concurrent sync
sessions. If there is a local rekey then the refresh would also imply
a reload of all the signed products at this repository publication
point. Assuming that this would then take 3 minutes to download, then
the rekey load per server would be of the order of 540 concurrent
rsync sessions as an average load. These load numbers appear to me to
be somewhat large.
From the relying party's perspective, if there are 30,000 distinct
RPKI repository publication points, and a serial form of local
synchronisation using a top-down tree walk, then the same set of
assumptions imply that the relying party needs to process the
synchronisation with each remote cache (including, minimally, the
manifest crypto calculation) at a rate of 3 per second.
Assuming that there are 200,000 distinct ROAs out there that are re-
validated at each fetch then once more the numbers imply that a 3
hour refresh would mean that the relying party would need to
validate 200,000 ROAs in 10,800 seconds. That probably needs some
pretty quick hardware.
These numbers are pretty much a toss at a dart board, and the draft's
authors may well be using a different scale model to justify this
recommended time cycle. What numbers did you have in mind, Matt, that
would make this "SHOULD" 3 hour refresh cycle feasible in a big-I
Internet scenario of universal use?
Geoff
WG Co-Chair hat off