On Mon, Sep 12, 2011 at 2:28 PM, George, Wesley <wesley.geo...@twcable.com> wrote:
> -----Original Message-----
> From: christopher.mor...@gmail.com [mailto:christopher.mor...@gmail.com] On
> Behalf Of Christopher Morrow
> Sent: Sunday, September 11, 2011 11:26 PM
> To: Randy Bush; George, Wesley
> Cc: Russ White; sidr@ietf.org
> Subject: Re: [sidr] BGPSec scaling (was RE: beacons and bgpsec)
>
> maybe what Wes is asking here is really:
> "Could someone model the load on a router doing bgpsec, in a world of
> bgpsec speaking devices?"
>
> Something like, for a core network edge device (say sprint, C&W, TWTC,
> UU/vzb,ATT an edge connecting device in their worst metro):
> o number of updates today/second (steady state and 'worst case')
> o projected growth of update stream (given historical data)
> o projected 'cost' (cpu cycles) of un-assisted bgpsec
> o projected RIB RAM size (use historical data to project forward)
> o projected beacons/second (which really just look like updates in
> the update stream)
> o routing table size (projected forward from historical data)
>
> It seems most of that data exists in one form or another, it seems
> that running the math isn't "hard". There's a question of the validity
> of the model... but that's always the case.
>
> Wes, is this sort of thing what you're asking for?
>
> WEG] Yes, to some extent, but you're right that the model is the hard part,
> not the math. In trying to unwind a similar problem of how to characterize
> steady-state and peak CPU load on a L3VPN PE router so that there are real
> rules of thumb for capacity management and scaling, we discovered a couple of
> things -
> 1) (some) Vendors are quite bad at providing reasonably accurate
> multi-dimensional scaling models based on testing or real-world results. They
> tend to give a lot of single-dimension scale limits (eg with this knob turned
> to 11, you can get this value), but are very conservative and mumbly when it
> comes to what the actual real-life limits are, YMMV, etc. As a result,
> sometimes you end up finding out about the scaling cliff as you're falling
> over it, or you pay for hardware that you can never fully use because you
> stick to very conservative limitations.
> 2) a corollary: behavior at scale becomes increasingly non-deterministic the
> more variables you're working with simultaneously. Even worse, it's difficult
> to account in a model for things that work well enough at moderate scale, but
> are not efficient enough for high scale, or suffer some sort of secondary
> impact due to dependencies, etc.
> 3) some routers are very bad at providing useful data about critical scaling
> vectors (updates per sec, changes in multicast state, etc). Coupled with the
> fact that each router's numbers can be wildly different, it's difficult to
> characterize a "common" router, let alone a common network.
> 4) there are widely varying opinions among vendors and operators as to what
> is an acceptable level of performance at scale i.e. time to convergence of
> last route, steady-state CPU utilization (how much headroom is enough),
> stability during system or network events.
>
> I think that what is coming up here are concerns in a couple of different
> categories:
> 1) Short-term hardware scale - is BGPSec supportable with what is
> realistically available today? For how long? Is that long enough?
> 2) Long-term hardware scale (5+ years) - What's the next breakthrough? How
> long does that buy us? Is that long enough? What does it do to our time
> remaining before we have to redesign the routing system to make it keep
> scaling?
> This is where we should be considering RFC4984 and either updating or
> affirming the guidance there.
> 3) Cost for both - what is an acceptable assumption of the cost premium for
> BGPSec, in both capital and personnel?
>
> On the hardware side, we're in a discussion that sounds a lot like predicting
> peak oil - when do we run out of scale growth on Moore's law with the current
> overall Internet architecture, and will BGPSec be just "one more gas-guzzler
> on the road" or the straw that broke the camel's back?
>
> I don't know that we're going to get a definitive answer from modeling, and
> I'm not trying to bring on analysis paralysis either. Randy's (and mine, and
> everyone else's) guess may be BS, but even making a gut check based on what
> info we have available and documenting the assumptions we're basing our
> decision on would be a good thing.
I agree with the above, and the last comment really was what I was aiming
at. If someone were to model the 6-ish items I outlined, and properly
documented their test-harness (and maybe made it available so folk could
test with their favorite settings?), that would help us get around this
paralysis problem. At least we'd feel a bit more comfortable having
something to check against.

-chris
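As Chris says, once the inputs exist, "running the math isn't hard." As a hedged illustration only, here is a minimal Python sketch of that arithmetic for the six items listed above: it projects the update stream (beacons counted as ordinary updates) and the routing table forward with compound growth, then converts them into CPU cores spent on un-assisted signature verification and into RIB RAM. The class and function names, the per-signature verification cost, the signatures-per-update figure, and the bytes-per-route field are all hypothetical, and the sample numbers are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class EdgeRouterModel:
    """Inputs for a back-of-the-envelope BGPsec load model.

    Every field is a HYPOTHETICAL input the modeler must supply from
    measurements or vendor data; none of them are real figures.
    """
    updates_per_sec_steady: float   # observed update rate, steady state
    updates_per_sec_worst: float    # observed update rate, worst case
    beacons_per_sec: float          # beacons, treated as ordinary updates
    update_growth_per_year: float   # e.g. 0.10 means 10% per year
    sigs_per_update: float          # assumed avg signatures verified per update (one per AS hop)
    verify_us_per_sig: float        # assumed CPU microseconds per un-assisted verification
    routes_today: int               # current routing table size
    route_growth_per_year: float    # e.g. 0.10 means 10% per year
    rib_bytes_per_route: float      # assumed RIB RAM per route, including signatures


def project(value: float, growth_per_year: float, years: int) -> float:
    """Compound growth from a historical trend: value * (1 + g) ** years."""
    return value * (1.0 + growth_per_year) ** years


def verify_cores(m: EdgeRouterModel, years: int, worst_case: bool = True) -> float:
    """CPU cores kept busy by signature verification alone.

    Assumes beacons grow at the same rate as the rest of the update stream.
    """
    base = m.updates_per_sec_worst if worst_case else m.updates_per_sec_steady
    rate = project(base + m.beacons_per_sec, m.update_growth_per_year, years)
    # updates/s * sigs/update * us/sig = CPU-us per second; 1e6 us/s = one core.
    return rate * m.sigs_per_update * m.verify_us_per_sig / 1e6


def rib_ram_bytes(m: EdgeRouterModel, years: int) -> float:
    """Projected RIB memory for the routing table, in bytes."""
    routes = project(m.routes_today, m.route_growth_per_year, years)
    return routes * m.rib_bytes_per_route


if __name__ == "__main__":
    # Placeholder numbers chosen only to exercise the code; NOT estimates.
    m = EdgeRouterModel(
        updates_per_sec_steady=50, updates_per_sec_worst=5000,
        beacons_per_sec=10, update_growth_per_year=0.10,
        sigs_per_update=4, verify_us_per_sig=500,
        routes_today=400_000, route_growth_per_year=0.10,
        rib_bytes_per_route=1000,
    )
    for years in (0, 5):
        print(f"year +{years}: "
              f"steady {verify_cores(m, years, worst_case=False):.2f} cores, "
              f"worst {verify_cores(m, years):.2f} cores, "
              f"RIB {rib_ram_bytes(m, years) / 2**30:.2f} GiB")
```

Keeping every input in one documented structure is one way to publish the test-harness assumptions alongside the results, so others can rerun it with their own settings. It is deliberately linear and single-box, so it ignores the multi-dimensional, non-deterministic effects Wes describes, which is exactly where a model like this is weakest.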