On Wed, 13 Dec 2017, Jonathan Morton wrote:

> Ten times average demand estimated at time of deployment, and struggling badly with peak demand a decade later, yes. And this is the transportation industry, where a decade is a *short* time - like less than a year in telecoms.

I've worked in ISPs since 1999 or so. I've been at startups and I've been at established ISPs.

Traffic growth is kind of an S curve: while you're adding customers you can easily see 100-300% growth per year (or more). Then, after the market becomes saturated, growth comes from increased per-customer usage, and for the past 20 years or so this has been in the neighbourhood of 20-30% per year.
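To put a number on what 20-30% per year means in practice, here is a rough sketch (my arithmetic, not anything from the thread) of how quickly traffic doubles at those compound rates:

```python
import math

def doubling_time(annual_growth: float) -> float:
    """Years for traffic to double at a given compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

# At 20-30% per-customer growth, traffic doubles roughly every 3-4 years.
print(round(doubling_time(0.20), 1))  # 3.8
print(round(doubling_time(0.30), 1))  # 2.6
```

So even in the "saturated" phase of the S curve you are planning for a doubling every three to four years.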

Running a network that is congested for parts of the day, it's hard to tell what "Quality of Experience" your customers will have. I've heard horror stories from the '90s, where a then-large US ISP was running an OC3 (155 megabit/s) full most of the day. So someone said "oh, we need to upgrade this", and after a while they did, to 2xOC3. Great, right? No: after that upgrade, both OC3s were completely congested. OK, then upgrade to OC12 (622 megabit/s). After that upgrade the link was evidently uncongested for only a few hours of the day, and of course needed further upgrades.

So at the places I've been, I've advocated for planning rules that say that when the link's peak 5-minute averages exceed 50% of link capacity, an upgrade needs to be ordered. This 50% number can be higher if the link aggregates a larger number of customers, because your "statistical overbooking" typically varies less the more customers participate.
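As a sketch, the planning rule is nothing more than a threshold check over the 5-minute utilization samples (the numbers below are made up for illustration):

```python
def needs_upgrade(five_min_bps: list, capacity_bps: float,
                  threshold: float = 0.50) -> bool:
    """Order an upgrade when the peak 5-minute average exceeds the threshold.

    The threshold can be raised on links aggregating many customers,
    since statistical multiplexing smooths the aggregate there.
    """
    return max(five_min_bps) > threshold * capacity_bps

# A 10 Gbit/s link whose worst 5-minute average is 5.6 Gbit/s (56%):
samples = [3.2e9, 4.1e9, 5.6e9, 4.8e9]
print(needs_upgrade(samples, 10e9))  # True
```

The point is that the trigger is the *peak* 5-minute average, not the daily mean: by the time the mean looks bad, the peaks have been congested for a long while.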

These devices do not do per-flow anything. They might have 10G or 100G links to/from them carrying many millions of flows, and it's all NPU forwarding. Typically they do DiffServ-based queueing and WRED to mitigate excessive buffering. Today they typically don't even do ECN marking (which I have advocated for, but there is not much support from other ISPs in this mission).
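For readers unfamiliar with WRED: it is just the classic RED drop curve, run once per DiffServ drop precedence, which is why it fits an NPU so well. A minimal sketch of that curve (textbook RED, not any particular vendor's implementation):

```python
def wred_drop_prob(avg_qlen: float, min_th: float, max_th: float,
                   max_p: float = 0.1) -> float:
    """Classic RED drop probability as a function of average queue length.

    Below min_th nothing is dropped; between the thresholds the drop
    probability ramps linearly up to max_p; at or above max_th everything
    is dropped. WRED runs one such curve per drop precedence/class.
    """
    if avg_qlen < min_th:
        return 0.0
    if avg_qlen >= max_th:
        return 1.0
    return max_p * (avg_qlen - min_th) / (max_th - min_th)

print(wred_drop_prob(30.0, min_th=20.0, max_th=40.0))  # 0.05 at the midpoint
```

Note there is no per-flow state here at all: one averaged queue length and a handful of thresholds per class, which is exactly what a forwarding ASIC/NPU can do at line rate.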

Now, on the customer access line it's a completely different matter. Typically people build with a BRAS or similar, where (tens of) thousands of customers might sit on a (very expensive) access card with hundreds of thousands of queues per NPU. That still leaves just a few queues per customer, unfortunately, so these do not do per-flow anything either. This is where PIE comes in: devices like these can do PIE in the NPU fairly easily, because it's kind of like WRED.
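The "kind of like WRED" claim can be made concrete: PIE's control law is also a single drop-probability update per queue, just driven by queueing delay instead of queue length. A simplified sketch of one update step, using the default parameter values from RFC 8033 (delays in seconds; this omits the burst allowance and auto-tuning of the real algorithm):

```python
def pie_update(p: float, qdelay: float, qdelay_old: float,
               target: float = 0.015, alpha: float = 0.125,
               beta: float = 1.25) -> float:
    """One simplified PIE control-law step (after RFC 8033).

    p is the current drop probability; qdelay/qdelay_old are the current
    and previous queueing-delay estimates. Like WRED, this is one scalar
    update per queue per interval - no per-flow state - which is why it
    maps onto a BRAS NPU.
    """
    p += alpha * (qdelay - target) + beta * (qdelay - qdelay_old)
    return min(max(p, 0.0), 1.0)

# Delay is 30 ms and rising against a 15 ms target, so p increases:
print(round(pie_update(0.0, 0.030, 0.015), 6))  # 0.020625
```

Contrast this with fq_codel, which needs a queue per flow: that per-flow state is exactly what these access NPUs don't have.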

So back to the capacity issue. Since these devices typically aren't good at assuring per-customer access to the shared medium (backbone links), it's easier to just make sure the backbone links are not regularly full. This doesn't mean you have 10x capacity all the time; it probably means you're bouncing between 25-70% utilization of your links (in the normal case, because you need spare capacity to handle events that temporarily increase traffic, plus handle loss of capacity in case of a link fault). The upgrade might be to add another link or to move to a higher-tier interface speed, bringing utilization down to typically half or a quarter of what you had before.
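The arithmetic behind "half or a quarter", assuming the offered load is unchanged at the moment of the upgrade (illustrative numbers, not from any real network):

```python
def utilization_after_upgrade(load_bps: float,
                              new_capacity_bps: float) -> float:
    """Peak utilization after an upgrade, assuming the load is unchanged."""
    return load_bps / new_capacity_bps

# A 10G link peaking at 70%, upgraded either to 2x10G or to 100G:
load = 0.70 * 10e9
print(utilization_after_upgrade(load, 20e9))   # 0.35 - back inside the band
print(utilization_after_upgrade(load, 100e9))  # 0.07
```

Adding a parallel link halves the utilization; jumping a speed tier (10G to 100G) drops it much further, which is why utilization naturally bounces around inside that 25-70% band between upgrade cycles.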

--
Mikael Abrahamsson    email: swm...@swm.pp.se
_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat