On Wed, 13 Dec 2017, Jonathan Morton wrote:

> Ten times average demand estimated at time of deployment, and struggling badly with peak demand a decade later, yes. And this is the transportation industry, where a decade is a *short* time - like less than a year in telecoms.

I've worked in ISPs since 1999 or so. I've been at startups and I've been at established ISPs.

Traffic growth is kind of an S curve: while you're adding customers you can easily see 100-300% growth per year (or more). Then, after the market becomes saturated, growth comes from increased per-customer usage, and for the past 20 years or so this has been in the neighbourhood of 20-30% per year.
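To put a number on what 20-30% per year means in practice, here is a rough sketch (my arithmetic, not anything from the thread) of how quickly traffic doubles at those compound rates:

```python
import math

def doubling_time(annual_growth: float) -> float:
    """Years for traffic to double at a given compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

# At 20-30% per-customer growth, traffic doubles roughly every 3-4 years.
print(round(doubling_time(0.20), 1))  # 3.8
print(round(doubling_time(0.30), 1))  # 2.6
```

So even in the "saturated" phase of the S curve you are planning for a doubling every three to four years.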

Running a network that is congested for parts of the day, it's hard to tell what "Quality of Experience" your customers will have. I've heard horror stories from the '90s, where a then-large US ISP was running an OC3 (155 megabit/s) full most of the day. So someone said "oh, we need to upgrade this", and after a while they did, to 2xOC3. Great, right? No: after that upgrade, both OC3s were completely congested. OK, then upgrade to OC12 (622 megabit/s). After that upgrade the link was evidently uncongested for only a few hours of the day, and of course needed further upgrades.

So at the places I've been, I've advocated for planning rules that say that when the link's peak 5-minute averages exceed 50% of link capacity, an upgrade needs to be ordered. This 50% number can be higher if the link aggregates a larger number of customers, because your "statistical overbooking" typically varies less the more customers participate.
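As a sketch, the planning rule is nothing more than a threshold check over the 5-minute utilization samples (the numbers below are made up for illustration):

```python
def needs_upgrade(five_min_bps: list, capacity_bps: float,
                  threshold: float = 0.50) -> bool:
    """Order an upgrade when the peak 5-minute average exceeds the threshold.

    The threshold can be raised on links aggregating many customers,
    since statistical multiplexing smooths the aggregate there.
    """
    return max(five_min_bps) > threshold * capacity_bps

# A 10 Gbit/s link whose worst 5-minute average is 5.6 Gbit/s (56%):
samples = [3.2e9, 4.1e9, 5.6e9, 4.8e9]
print(needs_upgrade(samples, 10e9))  # True
```

The point is that the trigger is the *peak* 5-minute average, not the daily mean: by the time the mean looks bad, the peaks have been congested for a long while.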

These devices do not do per-flow anything. They might have 10G or 100G links to/from them carrying many millions of flows, and it's all NPU forwarding. Typically they do DiffServ-based queueing and WRED to mitigate excessive buffering. Today they typically don't even do ECN marking (which I have advocated for, but there is not much support from other ISPs in this mission).
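For readers unfamiliar with WRED: it is just the classic RED drop curve, run once per DiffServ drop precedence, which is why it fits an NPU so well. A minimal sketch of that curve (textbook RED, not any particular vendor's implementation):

```python
def wred_drop_prob(avg_qlen: float, min_th: float, max_th: float,
                   max_p: float = 0.1) -> float:
    """Classic RED drop probability as a function of average queue length.

    Below min_th nothing is dropped; between the thresholds the drop
    probability ramps linearly up to max_p; at or above max_th everything
    is dropped. WRED runs one such curve per drop precedence/class.
    """
    if avg_qlen < min_th:
        return 0.0
    if avg_qlen >= max_th:
        return 1.0
    return max_p * (avg_qlen - min_th) / (max_th - min_th)

print(wred_drop_prob(30.0, min_th=20.0, max_th=40.0))  # 0.05 at the midpoint
```

Note there is no per-flow state here at all: one averaged queue length and a handful of thresholds per class, which is exactly what a forwarding ASIC/NPU can do at line rate.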

Now, on the customer access line it's a completely different matter. Typically people build with a BRAS or similar, where (tens of) thousands of customers might sit on a (very expensive) access card with hundreds of thousands of queues per NPU. That still leaves just a few queues per customer, unfortunately, so these do not do per-flow anything either. This is where PIE comes in: devices like these can do PIE in the NPU fairly easily, because it's kind of like WRED.
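The "kind of like WRED" claim can be made concrete: PIE's control law is also a single drop-probability update per queue, just driven by queueing delay instead of queue length. A simplified sketch of one update step, using the default parameter values from RFC 8033 (delays in seconds; this omits the burst allowance and auto-tuning of the real algorithm):

```python
def pie_update(p: float, qdelay: float, qdelay_old: float,
               target: float = 0.015, alpha: float = 0.125,
               beta: float = 1.25) -> float:
    """One simplified PIE control-law step (after RFC 8033).

    p is the current drop probability; qdelay/qdelay_old are the current
    and previous queueing-delay estimates. Like WRED, this is one scalar
    update per queue per interval - no per-flow state - which is why it
    maps onto a BRAS NPU.
    """
    p += alpha * (qdelay - target) + beta * (qdelay - qdelay_old)
    return min(max(p, 0.0), 1.0)

# Delay is 30 ms and rising against a 15 ms target, so p increases:
print(round(pie_update(0.0, 0.030, 0.015), 6))  # 0.020625
```

Contrast this with fq_codel, which needs a queue per flow: that per-flow state is exactly what these access NPUs don't have.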

So back to the capacity issue. Since these devices typically aren't good at assuring per-customer access to the shared medium (backbone links), it's easier to just make sure the backbone links are not regularly full. This doesn't mean you have 10x capacity all the time; it probably means you're bouncing between 25-70% utilization of your links (in the normal case, because you need spare capacity to handle events that temporarily increase traffic, plus handle loss of capacity in case of a link fault). The upgrade might be to add another link or to move to a higher-tier interface speed, bringing utilization down to typically half or a quarter of what you had before.
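The arithmetic behind "half or a quarter", assuming the offered load is unchanged at the moment of the upgrade (illustrative numbers, not from any real network):

```python
def utilization_after_upgrade(load_bps: float,
                              new_capacity_bps: float) -> float:
    """Peak utilization after an upgrade, assuming the load is unchanged."""
    return load_bps / new_capacity_bps

# A 10G link peaking at 70%, upgraded either to 2x10G or to 100G:
load = 0.70 * 10e9
print(utilization_after_upgrade(load, 20e9))   # 0.35 - back inside the band
print(utilization_after_upgrade(load, 100e9))  # 0.07
```

Adding a parallel link halves the utilization; jumping a speed tier (10G to 100G) drops it much further, which is why utilization naturally bounces around inside that 25-70% band between upgrade cycles.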

--
Mikael Abrahamsson    email: swm...@swm.pp.se
_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat