Gonna say, ironically, one early use of multicast was a proposal to use SRM instead of a mesh of TCP connections for iBGP... so some people do think about scaling control plane traffic in the presence of congestion, sometimes :-)
On Fri, 2 Dec 2022, 17:03 Toerless Eckert, <[email protected]> wrote:
> Dear routing-discussion / TSV folks
> (sorry for escalating this, but it really bugs me - Cc'ing PIM/BIER)
>
> What are these days the expectations for, let's say, a full Internet
> Standard routing protocol in terms of congestion-safe behavior ? And
> what are the congestion control expectations for new routing protocol
> RFCs, even if just proposed standard ?
>
> I am asking, because i think that our core IP multicast routing
> protocol fails miserably on this end, and quite frankly i do not
> understand how PIM-SM (RFC7761) could have become a full Internet
> standard given how it has zilch discussion about congestion or loss
> handling.
>
> [ Especially when, in comparison, for a protocol like RFC7450 TSV did
> raise concerns about multicast data plane congestion awareness, and it
> was held up for years, and GregS as the WG-chair for the WG responsible
> for RFC7450 even had to help co-author RFC8085 to cut through the
> congestion control concern-cord. But likely all for the better! ]
>
> To quickly summarize the issue with PIM-SM to those who do not know it:
>
>              /- R2 -------- R6 -\
> Rcvrs ... R1                      R7 ... Senders
>              \- R3 -- R4 -- R5 -/
>
> CE ...    PE .. P     P     P     PE     CE ...
>
> R1 has, let's say, 100,000 multicast/PIM (S,G) states with sources
> behind R7, so it has to maintain 100,000 so-called PIM (S,G) joins
> across the path R2, R6, R7. Let's say an (S,G) join for IPv6 is roughly
> 38 bytes, so maybe 35 (S,G) per 1500-byte packet, so 2857 packets of
> 1500 bytes to carry all 100,000 (S,G).
>
> Assume link R6/R7 fails, the IGP reconverges, and R1 recognizes that it
> needs to change path, so it sends 2857 PIM-SM packets with prunes to R2
> and 2857 PIM-SM packets with joins to R3.
>
> Assume R1 is a PE, R2 and R3 are P routers in an SP, and R2/R3 actually
> connect to, let's say, 100 routers like R1. Now R2 and R3 each get
> 100 x 2857 1500-byte packets.
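
A quick back-of-the-envelope check of the numbers above, as a minimal Python sketch. All constants are taken from the example; the function and variable names are made up for illustration only. Rounding up gives 2858 packets per PE rather than the 2857 quoted, which does not change the picture.

```python
# Sketch only: constants come from the example in the quoted mail,
# names are hypothetical.
MTU_BYTES = 1500        # packet size assumed in the example
SG_PER_PACKET = 35      # (S,G) entries per packet, as estimated above
SG_STATES = 100_000     # (S,G) states held by R1
PE_ROUTERS = 100        # routers like R1 attached to R2 and R3


def packets_needed(states: int, per_packet: int) -> int:
    """Packets required to carry all states, rounding up (integer ceiling)."""
    return -(-states // per_packet)


packets_per_pe = packets_needed(SG_STATES, SG_PER_PACKET)
burst_packets = PE_ROUTERS * packets_per_pe   # arriving at one P router
burst_bytes = burst_packets * MTU_BYTES

print(f"packets per PE:      {packets_per_pe}")
print(f"packets per P router: {burst_packets}")
print(f"bytes per P router:   {burst_bytes} (~{burst_bytes / 1e6:.1f} MB)")
```

Under these assumptions each of R2 and R3 sees on the order of 286,000 back-to-back packets, roughly 429 MB of PIM control traffic, from a single reconvergence event.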
>
> And there is nothing in the PIM-SM spec that talks about how to
> throttle this heap of PIM-SM packets. Typically, routers would just
> send them back-to-back. And those packets repeat every 60 seconds,
> given how PIM-SM is datagram / periodic soft-state. In fact, if you try
> to scale this in production networks, you will most likely fail a lot
> more than IP multicast in those routers, because PIM will not only
> compete badly for control-plane CPU time, but even more so for
> control-plane to hardware-forwarding time when updating the 100,000
> (S,G) hardware forwarding entries.
>
> Correct me if i am wrong, but did the same type of issues in ISIS/OSPF
> in DC, because of so many parallel paths and hence duplication of LSAs,
> recently lead to the creation of multiple IETF working groups in RTG to
> solve these issues ?
>
> In IP multicast, we were well aware of these issues, and they were a
> core reason to not build a PIM-based MPLS multicast protocol, but to
> use the TCP-based LDP to specify mLDP (RFC6388). Same thing when
> various BGP multicast work was done as an alternative to PIM for SPs
> (BGP also being TCP based).
>
> We did even fix this problem in PIM by specifying RFC6559 (PIM over
> TCP), but instead of making that mechanism mandatory and the only
> option for PIM when moving PIM up the IETF standards ladder to RFC7761,
> that RFC had seemingly fallen into oblivion in the IP Multicast
> community, because most IP multicast deployments are small enough that
> these issues do not occur.
>
> So, why do i escalate this issue now ?
>
> We have a great new multicast architecture called BIER that eliminates
> all these PIM multicast state issues from the P routers of such large
> service provider networks by being stateless. But it still leaves the
> need for overlay signaling, such as with PIM, to operate between the
> PEs, such as, in the above picture, the hundreds if not thousands
> of receiver PE R1' and sender PE R7'.
> In which case you would have PIM directly between those R1'/R7'
> across multihop paths, leading to even more congestion considerations.
> And in support of such BIER networks, there is a draft,
> draft-hb-pim-light, proposed to PIM-WG to optimize PIM explicitly for
> this type of deployment. And when i said in PIM@IETF115 that such a
> draft IMHO should only be allowed to proceed when it is written to say
> it MUST be based on PIM over TCP (RFC6559), all other people responding
> on the thread said at best it could be a MAY. Aka: Congestion control
> optional.
>
> Am i a congestion control extremist ? I really only want to have
> scalable, reliable multicast RFCs, especially when they aspire and
> go to full IETF standard and are meant to support our next-gen IP
> Multicast architectures (BIER). I do fully understand that there is a
> lot of cost pressure on vendor development, and having procrastinated
> on implementing, proliferating and deploying PIM over TCP so far
> (almost a decade!) does make this a less attractive choice short term.
> And the whole purpose of the PIM light draft of course is to reduce the
> amount of development needed by making PIM more "light" (which is a
> good thing). But when it carries forward the problems of PIM to another
> generation of networks (using BIER) that was specifically built to
> scale better, then one should IMHO really become worried. At least i
> do. But i also struggled to implement datagram PIM processing for
> 100,000 states in a prior life and then pushed for PIM over TCP...
>
> Thanks!
> Toerless
>
> _______________________________________________
> routing-discussion mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/routing-discussion
