First, a nit: I don't think it is accurate to classify a control-plane
scaling failure as congestion control failure.

This does happen quite often in the field, in some very public cases
resulting in a PIM network that never converges after a link/node failure.
Non-affected traffic (unicast/BIER) continues to be forwarded without issue.

I agree that for PIM signalling for BIER, PIM-over-TCP (RFC6559) would be a
better fit than RFC7761.

However, you left out some very relevant work in the BIER WG which has been
a part of the solution space since the beginning.

See https://datatracker.ietf.org/doc/draft-ietf-bier-idr-extensions/

BIER, being a forwarding plane architecture, is agnostic to the other
layers, allowing operators to pick their favorite/appropriate poison for
signaling as well as UFIB propagation.

https://datatracker.ietf.org/doc/draft-ietf-bier-pim-signaling/ should
include RFC6559 as well as RFC7761. Please take this up with the authors and
the BIER WG for input.

- Shep

On Fri, Dec 2, 2022 at 9:03 AM Toerless Eckert <[email protected]> wrote:

> Dear routing-discussion / TSV folks
> (sorry for escalating this, but it really bugs me - Cc'ing PIM/BIER)
>
> What are these days the expectations against let's say a full Internet
> Standard for a routing protocol to support in terms of congestion safe
> behavior ? And what are congestion control expectations for new routing
> protocol RFCs even if just proposed standard ?
>
> I am asking, because i think that our core IP multicast routing protocol
> fails miserably on this end, and quite frankly i do not understand how
> PIM-SM (RFC7761) could have become a full Internet standard given how it
> has zilch discussion about congestion or loss handling.
>
> [ Especially, when in comparison a protocol like RFC7450 where TSV did
>   raise concerns about multicast data plane congestion awareness, and it
>   was held up for years, and GregS as the WG-chair for the WG responsible
>   for RFC7450 had to even help co-author RFC8085 to cut through the
>   congestion control concern-cord. But likely all for the better!].
>
> To quickly summarize the issue with PIM-SM to those who do not know it:
>
>                  /- R2 -------- R6 -\
>      Rcvrs ... R1                    R7 ... Senders
>                  \- R3 -- R4 -- R5 -/
>
>         CE ... PE .. P    P     P    PE  CE ...
>
> R1 has let's say 100,000 multicast/PIM (S,G) states with sources behind R7,
> so it has to maintain 100,000 so-called PIM (S,G) joins across the path R2,
> R6, R7. Let's say an (S,G) join for IPv6 is roughly 38 bytes, maybe 35
> (S,G) per 1500 byte packet, so 2857 packets of 1500 bytes to carry all
> 100,000 (S,G).
>
> Assume link R6/R7 fails, IGP reconverges, R1 recognizes that it needs to
> change path, so it sends 2857 PIM-SM packets with prunes to R2 and 2857
> PIM-SM packets with joins to R3.
>
> Assume R1 is a PE, R2 and R3 are P routers in an SP, and actually R2/R3
> connect to let's say 100 routers like R1. Now R2 and R3 get 100 x 2857
> 1500 byte packets.
>
> And there is nothing in the PIM-SM spec that talks about how to throttle
> this heap of PIM-SM packets. Typically, routers would just send them
> back-to-back. And those packets repeat every 60 seconds given how PIM-SM
> is datagram / periodic soft-state.  In fact, if you try to scale this in
> production networks, you will most likely fail a lot more than IP
> multicast in those routers, because PIM not only will badly compete on
> control-plane CPU time, but even more so on control-plane to
> hardware-forwarding time when updating the 100,000 (S,G) hardware
> forwarding entries.
>
> Correct me if i am wrong, but did the same type of issues in ISIS/OSPF in
> DC because of so many parallel paths and hence duplication of LSA recently
> lead to the creation of multiple IETF working groups in RTG to solve these
> issues ?
>
> In IP multicast, we were well aware of these issues and they were a core
> reason to not build a PIM-based MPLS multicast protocol, but use the TCP
> based LDP to specify mLDP (RFC6388). Same thing, when various BGP multicast
> work was done as an alternative to PIM for SPs (BGP also being TCP based).
>
> We did even fix this problem in PIM by specifying RFC6559 (PIM over TCP),
> but instead of making that mechanism mandatory and become the only option
> for PIM when moving PIM up the IETF standards ladder to RFC7761, that
> RFC had seemingly fallen into ignorance in the IP Multicast community,
> because most IP multicast deployments are small enough that these issues
> do not occur.
>
> So, why do i escalate this issue now ?
>
> We have a great new multicast architecture called BIER that eliminates
> all these PIM multicast state issues from the P routers of such large
> service provider networks by being stateless. But it still leaves the
> need for overlay signaling, such as with PIM to operate between the
> PE, such as in above picture the hundreds if not thousands
> of receiver PE R1' and sender PE R7'. In which case you would have
> PIM directly between those R1'/R7' across multihop paths, leading
> to even more congestion considerations. And in support of such BIER
> networks, there is a draft draft-hb-pim-light proposed to PIM-WG to
> optimize PIM explicitly for this type of deployment. And when i said in
> PIM@IETF115, that such a draft IMHO should only be allowed to proceed when
> it is written to say it MUST be based on PIM over TCP (RFC6559), all other
> people responding on the thread said at best it could be a MAY. Aka:
> Congestion control optional.
>
> Am i a congestion control extremist ? I really only want to have
> scalable, reliable multicast RFCs, especially when they aspire and
> go to full IETF standard and are meant to support our next-gen IP Multicast
> architectures (BIER). I do fully understand how there is a lot
> of cost pressure on vendor development, and having procrastinated
> to implement, proliferate and deploy PIM over TCP so far (almost a decade!)
> does make this a less attractive choice short term. And the whole purpose
> of the PIM light draft of course is to reduce the amount of development
> needed by making PIM more "light" (which is a good thing). But when it
> carries forward the problems of PIM to another generation of networks
> (using BIER) that was especially built to scale better, then one
> should IMHO really become worried. At least i do. But i also struggled to
> implement datagram PIM processing for 100,000 states in a prior life
> and then pushed for PIM over TCP...
>
> Thanks!
>     Toerless
>
> _______________________________________________
> pim mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/pim
>
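
For anyone who wants to re-run the back-of-envelope numbers from the quoted
message (100,000 (S,G) states, roughly 35 (S,G) per 1500 byte packet, 100 PE
routers per P router, 60 second refresh), here is a minimal Python sketch.
The constant and function names are made up for illustration, and it assumes
every packet is completely full and ignores all header overhead, so treat the
results as rough orders of magnitude rather than measurements.

```python
import math

# All figures below are the estimates from the quoted message, not measurements.
SG_STATES = 100_000      # (S,G) states held by one PE such as R1
SG_PER_PACKET = 35       # (S,G) entries assumed to fit in one 1500 byte packet
PACKET_BYTES = 1500      # assumed packet size
PE_PER_P = 100           # PE routers assumed to attach to one P router (R2/R3)
REFRESH_SECONDS = 60     # periodic soft-state refresh interval


def packets_needed(states: int, per_packet: int) -> int:
    """Packets required to carry `states` entries, rounding the last one up."""
    return math.ceil(states / per_packet)


# Packets one PE sends to one upstream neighbor. Rounding up gives one more
# packet than the 2857 in the message, which rounds 100000 / 35 down.
per_pe = packets_needed(SG_STATES, SG_PER_PACKET)

# Packets arriving at one P router if all of its PEs send at the same time.
per_p = PE_PER_P * per_pe
burst_bytes = per_p * PACKET_BYTES

# Average load if the same volume is merely refreshed once per interval.
avg_packets_per_second = per_p / REFRESH_SECONDS
avg_mbit_per_second = burst_bytes * 8 / REFRESH_SECONDS / 1e6

if __name__ == "__main__":
    print(f"packets per PE:           {per_pe}")
    print(f"packets per P router:     {per_p}")
    print(f"burst size (MB):          {burst_bytes / 1e6:.1f}")
    print(f"avg refresh load (pps):   {avg_packets_per_second:.0f}")
    print(f"avg refresh load (Mbit/s): {avg_mbit_per_second:.1f}")
```

The averages only describe the steady-state refresh. The reconvergence case
in the message is the burst figure arriving back-to-back, which is the part
datagram PIM has no mechanism to pace.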
