Jeff, *
Sorry, I forgot about this thread because of Christmas, just remembering it now
;-)
I think your observations are all spot on, but the causality that is being
implied is not correct.
I will claim that we will have a very hard time making PIM any faster without
TCP than with it. Even getting close to the performance of TCP would be a lot
of work, replicating work that has been done so many times before in TCP, or
worse yet, coming up with PIM-specific optimizations (not enough of a market to
invest a lot in that).
Sure, we can and should also spend some cycles asking TCP experts which TCP
feature/profile/CC we would recommend for use with PORT, but that's only a
fraction of the work we would need otherwise if we wanted to re-invent the
wheel or figure out which non-TCP alternative reliability sublayer we wanted
to recommend/use for PIM.
I remember having this discussion about TCP in routers something like 10 or
more years ago, and indeed, earlier implementations were very bad and often
sucked up CPU. But even back then, the then-current TCP was a minor consumer of
CPU compared to the actual BGP work of best-path calculation and dealing with
state.
A typical issue with non-ideal TCP implementations is incast: a single node
(BGP/PIM, whatever) simultaneously receives traffic from multiple senders. And
I am sure that not all TCP CC options will perform equally well here. But I am
quite sure that all of them perform better than datagram PIM, where we simply
get packet loss because of queue overrun on the receiving PIM router - and
then either don't forward traffic for 60 seconds or continue to send unwanted
traffic.
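The queue-overrun failure mode can be sketched with a toy back-of-the-envelope
simulation (my own illustration with made-up numbers, not from any measurement):

```python
# Toy simulation: after a reconvergence event, SENDERS downstream routers
# each blast BURST join datagrams at one upstream router whose input queue
# holds QUEUE_LIMIT packets. With no transport-level backpressure,
# everything past the queue limit is silently tail-dropped, and with
# periodic-refresh datagram PIM those joins are not retried until the
# next refresh interval (typically 60 seconds).
SENDERS = 20          # hypothetical downstream routers on the LAN
BURST = 1_000         # joins each sends in one convergence burst
QUEUE_LIMIT = 4_096   # hypothetical receiver input-queue depth

queued = 0
dropped = 0
for _ in range(SENDERS * BURST):
    if queued < QUEUE_LIMIT:
        queued += 1   # packet accepted into the input queue
    else:
        dropped += 1  # tail drop: lost until the next periodic refresh

print(f"{dropped} of {SENDERS * BURST} joins lost to queue overrun")
# prints "15904 of 20000 joins lost to queue overrun"
```

With a TCP-based transport, those same excess joins would instead sit in sender
buffers under flow/congestion control and be delivered as the receiver drains
its socket, rather than waiting out a refresh timer.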
The other part, of course, is that it's easy to misimplement how the
application (PIM/BGP) interacts with TCP: if the application is not written so
that it will accept traffic from the TCP socket arbitrarily fast, then you
easily run into non-ideal flow control at the TCP level because you exhaust
the TCP socket buffer. So at a minimum you need to make the TCP socket buffer
sufficiently larger than the maximum amount of data you may get from your
neighbors, so that TCP flow control can operate at its fastest. Of course,
when you have a big LAN with 20 downstream routers sending you, after a
reconvergence, PIM joins for the same 100,000 groups, that can be a good
amount of buffer memory, but in today's big routers that is IMHO not an issue
anymore. And that will also ensure that even if the TCP implementations used
are not ideal for incast, they will at least do retransmissions as fast as
possible, because they never need to wait for the app (PIM). Good
incast-friendly TCP implementations would of course share buffer across
sockets and not require memory based on how many TCP connections you have, but
only based on what your aggregate incast bandwidth is.
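A minimal sketch of that buffer-sizing rule (the per-join byte count and the
scale numbers are my assumptions for illustration, not from this thread; the
kernel may clamp the request to its configured maximum, e.g. net.core.rmem_max
on Linux):

```python
import socket

# Size the TCP receive buffer to at least the worst-case inbound burst,
# so TCP flow control never has to stall the senders while waiting for
# the application to drain the socket.
DOWNSTREAM_ROUTERS = 20       # hypothetical neighbors on the LAN
JOINS_PER_ROUTER = 100_000    # groups re-joined after reconvergence
BYTES_PER_JOIN = 34           # assumed wire size of one encoded join entry

worst_case = DOWNSTREAM_ROUTERS * JOINS_PER_ROUTER * BYTES_PER_JOIN  # 68 MB

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Request a receive buffer covering the worst-case burst; read back what
# the kernel actually granted (it may clamp or, on Linux, double it).
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, worst_case)
granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
s.close()

print(f"requested {worst_case} bytes, kernel granted {granted}")
```

At ~68 MB per incast-heavy listening point this is real memory, but as argued
above, not a problem for today's big routers.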
There are of course optimizations that can be done within PIM itself to save
even more memory, but I really don't want to start explaining those details
unless I am really persuaded it's necessary. Which I am totally not right now.
So, in summary: I'd rather go for PORT, which gives me reliability instead of
issues caused by random burst join loss, even if vendors may then take one or
two releases to optimize convergence speed under high load and/or with lots of
downstream routers! It's just going to be so much less work for years to come
than tinkering on a PIM-datagram-specific solution.
Cheers
Toerless
P.S.: The one past datapoint of interest you did not mention:
When we designed mLDP as part of MPLS for multicast in the 2000s, we initially
looked at Dino's 1998 implementation of MPLS for PIM, which actually was
released to customers, but only in software routers of course, so very few
people had actually looked at it. We decided against it and for LDP, to a
large extent because of TCP.
[ Of course now the use of LDP in mLDP hurts, given how customers want to get
rid of LDP and think that mLDP must also go because it's LDP. If you jump on
a protocol as a buzzword train, you thrive and perish with it, I guess, but
back then, if it had been just PIM over TCP with MPLS labels, it would
have been less well accepted *sigh* ;-)]
On Sat, Dec 17, 2022 at 02:32:36PM +0000, Jeffrey (Zhaohui) Zhang wrote:
> Hi Toerless,
>
> Some late comments - first specifically on the PIM topic and then extend to
> the general point of congestion aware routing protocols.
>
> The TCP-based PIM protocol RFC6559 was designed to handle the
> congestion-on-scale problem. However, most PIM deployments have not come to
> the point where scaling becomes an acute problem where the RFC6559 solution
> must be used, so its deployment has been limited.
>
> The congestion-on-scale point was also taken when BGP-MVPN (RFC 6514) was
> developed. The Rosen/PIM-MVPN was very popular and there was a big debate
> when BGP-MVPN was proposed. Good that it eventually got standardized and
> became mainstream (at least for new deployments).
>
> Someone already brought up a point of BGP updates being potentially slow.
> I've also heard about that many times (sometimes from known BGP experts),
> including when I work on BGP based multicast (beyond RFC 6559).
>
> However, there are also protocols that rely on fast convergence even though
> they use BGP. EVPN is one example.
>
> Then mobile network's control plane relies on UDP-based GTP-C. I wonder why
> they're not concerned with congestion in scaled situations.
>
> For some 5G use cases I was proposing to use BGP to propagate routing
> information in place of some mobile user session information, and I often get
> asked "can you do that very fast"?
>
> So, I am struggling with these two things:
>
> - TCP-based solutions reduce protocol messages, but BGP may be deemed slow
> (or should I say with uncontrolled delay), though BGP-based EVPN actually
> relies on fast exchange of (at least some) BGP routes (e.g., for DF election).
> - Other solutions may lead to lots of protocol messages including refreshes,
> but the mobile operators seem to have been fine with UDP-based control plane.
>
> As for the "a totally non-congestion aware sending of protocol packets should
> not be permitted anymore for new RFC IMHO and i am just baffled how this is
> permitted anymore by the IETF. Where is adult supervision by TSV when we need
> it" comment below, I have the following view:
>
> - I am not sure if this involves TSV. A protocol sending lots of protocol
> packets is no different from an application sending lots of application
> traffic as far as transport is concerned. It is ultimately an issue with
> protocol design itself.
> - There are situations where a non-TCP based solution is needed even when a
> parallel TCP-based option is also present, so we can not simply disallow the
> former. We can discuss examples separately (one example is actually PIM as
> BIER overlay vs mLDP/BGP as BIER overlay).
>
> Thanks.
>
> Jeffrey
>
>
>
> -----Original Message-----
> From: pim <[email protected]> On Behalf Of Toerless Eckert
> Sent: Friday, December 9, 2022 8:47 AM
> To: Jon Crowcroft <[email protected]>
> Cc: BIER WG <[email protected]>; [email protected]; Matt Mathis
> <[email protected]>; [email protected]; Stewart Bryant
> <[email protected]>; pim <[email protected]>
> Subject: Re: [pim] Q on the congestion awareness of routing protocols
>
>
>
> On Tue, Dec 06, 2022 at 07:15:31AM +0000, Jon Crowcroft wrote:
> > path exploration? but consider the shadow pricing...
> >
> > the tradeoff between convergence rate and congestion control seems to
> > be something that ought to be put on a more systematic grounding
>
> You folks are all thinking way beyond the point i was making and looking for
> support:
>
> In PIM, we have potentially gigantic burst of datagrams without any
> specification of pacing sent to routers across a network core (with easily
> likelyhood of path congestion). Such a totally non-congestion aware sending
> of protocol packets should not be permitted anymore for new RFC IMHO and i am
> just baffled how this is permitted anymore by the IETF. Where is adult
> supervision by TSV when we need it ;-)
>
> Yes, the incast issue is an interesting aspect, but I have not seen good
> simulations of whether / to what extent it would happen in the PIM/BGP
> cases. But I would bet any sum that a TCP solution, as bad as it may be,
> will outperform the no-congestion-control periodic-burst solution of
> (datagram) PIM.
>
> Cheers
> Toerless
>
>
--
---
[email protected]