Hi,

As I mentioned in my first mail, I am a big supporter of end-to-end path measurement; I have mainly been focusing on the hop-by-hop timestamping approach.
So my comments were not meant to discourage end-to-end path probing in any way; it is extremely useful. They were just to get your point of view on alternative options for the specific goal here: congestion detection. Will be watching how your proposal evolves with interest!

Cheers,
R.

On Thu, Feb 27, 2020 at 9:17 AM <ruediger.g...@telekom.de> wrote:
> Hi Robert,
>
> Regarding scalability, I hope the difference between our positions is just
> whether it's a router or a dedicated CPE. I don't promote deploying 20k PCs
> (I hope to promote a metric to replace them). I prefer the dedicated CPE,
> but routers do as well.
>
> The telemetry threshold might be an option too, if congestion is to be
> detected. In September last year, we had to replace a line card at a
> production router whose ingress port hardware was corrupted. There were no
> drops, just random delay variations, which showed up in our performance
> measurement system. I wonder whether problems like that can be detected by
> evaluating telemetry.
>
> Further, telemetry needs to work reliably when the router hardware is busy
> dealing with heavy loads (i.e., telemetry must be sufficiently privileged).
> External measurements at the forwarding layer don't rely on router-internal
> processing resources.
>
> Regards,
>
> Ruediger
>
> *From:* Robert Raszuk <rob...@raszuk.net>
> *Sent:* Wednesday, 26 February 2020 14:19
> *To:* Geib, Rüdiger <ruediger.g...@telekom.de>
> *Cc:* ippm-cha...@ietf.org; SPRING WG <spring@ietf.org>; i...@ietf.org
> *Subject:* Re: [spring] Monitoring metric to detect and locate congestion
>
> Hi,
>
> Two clarifications:
>
> [RG1] the measurements pass the routers on the forwarding plane.
>
> Well, if I have 20K CEs and would like to measure this end to end, that
> means it had better run on a router ... I cannot envision installing 20K
> new PCs just for this. At a minimum, such an endpoint should run on a
> well-designed CPE as an LXC.
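As an editorial aside, the line-card anecdote above (no drops, only random delay variations visible in the measurement system) suggests a simple check an external measurement system can run: flag a window whose delay spread is well above baseline, even when loss and mean delay look fine. A minimal sketch in Python; the function name, thresholds, and sample values are invented for illustration:

```python
import statistics

def jitter_alarm(delays_ms, baseline_stdev_ms, factor=3.0):
    """Flag a measurement window whose delay spread is well above baseline.

    No packet needs to be lost for this to trigger; hardware that only
    adds random latency still shows up in the spread.
    """
    return statistics.stdev(delays_ms) > factor * baseline_stdev_ms

# Invented example data: similar mean delay, very different variation.
healthy = [10.1, 10.0, 10.2, 9.9, 10.1, 10.0]
degraded = [10.1, 14.8, 9.7, 17.2, 10.3, 13.9]  # no loss, wild variation
```

With an assumed baseline standard deviation of 0.1 ms, only the degraded window raises the alarm.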
> [RG1] Up to now, the conditions in Deutsche Telekom's backbone network
> require a careful design of the router probing interval and the counters
> to be read.
>
> Sure. I actually had in mind a telemetry push model where your monitoring
> station only gets notified when the applied queue threshold is crossed.
> Very little management traffic in the network, limited to only the
> information that is important.
>
> I know some vendors resist applying filtering to telemetry streaming
> locally (at the LC or the local RE), but this is IMHO just a sick approach.
>
> Thx,
> R.
>
> On Wed, Feb 26, 2020 at 12:01 PM <ruediger.g...@telekom.de> wrote:
>
> Hi Robert,
>
> Thanks, my replies are inline, marked [RG1].
>
> I have read your draft and presentation with interest, as I am a big
> supporter of, and running some lab trials of, end-to-end network path
> probing.
>
> Few comments, observations, questions:
>
> You are essentially measuring and comparing delay across N paths
> traversing a known network topology (I love the "network tomography"
> name!)
>
> [RG1] It's telemetry with a constrained set-up, but the term doesn't
> appear in the draft yet ... that can be changed.
>
> ------
>
> * First question: this will likely run on the RE/RP, and on some
> platforms the path between the LC and the RE/RP is completely
> non-deterministic and can take 10s or 100s of ms locally in the router.
> So right there, the proposal to compare anything may not really work,
> unless the spec mandates that the timestamping is actually done in
> hardware on the receiving LC. The CPU can then process it when it has
> cycles.
>
> [RG1] The measurements pass the routers on the forwarding plane.
> High-end routers add variable processing latencies, on the level of
> double- or lower-triple-digit [us] on Deutsche Telekom backbone routers.
> If a dedicated sender/receiver system is used, timestamping may be
> optimized for the purpose.
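The push model discussed above boils down to local filtering: the line card sees every queue-depth sample, but only threshold crossings leave the box. A hedged sketch of that filtering logic (not any vendor's API; function and event names are made up):

```python
def filter_crossings(samples, threshold):
    """Yield (index, depth, event) only when the queue depth crosses the
    threshold, in either direction; all other samples are suppressed."""
    above = False
    for i, depth in enumerate(samples):
        if depth >= threshold and not above:
            above = True
            yield (i, depth, "exceeded")
        elif depth < threshold and above:
            above = False
            yield (i, depth, "cleared")

# Invented queue-depth samples: seven samples, but only two events are
# pushed to the monitoring station.
samples = [10, 12, 95, 97, 96, 30, 15]
events = list(filter_crossings(samples, threshold=80))
```

This is the "very little management traffic" property: the collector hears about state changes, not every sample.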
> ------
>
> * Second question: congestion usually has a very transient character ...
> You would need to be super lucky to catch any congestion in a normal
> network using test probes of any sort. And if you have interfaces that
> are always congested, then the queue-depth time delta may not be visible
> in end-to-end measurements.
>
> [RG1] The probing frequency depends on the characteristics of the
> congestion the operator wants to be aware of. Unplanned events may cause
> changes in measured delays lasting for minutes or longer (congestion or
> hardware issues). The duration of a reliably detectable "event"
> corresponds to the measurement packet spacing (I don't intend to replace
> hello exchanges or BFD with this metric).
>
> ------
>
> * Third: why not simply look at the queue counters at each node? Queue
> depth, queue history, and min/avg/max on a per-interface basis offer
> tons of readily available information. Why would anyone need to inject
> loops of probe packets into a known network to detect this? And in
> black-box, unknown networks this is not going to work, as you would not
> know the network topology in the first place. Likewise, link down/up is
> already reflected in your syslog via BFD and IGP alarms. I really do not
> think you need an end-to-end protocol to tell you that.
>
> [RG1] Up to now, the conditions in Deutsche Telekom's backbone network
> require a careful design of the router probing interval and the counters
> to be read. The proposed metric captures persistent issues impacting
> forwarding, and it points out where these likely occur. An operator may
> then have a closer look at an interface/router to analyse what's going
> on, using the full arsenal of accessible information and tools. As
> unusual events happen rarely, it may still be a fair question for which
> purposes linecard and central processing cycles of routers are consumed.
>
> -------
>
> + Thanks for catching the nit below.
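The "comparing delay across N paths traversing known topology" point is what makes localization possible: when several paths show elevated delay at once, the links they share are the natural suspects. A toy sketch of that intersection logic, with topology, link names, and delay figures invented for illustration:

```python
def suspect_links(paths, baseline_ms, measured_ms, rel_increase=0.5):
    """Return the links shared by every path whose measured delay exceeds
    (1 + rel_increase) times its baseline delay."""
    elevated = [p for p in paths
                if measured_ms[p] > baseline_ms[p] * (1 + rel_increase)]
    if not elevated:
        return set()
    common = set(paths[elevated[0]])
    for p in elevated[1:]:
        common &= set(paths[p])
    return common

# Invented topology: three measured paths over five links.
paths = {
    "A": ["L1", "L2", "L3"],
    "B": ["L1", "L4"],
    "C": ["L5", "L2"],
}
baseline_ms = {"A": 10.0, "B": 8.0, "C": 9.0}
measured_ms = {"A": 25.0, "B": 8.2, "C": 21.0}  # A and C are elevated
```

Here paths A and C are both elevated and share only L2, so L2 is the prime suspect; path B also traverses L1 but looks fine, which exonerates L1.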
> Regards, Ruediger
>
> s/nodes L100 and L200 one one/nodes L100 and L200 on one/
>
> :)
>
> Many thx,
> R.
>
> On Wed, Feb 26, 2020 at 8:55 AM <ruediger.g...@telekom.de> wrote:
>
> Dear IPPM (and SPRING) participants,
>
> I'm soliciting interest in a new network monitoring metric which allows
> congested interfaces to be detected and located. Important properties are:
>
> - Same scalability as ICMP ping, in the sense that one measurement
>   relation is required per monitored connection
> - Adds detection and location of congested interfaces as compared to
>   ICMP ping (otherwise, the measured metrics are compatible with ICMP
>   ping)
> - Requires Segment Routing (which means measurement at the forwarding
>   layer and no other interaction with the routers passed, in contrast
>   to ICMP ping)
> - Active measurement (may be deployed using a single sender & receiver
>   or a separate sender and receiver; Segment Routing allows for both
>   options)
>
> I'd be happy to present the draft in Vancouver, if there's community
> interest. Please read and comment.
>
> You'll find the slides at
>
> https://datatracker.ietf.org/meeting/105/materials/slides-105-ippm-14-draft-geib-ippm-connectivity-monitoring-00
>
> Draft URL:
>
> https://datatracker.ietf.org/doc/draft-geib-ippm-connectivity-monitoring/
>
> Regards,
>
> Ruediger
>
> _______________________________________________
> spring mailing list
> spring@ietf.org
> https://www.ietf.org/mailman/listinfo/spring
_______________________________________________
spring mailing list
spring@ietf.org
https://www.ietf.org/mailman/listinfo/spring