Re: [j-nsp] bfd = busted failure detection :)

2009-12-16 Thread Richard A Steenbergen
On Tue, Dec 15, 2009 at 11:03:08PM -0600, Kevin Day wrote: > > I went back and forth on this forever (pestering you while doing it), > because it was affecting us badly on old M20s. My "lab" boxes would > never ever show the problem, but it would happen on the production > routers. I fina

Re: [j-nsp] bfd = busted failure detection :)

2009-12-15 Thread Kevin Day
On Dec 9, 2009, at 5:21 PM, Richard A Steenbergen wrote: The behavior we've always seen (from mid 7.x's until today) is that something seems to "block" the KRT queue while the pending changes keep piling up, then eventually whatever is causing the blockage clears and all the routes quickly i

Re: [j-nsp] bfd = busted failure detection :)

2009-12-15 Thread Judah Scott
In lab tests, RSVP success during ISSU was hit-or-miss. There were several cases where it did work, but it seemed not to be entirely based on configuration (some degree of random failure). AFAIK it still hasn't been officially listed as supported in release notes or upgrade guides. Am I missing so

Re: [j-nsp] bfd = busted failure detection :)

2009-12-15 Thread Richard A Steenbergen
On Tue, Dec 15, 2009 at 04:53:44PM +0800, Mark Tinka wrote: > We've always been in favour of the NSR concept since > inception, but the reason we didn't choose it at the time > was because of limited protocol support (early days of JUNOS > 9.x). Also, only a handful of boxes on the Cisco side >

Re: [j-nsp] bfd = busted failure detection :)

2009-12-15 Thread Mark Tinka
On Tuesday 15 December 2009 04:16:23 pm Richard A Steenbergen wrote: > As for GR vs NSR, we're actually in the process of > turning GR off in favor of NSR. So far, in very limited > tests mind you, ISSU has actually worked for us without > anything exploding or catching on fire (surprising I >

Re: [j-nsp] bfd = busted failure detection :)

2009-12-15 Thread Richard A Steenbergen
On Tue, Dec 15, 2009 at 02:59:04PM +0800, Mark Tinka wrote: > On Monday 14 December 2009 05:23:45 pm Richard A Steenbergen > wrote: > > > Oh what good timing, just had to reboot a router tonight > > to recover from a different Juniper bug (enabling > > graceful-switchover on a 9.5R3 box caused

Re: [j-nsp] bfd = busted failure detection :)

2009-12-14 Thread Mark Tinka
On Monday 14 December 2009 05:23:45 pm Richard A Steenbergen wrote: > Oh what good timing, just had to reboot a router tonight > to recover from a different Juniper bug (enabling > graceful-switchover on a 9.5R3 box caused blackholing of > traffic, disabling it didn't fix it, had to reboot the

Re: [j-nsp] bfd = busted failure detection :)

2009-12-14 Thread Hoogen
Thanks for all the great info Richard... -Hoogen On Mon, Dec 14, 2009 at 1:23 AM, Richard A Steenbergen wrote: > On Sun, Dec 13, 2009 at 03:11:29AM -0600, Richard A Steenbergen wrote: > > That one is pretty different from the usual slowness issue that seems to > > be affecting most people. I jus

Re: [j-nsp] bfd = busted failure detection :)

2009-12-14 Thread Richard A Steenbergen
On Sun, Dec 13, 2009 at 03:11:29AM -0600, Richard A Steenbergen wrote: > That one is pretty different from the usual slowness issue that seems to > be affecting most people. I just cleared bgp sessions on a router to > demonstrate the issue, which you can see portions of any time you make a > major rou

Re: [j-nsp] bfd = busted failure detection :)

2009-12-13 Thread Richard A Steenbergen
On Fri, Dec 11, 2009 at 02:50:51PM -0500, Ross Vandegrift wrote: > On Wed, Dec 09, 2009 at 05:21:21PM -0600, Richard A Steenbergen wrote: > > I've personally never had any luck reproducing it in the lab, so I > > understand Juniper's frustration. It seems to require a complexity of > > routes, port

Re: [j-nsp] bfd = busted failure detection :)

2009-12-11 Thread Ross Vandegrift
On Wed, Dec 09, 2009 at 05:21:21PM -0600, Richard A Steenbergen wrote: > I've personally never had any luck reproducing it in the lab, so I > understand Juniper's frustration. It seems to require a complexity of > routes, ports, and/or protocols which we simply don't have the time or > money to rep

Re: [j-nsp] bfd = busted failure detection :)

2009-12-09 Thread Richard A Steenbergen
On Wed, Dec 09, 2009 at 09:13:28AM -0700, David Ball wrote: > Do your KRT queues eventually flush though? Is it just a slow > control->fwding thing when large route updates occur? I've done 2 > upgrades in as many years to resolve a KRT related bug, but that > resulted in the queue NEVER emptying

Re: [j-nsp] bfd = busted failure detection :)

2009-12-09 Thread David Ball
Do your KRT queues eventually flush though? Is it just a slow control->fwding thing when large route updates occur? I've done 2 upgrades in as many years to resolve a KRT related bug, but that resulted in the queue NEVER emptying. It's apparently related to a residual variable being set after

Re: [j-nsp] bfd = busted failure detection :)

2009-12-09 Thread Mark Tinka
On Wednesday 09 December 2009 05:40:00 pm Richard A Steenbergen wrote: > Nothing current, it ended up spread out over a bunch of > cases which all got closed with no real resolution to > the root problem. Everyone I've talked to at Juniper > knows that the problem exists, they just don't know

Re: [j-nsp] bfd = busted failure detection :)

2009-12-09 Thread Richard A Steenbergen
On Wed, Dec 09, 2009 at 11:04:59AM +0200, Pekka Savola wrote: > On Wed, 9 Dec 2009, Richard A Steenbergen wrote: > >Oh and btw I take back all the nice things I said about progress being > >made to resolve the slow fib install / krt queue blocking bug. It is > >still alive and well in 9.5R3, and bl

Re: [j-nsp] bfd = busted failure detection :)

2009-12-09 Thread Pekka Savola
On Wed, 9 Dec 2009, Richard A Steenbergen wrote: Oh and btw I take back all the nice things I said about progress being made to resolve the slow fib install / krt queue blocking bug. It is still alive and well in 9.5R3, and blackholing traffic more than ever (just recorded a good 10 minutes of it

Re: [j-nsp] bfd = busted failure detection :)

2009-12-09 Thread Richard A Steenbergen
On Wed, Dec 09, 2009 at 12:27:55AM -0600, Richard A Steenbergen wrote: > Maybe, but I'm not sufficiently motivated to try and explain the issue > to JTAC to find out. :) FWIW we've now upgraded 3 of the MX960s that had > the issue from 9.2R4->9.5R3 and it resolved things completely. 9.2R4 is > a gi

Re: [j-nsp] bfd = busted failure detection :)

2009-12-08 Thread Richard A Steenbergen
On Tue, Dec 08, 2009 at 07:54:49PM -0500, Ross Vandegrift wrote: > On Fri, Dec 04, 2009 at 02:40:14PM -0600, Richard A Steenbergen wrote: > > FYI I found the root problem and hereby take back any comments impugning > > BFD's reputation. It turns out there actually WAS some kind of pfe bug > > which

Re: [j-nsp] bfd = busted failure detection :)

2009-12-08 Thread Ross Vandegrift
On Fri, Dec 04, 2009 at 02:40:14PM -0600, Richard A Steenbergen wrote: > FYI I found the root problem and hereby take back any comments impugning > BFD's reputation. It turns out there actually WAS some kind of pfe bug > which was causing intermittent blackholing of traffic for a few seconds > at a

Re: [j-nsp] bfd = busted failure detection :)

2009-12-04 Thread Richard A Steenbergen
On Sat, Nov 21, 2009 at 05:16:57PM -0600, Richard A Steenbergen wrote: > On Sat, Nov 21, 2009 at 12:53:58PM -0800, Nilesh Khambal wrote: > > Hi Richard, > > > > Just talking from this router perspective, it looks like the remote > > end router has problem receiving BFD packets from this router. It

Re: [j-nsp] bfd = busted failure detection :)

2009-11-21 Thread Richard A Steenbergen
On Sat, Nov 21, 2009 at 12:53:58PM -0800, Nilesh Khambal wrote: > Hi Richard, > > Just talking from this router perspective, it looks like the remote > end router has problem receiving BFD packets from this router. It > signaled the BFD session down because of that. There are actually two particu

Re: [j-nsp] bfd = busted failure detection :)

2009-11-21 Thread Nilesh Khambal
[Hit send accidentally before completing the email] You can narrow down the pfe stats per FPC using the "fpc" knob in the "show pfe statistics traffic" output. At the remote end, you can look for any input errors (framing, CRC, etc.) at the interface level. Then look for any drops at the route looku

Re: [j-nsp] bfd = busted failure detection :)

2009-11-21 Thread Nilesh Khambal
Hi Richard, Just talking from this router perspective, it looks like the remote end router has problem receiving BFD packets from this router. It signaled the BFD session down because of that. You can start by looking at egress stats on the local router. See if there are any ttp queue drop

[j-nsp] bfd = busted failure detection :)

2009-11-21 Thread Richard A Steenbergen
Is there a way to see stats on bfd sessions, such as the number of probes lost? I'm trying to figure out why I can't reliably keep bfd running between two Junipers without getting a ton of false positives, even with very high detection thresholds. Nothing useful in "show bfd session extensive" (be