On Thu, Jul 8, 2021 at 4:03 PM William Herrin wrote:
>
> On Thu, Jul 8, 2021 at 5:31 AM Saku Ytti wrote:
> > Network experiences gray failures all the time, and I almost never
> > care, unless a customer does.
>
> I would suggest that your customer does care, but as there is no
> simple test to
On Thu, 8 Jul 2021 at 22:10, Baldur Norddahl wrote:
> We had a line card that would drop any IPv6 packet with bit #65 in the
> destination address set to 1. Turns out that only a few hosts have this bit
> set to 1 in the address, so nobody noticed until some Debian mirrors started
> to become
On Fri, 9 Jul 2021 at 00:01, William Herrin wrote:
> I would suggest that your customer does care, but as there is no
Most don't. Somewhat recently we were dropping a non-trivial number of
packets from a well-known book store due to DMAC failure. This was
unexpected, considering it was an L3 to
On Thu, Jul 8, 2021 at 5:31 AM Saku Ytti wrote:
> Network experiences gray failures all the time, and I almost never
> care, unless a customer does.
Greetings,
I would suggest that your customer does care, but as there is no
simple test to demonstrate gray failures, your customer rarely makes
We had a line card that would drop any IPv6 packet with bit #65 in the
destination address set to 1. Turns out that only a few hosts have this bit
set to 1 in the address, so nobody noticed until some Debian mirrors
started to become unreachable. Also, web browsers are very good at switching
to IPv4
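For illustration (not from the thread): a minimal Python sketch that checks
whether a given address falls into the affected class, assuming bit #65 is
counted from the most significant bit of the 128-bit address:

import ipaddress

def bit65_set(addr: str) -> bool:
    # Bit #1 is the most significant bit of the 128-bit address, so
    # bit #65 is the first bit of the interface identifier.
    return bool(int(ipaddress.IPv6Address(addr)) & (1 << (128 - 65)))

print(bit65_set("2001:db8::8000:0:0:1"))  # True: fifth group is 0x8000
print(bit65_set("2001:db8::1"))           # False: fifth group is 0x0000

EUI-64-derived interface identifiers begin with the top bits of the MAC's
OUI, which are usually zero, which would explain why so few hosts were
affected.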
On Thu, Jul 8, 2021 at 8:32 AM Saku Ytti wrote:
>
> On Thu, 8 Jul 2021 at 15:00, Vanbever Laurent wrote:
>
> > Detecting whole-link and node failures is relatively easy nowadays (e.g.,
> > using BFD). But what about detecting gray failures that only affect a
> > *subset* of the traffic, e.g. a
On Thu, 8 Jul 2021 at 19:25, Lukas Tribus wrote:
> More generally speaking, single link overloads causing PL or even full
> blackholing affecting single links (and therefore in a load-balanced
> environment: specific tuples) is something that is very frustrating to
> troubleshoot and it
Hello,
there is a large eyeball ASN in Southern Europe, single-homed to a Tier 1
running under the same corporate umbrella, which for about a decade
suffered from periodic blackholing of specific src/dst tuples. The issue
occurred every 6-18 months, completely breaking specific production
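One way to chase tuple-specific blackholing like that is to sweep source
ports against a fixed destination, so each probe hashes onto a (potentially)
different ECMP or LAG member. A minimal sketch; the destination address,
port, and source-port range are placeholders:

import socket

DST = ("192.0.2.10", 443)  # placeholder: any host that answers TCP here
failed = []
for sport in range(33000, 33064):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(2)
    try:
        s.bind(("", sport))   # pinning the source port pins the 5-tuple
        s.connect(DST)
    except socket.timeout:
        failed.append(sport)  # silence, not refusal: possible blackhole
    except OSError:
        pass                  # refused/unreachable: the path itself answered
    finally:
        s.close()

print(f"{len(failed)}/64 tuples timed out: {failed}")

Source ports that consistently time out while their neighbours connect are
a strong hint that one hashed path is broken.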
On Thu, 8 Jul 2021 at 17:59, Vanbever Laurent wrote:
> Thanks for sharing! I guess this process working means the counters are
> "standard" / close enough across vendors to allow for comparisons?
Not at all, I'm afraid; these counters are not intended for user
consumption, so they are generally not available via SNMP
> One method is collecting lookup exceptions. We scrape these:
>
> npu_triton_trapstats.py:command = "start shell sh command \"for fpc in $(cli -c 'show chassis fpc' | grep Online | awk '{print $1;}'); do echo FPC$fpc; vty -c 'show cda trapstats' fpc$fpc; done\""
> ptx1k_trapstats.py:
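A rough sketch of how such a scrape could be wired up, assuming plain SSH
access and a simple "name value" layout in the trapstats output (the actual
output format is not shown here):

import re
import subprocess
import time

CMD = ("start shell sh command \"for fpc in $(cli -c 'show chassis fpc' "
       "| grep Online | awk '{print $1;}'); do echo FPC$fpc; "
       "vty -c 'show cda trapstats' fpc$fpc; done\"")

def poll(router: str) -> dict:
    out = subprocess.run(["ssh", router, CMD], capture_output=True,
                         text=True, check=True).stdout
    counters, fpc = {}, "?"
    for line in out.splitlines():
        if line.startswith("FPC"):
            fpc = line.strip()          # remember which FPC we are in
            continue
        m = re.match(r"\s*(\S+)\s+(\d+)\s*$", line)  # assumed layout
        if m:
            counters[f"{fpc}/{m.group(1)}"] = int(m.group(2))
    return counters

before = poll("router1")                # "router1" is a placeholder
time.sleep(60)
after = poll("router1")
for name, value in after.items():
    delta = value - before.get(name, 0)
    if delta > 0:
        print(f"{name} grew by {delta}")  # lookup exceptions accumulating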
Hi Jörg,
Thanks for sharing your gray failure! With a few years of lifespan, it might
well be the oldest gray failure ever monitored continuously :-) I'm pretty sure
you guys exhausted all options already but... did you check for micro-bursts
that may cause sudden buffer overflow? Or perhaps
>
> If there is a network which does not
> experience these, then it's likely due to lack of visibility rather
> than issues not existing.
>
This. Full stop.
I believe there are very few, if any, production networks in existence
which have a 0% rate of drops or 'weird shit'.
Monitoring for
We have a similar gray issue, where switches in a virtual chassis
setup with a layer-3 configuration seem to randomly lose transit ICMP
messages like echo or echo-reply. We once estimated it at around
0.00012% (not accounting for variance or measurement error).
We noticed this when we replaced
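At rates that low the measurement error dominates, so error bars help. A
quick sketch that puts a 95% Wilson score interval around a measured loss
rate; the probe counts are made up for illustration:

import math

def wilson(losses: int, probes: int, z: float = 1.96) -> tuple:
    # Wilson score interval for a binomial proportion.
    p = losses / probes
    denom = 1 + z * z / probes
    center = (p + z * z / (2 * probes)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / probes
                                   + z * z / (4 * probes * probes))
    return center - half, center + half

lo, hi = wilson(12, 10_000_000)  # e.g. 12 lost probes out of 10 million
print(f"loss rate between {lo:.2e} and {hi:.2e} at 95% confidence")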
On Thu, 8 Jul 2021 at 16:13, Vanbever Laurent wrote:
> Thanks for chiming in. That's also my feeling: a *lot* of gray failures
> routinely happen, a small percentage of which end up being really damaging
> (the ones hitting customer traffic, as you pointed out). For this small
> percentage
On 7/8/21 15:22, Vanbever Laurent wrote:
Did you folks manage to understand what was causing the gray issue in the first
place?
Nope, still chasing it. We suspect a FIB issue on a transit device, but
currently building a test to confirm.
Mark.
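A hypothetical shape for such a test: sweep addresses across the suspect
prefix and see whether unreachability correlates with specific FIB entries.
The prefix is a placeholder, and this leans on Linux ping's exit codes:

import ipaddress
import subprocess

dead = []
for host in ipaddress.ip_network("192.0.2.0/28").hosts():
    r = subprocess.run(["ping", "-c", "3", "-W", "1", str(host)],
                       capture_output=True)
    if r.returncode != 0:  # nonzero: no replies at all (or an error)
        dead.append(str(host))

print("unreachable:", dead)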
UUCP over TCP does work to overcome packet size problems; it sees limited
use these days, but it did work in the past.
Col
> On 8 Jul 2021, at 14:59, Mark Tinka wrote:
>
> On 7/8/21 14:29, Saku Ytti wrote:
>
>> Network experiences gray failures all the time, and I almost never
>> care, unless a customer does. If there is a network which does not
>> experience these, then it's likely due to lack of visibility
> On 8 Jul 2021, at 14:29, Saku Ytti wrote:
>
> On Thu, 8 Jul 2021 at 15:00, Vanbever Laurent wrote:
>
>> Detecting whole-link and node failures is relatively easy nowadays (e.g.,
>> using BFD). But what about detecting gray failures that only affect a
>> *subset* of the traffic, e.g. a
On 7/8/21 14:29, Saku Ytti wrote:
Network experiences gray failures all the time, and I almost never
care, unless a customer does. If there is a network which does not
experience these, then it's likely due to lack of visibility rather
than issues not existing.
Fixing these can take months
On Thu, 8 Jul 2021 at 15:00, Vanbever Laurent wrote:
> Detecting whole-link and node failures is relatively easy nowadays (e.g.,
> using BFD). But what about detecting gray failures that only affect a
> *subset* of the traffic, e.g. a router randomly dropping 0.1% of the packets?
> Does your
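A back-of-the-envelope check on how hard that is to catch: the chance of
seeing no drops in n probes is (1 - p)^n, so observing at least one drop
with confidence c takes n >= ln(1 - c) / ln(1 - p) probes. A small sketch:

import math

def probes_needed(p: float, confidence: float = 0.99) -> int:
    # Smallest n with 1 - (1 - p)**n >= confidence.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

print(probes_needed(0.001))   # ~4603 probes to catch 0.1% loss
print(probes_needed(1.2e-6))  # ~3.8M probes for a 0.00012% rate

So roughly five thousand probes per path just to notice 0.1% loss, before
worrying about which ECMP member each probe actually exercised.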