Re: [EXTERNAL] Re: Famous operational issues

2021-10-01 Thread Ray Bellis
On 16/02/2021 22:51, Compton, Rich A wrote: > There was the outage in 2014 when we got to 512K routes. > http://www.bgpmon.net/what-caused-todays-internet-hiccup/ There was a similar issue in 1998/9 or so when we got to 64K routes, which broke the routing table index (which defaulted to a uint1
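The snippet is cut off at "uint1", so the exact field width cannot be recovered from this page. Suppose, as an assumption consistent with the failure showing up at 64K routes, that the index was a 16-bit unsigned integer. The arithmetic is then simple: such a field holds only 65,536 distinct values (0 through 65,535), so an index or count that reaches 65,536 (64K) wraps back to 0. A minimal Python sketch, in which store_index() is a hypothetical helper standing in for the router's internal field:

```python
# Hypothetical model of a route-table index held in an unsigned 16-bit field.
# The 16-bit width is an assumption; the archived snippet is cut off at "uint1".
UINT16_MASK = 0xFFFF  # largest value a 16-bit unsigned field can hold: 65,535


def store_index(n: int) -> int:
    """Return the value a 16-bit unsigned field would actually hold for n."""
    return n & UINT16_MASK  # keep only the low 16 bits (unsigned wraparound)


for n in (65_534, 65_535, 65_536, 65_537):
    print(f"{n} -> {store_index(n)}")
# 65534 -> 65534
# 65535 -> 65535
# 65536 -> 0
# 65537 -> 1
```

For scale, the 512K threshold quoted in the same message is 2**19, or 524,288 entries.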

Re: [nanog] Famous operational issues

2021-06-12 Thread Patrick Schultz
opening the link currently gives me an HTTP 500 error, very fitting :) On 12.06.2021 at 04:42, Dan Mahoney wrote: > I only just now found this thread, so I'm sorry I'm late to the party, but > here, I put it on Medium. > > https://gushi.medium.com/the-worst-day-ever-at-my-day-job-beff7f4170aa > >

Re: [nanog] Famous operational issues

2021-06-12 Thread Giuseppe De Luca
What a day.. hope you are better now :) On 6/12/2021 2:42 AM, Dan Mahoney wrote: I only just now found this thread, so I'm sorry I'm late to the party, but here, I put it on Medium. https://gushi.medium.com/the-worst-day-ever-at-my-day-job-beff7f4170aa

Re: [nanog] Famous operational issues

2021-06-11 Thread Dan Mahoney
I only just now found this thread, so I'm sorry I'm late to the party, but here, I put it on Medium. https://gushi.medium.com/the-worst-day-ever-at-my-day-job-beff7f4170aa > On Mar 12, 2021, at 10:07 PM, Mark Tinka wrote: > > Hardly famous and not service-affecting in the end, but figured I'd

Re: Famous operational issues

2021-03-12 Thread Mark Tinka
Hardly famous and not service-affecting in the end, but figured I'd share an incident from our side that occurred back in 2018. While commissioning a new node in our Metro-E network, an IPv6 point-to-point address was mis-typed. Instead of ending in /126, it ended in /12. This happened in Joha
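The gap between the intended /126 and the mistyped /12 is easy to understate. A rough sketch with Python's standard ipaddress module makes it concrete; the addresses below are hypothetical, taken from the 2001:db8::/32 documentation range, since the real addressing is not in the snippet:

```python
import ipaddress

# Hypothetical point-to-point address (documentation prefix, not the real one).
P2P_ADDR = "2001:db8:0:1::1"

# strict=False accepts a host address and masks off the host bits.
intended = ipaddress.IPv6Network(f"{P2P_ADDR}/126", strict=False)
mistyped = ipaddress.IPv6Network(f"{P2P_ADDR}/12", strict=False)

print(intended, intended.num_addresses)            # 2001:db8:0:1::/126 4
print(mistyped, mistyped.num_addresses == 2**116)  # 2000::/12 True

# The mistyped prefix also covers unrelated space, e.g. another hypothetical host.
other = ipaddress.IPv6Address("2001:db8:ffff::1")
print(other in intended, other in mistyped)        # False True
print(mistyped.supernet_of(intended))              # True
```

With strict=False the /12 normalizes to 2000::/12: a block of 2**116 addresses that contains the intended four-address /126.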

Re: Famous operational issues

2021-02-24 Thread Randy Bush
anyone else have the privilege of running 2321 data cells? had a bunch. unreliable as hell. there was a job running continuously recovering transactions off of log tapes. one night at 3am, head of apps program (i was systems) got a call that a tran tape was unmounted with a console message that

Re: Famous operational issues

2021-02-24 Thread Alain Hebert
I personally did "disable vlan Xyz" instead of "delete vlan Xyz" on Extreme Networks... which proceeded to disable all the ports where the VLAN was present... Good thing it was a (local) remote POP and not on the core. - Alain Hebert aheb...@pubnix.net

Re: Famous operational issues

2021-02-23 Thread Valdis Klētnieks
On Tue, 23 Feb 2021 20:46:38 -0800, Randy Bush said: > maybe late '60s or so, we had a few 2314 dasd monsters[0]. think maybe > 4m x 2m with 9 drives with removable disk packs. > > a grave shift operator gets errors on a drive and wonders if maybe they > swap it into another spindle. no luck, so

Re: Famous operational issues

2021-02-23 Thread Randy Bush
maybe late '60s or so, we had a few 2314 dasd monsters[0]. think maybe 4m x 2m with 9 drives with removable disk packs. a grave shift operator gets errors on a drive and wonders if maybe they swap it into another spindle. no luck, so swapped those two drives with two others. one more iteration,

Re: Famous operational issues

2021-02-23 Thread Adam Kennedy via NANOG
While we're talking about raid types... A few acquisitions ago, between 2006-2010, I worked at a Wireless ISP in Northern Indiana. Our CEO decided to sell Internet service to school systems because the e-rate funding was too much to resist. He had the idea to install towers on the schools and sell

Re: Famous operational issues

2021-02-23 Thread brutal8z via NANOG
My war story. At one of our major POPs in DC we had a row of 7513's, and one of them had intermittent problems. I had replaced every piece of removable card/part in it over time, and it kept failing. Even the vendor flew in a team to the site to try to figure out what was wrong. It was finally dec

Re: Famous operational issues

2021-02-23 Thread bzs
Anyone remember when DEC delivered a new VMS version (V5 I think) whose backups didn't work, couldn't be restored? BU did, the hard way, when the engineering dept's faculty and student disk failed. DEC actually paid thousands of dollars for typist services to come and re-enter whatever was on p

Re: Famous operational issues

2021-02-23 Thread scott
On 2/23/2021 12:22 PM, Justin Streiner wrote: An interesting sub-thread to this could be: Have you ever unintentionally crashed a device by running a perfectly innocuous command? --- There was that time in the later 1990s where I t

Re: Famous operational issues

2021-02-23 Thread Warren Kumari
On Tue, Feb 23, 2021 at 5:14 PM Justin Streiner wrote: > Beyond the widespread outages, I have so many personal war stories that > it's hard to pick a favorite. > > My first job out of college in the mid-late 90s was at an ISP in > Pittsburgh that I joined pretty early in its existence, and every

Re: Famous operational issues

2021-02-23 Thread Eric Kuhnke
I would be more interested in seeing someone who HASN'T crashed a Cisco 6500/7600, particularly one with a long uptime, by typing in a supposedly harmless 'show' command. On Tue, Feb 23, 2021 at 2:26 PM Justin Streiner wrote: > An interesting sub-thread to this could be: > > Have you ever unint

Re: Famous operational issues

2021-02-23 Thread Shawn L via NANOG
in the back of a pickup. Fun times. -Original Message- From: "Justin Streiner" Sent: Tuesday, February 23, 2021 5:11pm To: "John Kristoff" Cc: "NANOG" Subject: Re: Famous operational issues Beyond the widespread outages, I have so many personal war sto

Re: Famous operational issues

2021-02-23 Thread Justin Streiner
An interesting sub-thread to this could be: Have you ever unintentionally crashed a device by running a perfectly innocuous command? 1. Crashed a 6500/Sup2 by typing "show ip dhcp binding". 2. "clear interface XXX" on a Nexus 7K triggered a cascading/undocumented Sev1 bug that caused two linecards t

Re: Famous operational issues

2021-02-23 Thread Justin Streiner
Beyond the widespread outages, I have so many personal war stories that it's hard to pick a favorite. My first job out of college in the mid-late 90s was at an ISP in Pittsburgh that I joined pretty early in its existence, and everyone did a bit of everything. I was hired to do sysadmin stuff, net

Re: Famous operational issues

2021-02-23 Thread Justin Streiner
On Thu, Feb 18, 2021 at 5:38 PM Warren Kumari wrote: > > 2: A somewhat similar thing would happen with the Ascend TNT Max, which > had side-to-side airflow. These were dial termination boxes, and so people > would install racks and racks of them. The first one would draw in cool air > on the left

Re: Famous operational issues

2021-02-23 Thread Warren Kumari
On Mon, Feb 22, 2021 at 7:31 PM wrote: > > At Boston Univ we discovered the hard way that a security guard's > walkie-talkie could cause a $5,000 (or $10K for the big machine room) > Halon dump. > At one of the AOL datacenters there was some convoluted fire marshal reason why a specific door cou

Re: Famous operational issues

2021-02-22 Thread sronan
Let me tell you about my personal favorite. It’s 2002 and I am working as an engineer for an electronic stock trading platform (ECN), this platform happened to be the biggest platform for trading stocks electronically, on some days bigger than NASDAQ itself. This platform also happened to be ru

Re: Famous operational issues

2021-02-22 Thread bzs
At Boston Univ we discovered the hard way that a security guard's walkie-talkie could cause a $5,000 (or $10K for the big machine room) Halon dump. Took a couple of times before we figured out the connection tho once someone made it to the hold button before it actually dumped. Speaking of halo

Re: Famous operational issues

2021-02-22 Thread Patrick W. Gilmore
On Feb 22, 2021, at 7:02 AM, t...@pelican.org wrote: > On Thursday, 18 February, 2021 22:37, "Warren Kumari" > said: > >> 4: Not too long after I started doing networking (and for the same small >> ISP in Yonkers), I'm flying off to install a new customer. I (of course) >> think that I'm hot stu

Re: Famous operational issues

2021-02-22 Thread Owen DeLong
> On Feb 18, 2021, at 9:04 PM, Jen Linkova wrote: > > On Fri, Feb 19, 2021 at 9:40 AM Warren Kumari wrote: >> 4: Not too long after I started doing networking (and for the same small ISP >> in Yonkers), I'm flying off to install a new customer. I (of course) think >> that I'm hot stuff becau

RE: Famous operational issues

2021-02-22 Thread Tony Wicks
Many years ago I experienced a very similar thing. The DC/Integrator I worked for outsourced the co-location and operation of mainframe services for several banks and government organisations. One of these banks had a significant investment in AS/400's and they decided that it was so much hassle

Re: Famous operational issues

2021-02-22 Thread Dovid Bender
On Mon, Feb 22, 2021 at 2:05 PM Warren Kumari wrote: > > > On Mon, Feb 22, 2021 at 12:50 PM Regis M. Donovan > wrote: > >> On Thu, Feb 18, 2021 at 07:34:39PM -0500, Patrick W. Gilmore wrote: >> > And to put it on topic, cover your EPOs >> >> I worked somewhere with an uncovered EPO, which was ok

Re: Famous operational issues

2021-02-22 Thread Jethro R Binks
On Fri, 19 Feb 2021, Andy Ringsmuth wrote: > > I explain using my "talking to a 5 year old" voice that it > > most certainly is a router. He tells me that lying to airport security > > is a federal offense, and starts looming at me. I adjust my attitude > > and start explaining that it's like a

Re: Famous operational issues

2021-02-22 Thread Warren Kumari
On Mon, Feb 22, 2021 at 12:50 PM Regis M. Donovan wrote: > On Thu, Feb 18, 2021 at 07:34:39PM -0500, Patrick W. Gilmore wrote: > > And to put it on topic, cover your EPOs > > I worked somewhere with an uncovered EPO, which was okay until we had a > telco tech in who was used to a different data c

Re: Famous operational issues

2021-02-22 Thread Regis M. Donovan
On Thu, Feb 18, 2021 at 07:34:39PM -0500, Patrick W. Gilmore wrote: > And to put it on topic, cover your EPOs I worked somewhere with an uncovered EPO, which was okay until we had a telco tech in who was used to a different data center where a similar looking button controlled the door access, so

Re: Famous operational issues

2021-02-22 Thread Christopher Morrow
Long ago, in a galaxy far away I worked for a gov't contractor on site at a gov't site... We had our own cute little datacenter, and our 4 building complex had a central power distribution setup from utility -> buildings. It was really quite nice :) (the job, the buildings, the power and cute litt

Re: Famous operational issues

2021-02-22 Thread Warren Kumari
On Mon, Feb 22, 2021 at 7:09 AM t...@pelican.org wrote: > On Thursday, 18 February, 2021 22:37, "Warren Kumari" > said: > > > 4: Not too long after I started doing networking (and for the same small > > ISP in Yonkers), I'm flying off to install a new customer. I (of course) > > think that I'm h

Re: Famous operational issues

2021-02-22 Thread Tony Finch
Patrick W. Gilmore wrote: > > Me: Did you order that EPO cover? > Her: Nope. There are apparently two kinds of EPO cover: - the kind that stops you from pressing the button by mistake; - and the kind that doesn't, and instead locks the button down to make sure it isn't un-pressed un

Re: Famous operational issues

2021-02-22 Thread Bruce H McIntosh
On 2/22/21 9:14 AM, Alain Hebert wrote: Well... During my younger days, that button was used a few times by the operator of a VM/370 to regain control from someone with a "curious mind" *cough* *cough*... Two horror stories I remember from long ago when I wa

Re: Famous operational issues

2021-02-22 Thread Alain Hebert
Well... During my younger days, that button was used a few times by the operator of a VM/370 to regain control from someone with a "curious mind" *cough* *cough*... - Alain Hebert aheb...@pubnix.net PubNIX Inc. 50 boul. St-Charles P.O. Box 26770

Re: Famous operational issues

2021-02-22 Thread t...@pelican.org
On Thursday, 18 February, 2021 22:37, "Warren Kumari" said: > 4: Not too long after I started doing networking (and for the same small > ISP in Yonkers), I'm flying off to install a new customer. I (of course) > think that I'm hot stuff because I'm going to do the install, configure the > router,

Re: Famous operational issues

2021-02-21 Thread Ben Cannon
I’m embarrassed to say, I’ve done this. Ms. Lady Benjamin PD Cannon, ASCE 6x7 Networks & 6x7 Telecom, LLC CEO b...@6by7.net "The only fully end-to-end encrypted global telecommunications company in the world.” FCC License KJ6FJJ Sent from my iPhone via RFC1149. > On Feb 19, 2021, at 12:55 AM

Re: Famous operational issues

2021-02-20 Thread Jörg Kost
Oh, I actually wanted to keep this for my memoirs, but if we can name dangerous datacenter operational issues... somehow 2000s: Somebody ran their own datacenter, - once had an active ant colony living under the raised floor and in the climate system, - for a while had several electric grounding d

Re: Famous operational issues

2021-02-20 Thread Clayton Zekelman
Not a famous operational issue, but in 2000, we had a major outage of our dialup modem pool. The owner of the building was re-skinning the outside using Styrofoam and stucco. A bunch of the Styrofoam had blocked the roof drains on the podium section of the building, immediately above our e

Re: Famous operational issues

2021-02-20 Thread Eric Kuhnke
From a datacenter ROI and economics, cooling, HVAC perspective that might just be the best colo customer ever. As long as they're paying full price for the cabinet and nothing is *dangerous* about how they've hung the 2U server vertically, using up all that space for just one thing has to be a lot

Re: Famous operational issues

2021-02-20 Thread Henry Yen
On Thu, Feb 18, 2021 at 07:34:39AM -0500, Patrick W. Gilmore wrote: > In 1994, there was a major earthquake near the city of Los Angeles. City hall > had to be evacuated and it would take over a year to reinforce the building > to make it habitable again. My company moved all the systems in the b

Re: Famous operational issues

2021-02-19 Thread Tom Hill
On 16/02/2021 22:08, Jared Mauch wrote: > I was thinking about how we need a war stories nanog track. My favorite was > being on call when the router was stolen. Enough time has (probably) elapsed since my escapades in a small data centre in Manchester. The RFO was ten pages long, and I don't wa

Re: Famous operational issues

2021-02-19 Thread Sabri Berisha
- On Feb 19, 2021, at 3:07 AM, Daniel Karrenberg d...@ripe.net wrote: Hi, > Lessons: HW/SW mono-cultures are dangerous. Input testing is good > practice at all levels software. Operational co-ordination is key in > times of crisis. Well... Here is a very similar, fairly recent one. Albeit in

Re: Famous operational issues

2021-02-19 Thread Warren Kumari
At a previous company we had a large number of Foundry Networks layer-3 switches. They participated in our OSPF network and had a *really* annoying bug. Every now and then one of them would get somewhat confused and would corrupt its OSPF database (there seemed to be some pointer that would end up

Re: Famous operational issues

2021-02-19 Thread Andrew Gallo
On 2/16/2021 2:37 PM, John Kristoff wrote: Friends, I'd like to start a thread about the most famous and widespread Internet operational issues, outages or implementation incompatibilities you have seen. Which examples would make up your top three? I don't believe I've seen this in any of

Re: Famous operational issues

2021-02-19 Thread Andrey Kostin
Jen Linkova wrote on 2021-02-19 00:04: OK, Warren, achievement unlocked. You've just made a network engineer google 'router'. He meant what we call a "frezer" machine... (in our language ;) I heard a similar story from my colleague who was working at that time for Huawei as a DWDM engineer an

Re: Famous operational issues

2021-02-19 Thread Aaron C. de Bruyn via NANOG
All these stories remind me of two of my own from back in the late 90s. I worked for a regional ISP doing some network stuff (under the real engineer), and some software development. Like a lot of ISPs in the 90s, this one started out in a rental house. Over the months and years rooms were slowly

Re: Famous operational issues

2021-02-19 Thread Daniel Karrenberg
On 16 Feb 2021, at 20:37, John Kristoff wrote: I'd like to start a thread about the most famous and widespread Internet operational issues, outages or implementation incompatibilities you have seen. Which examples would make up your top three? My absolute top one happened 1995. Traffic e

Re: Famous operational issues

2021-02-19 Thread Jen Linkova
On Fri, Feb 19, 2021 at 9:40 AM Warren Kumari wrote: > 4: Not too long after I started doing networking (and for the same small ISP > in Yonkers), I'm flying off to install a new customer. I (of course) think > that I'm hot stuff because I'm going to do the install, configure the router, > whee

Re: Famous operational issues

2021-02-19 Thread Owen DeLong
In the case of Exodus when I was working there, it was literally dictated to us by the fire marshal of the city of Santa Clara (and enough other cities where we had datacenters to make a universal policy the only sensible choice). Owen > On Feb 18, 2021, at 1:07 AM, Eric Kuhnke wrote: > > On

Re: Famous operational issues

2021-02-19 Thread Wolfgang Tremmel
Do you remember the Cisco HDCI connectors? https://en.wikipedia.org/wiki/HDCI I once shipped a Cisco 4500 plus some cables to a remote data center and asked the local guys to cable them for me. With Cisco you could check the cable type and if they were properly attached. They were not. I asked

Re: Famous operational issues

2021-02-19 Thread Mark Tinka
On 2/19/21 10:40, Suresh Ramasubramanian wrote: He is. He asked a perfectly relevant question based on what he saw of the physical setup in front of him. And he kept his cool when being talked down to. I’d hire him the next minute, personally speaking. In the early 2000's, with that leve

Re: Famous operational issues

2021-02-19 Thread Suresh Ramasubramanian
Cc: nanog Subject: Re: Famous operational issues On Feb 18, 2021, at 11:51 PM, Suresh Ramasubramanian wrote: >> On 2/19/21 00:37, Warren Kumari wrote: >> and says "'K. So, you doing a full iBGP mesh, or confeds?". I really hadn't >> intended to be a

Re: Famous operational issues

2021-02-19 Thread Sabri Berisha
On Feb 18, 2021, at 11:51 PM, Suresh Ramasubramanian wrote: >> On 2/19/21 00:37, Warren Kumari wrote: >> and says "'K. So, you doing a full iBGP mesh, or confeds?". I really hadn't >> intended to be a condescending ass, but I think of that every time I realize >> I >> might be assuming someth

Re: Famous operational issues

2021-02-18 Thread Suresh Ramasubramanian
Did you at least hire the janitor? From: NANOG on behalf of Mark Tinka Date: Friday, 19 February 2021 at 10:20 AM To: nanog@nanog.org Subject: Re: Famous operational issues On 2/19/21 00:37, Warren Kumari wrote: 5: Another one. In the early 2000s I was working for a dot-com boom company. We

Re: Famous operational issues

2021-02-18 Thread George Herbert
Northridge quake. I was #2 and on call at CRL. That One Guy on dialup in Atlanta playing MUDs 23x7 pages that things are down. I wander out to my computer to dial in and see what’s up, turned on TV walking past it, sat down and turned computer on, as it was booting on comes a live helicopter

Re: Famous operational issues

2021-02-18 Thread bzs
One day I got called into the office supplies area because there was a smell of something burning. Uh-oh. To make a long story short there was a stainless steel bowl which was focusing the sun from a window such that it was igniting a cardboard box. Talk about SMH and random bad luck which coul

Re: Famous operational issues

2021-02-18 Thread Mark Tinka
On 2/19/21 00:37, Warren Kumari wrote: 5: Another one. In the early 2000s I was working for a dot-com boom company. We are building out our first datacenter, and I'm installing a pair of Cisco 7206s in 811 10th Ave. These will run basically the entire company, we have some transit, we have

Re: Famous operational issues

2021-02-18 Thread Andy Ringsmuth
> On Feb 18, 2021, at 4:37 PM, Warren Kumari wrote: > > 4: Not too long after I started doing networking (and for the same small ISP > in Yonkers), I'm flying off to install a new customer. I (of course) think > that I'm hot stuff because I'm going to do the install, configure the router, >

Re: Famous operational issues

2021-02-18 Thread Randy Bush
when employer had shipped 2xJ to london, had the circuits up, ... the local office sat on their hands. for weeks. i finally was pissed enough to throw my toolbag over my shoulder, get on a plane, and fly over. i walked into the fancy office and said "hi, i am randy, vp eng, here to help you turn

Re: Famous operational issues

2021-02-18 Thread Patrick W. Gilmore
On Feb 18, 2021, at 6:10 PM, Karl Auer wrote: > > I think it was Machiavelli who said that one should not ascribe to > malice anything adequately explained by incompetence… https://en.wikipedia.org/wiki/Hanlon%27s_razor Never attribute to malice that which is adequately explained by st

Re: Famous operational issues

2021-02-18 Thread Brian Knight via NANOG
On 2021-02-17 13:28, John Kristoff wrote: On Wed, 17 Feb 2021 14:07:54 -0500 John Curran wrote: I have no idea what outages were most memorable for others, but the Stanford transfer switch explosion in October 1996 resulted in much of the Internet in the Bay Area simply not being reachable f

Re: Famous operational issues

2021-02-18 Thread Paul Ebersman
warren> 2: A somewhat similar thing would happen with the Ascend TNT warren> Max, which had side-to-side airflow. These were dial termination warren> boxes, and so people would install racks and racks of them. The warren> first one would draw in cool air on the left, heat it up and warren> ship it

Re: Famous operational issues

2021-02-18 Thread Karl Auer
On Thu, 2021-02-18 at 17:37 -0500, Warren Kumari wrote: > Anyway, the subcontractor who made the power supplies for the vendor > realized that they could save a few cents by not installing the > little metal clip that held the heatsink to the MOSFET I think it was Machiavelli who said that one sh

Re: Famous operational issues

2021-02-18 Thread Warren Kumari
On Thu, Feb 18, 2021 at 8:31 AM Jared Mauch wrote: > On Thu, Feb 18, 2021 at 01:07:01AM -0800, Eric Kuhnke wrote: > > On that note, I'd be very interested in hearing stories of actual > incidents > > that are the cause of why cardboard boxes are banned in many facilities, > > due to loose particu

Re: Famous operational issues

2021-02-18 Thread Henry Yen
On Thu, Feb 18, 2021 at 01:07:01AM -0800, Eric Kuhnke wrote: > On that note, I'd be very interested in hearing stories of actual incidents > that are the cause of why cardboard boxes are banned in many facilities, the datacenter manager's daughter's cat. -- Henry Yen

Re: Famous operational issues

2021-02-18 Thread Alain Hebert
A few I remember: - Some monitoring server SCSI drive failed (we're talking State/Province level govt)... Got a reply stating there would be a 6-month delay to get a replacement... Ended up choosing to use my own drive instead of leaving something that could have been deadl

Re: Famous operational issues

2021-02-18 Thread George Metz
Normally I reference this as an example of terrible government bureaucracy, but in this case it's also how said bureaucracy can delay operational changes. I was a contractor for one of the many branches of the DoD in charge of the network at a moderate-sized site. I'd been there about 4 months, an

Re: Famous operational issues

2021-02-18 Thread t...@pelican.org
On Thursday, 18 February, 2021 16:23, "Seth Mattinen" said: > I had a customer that tried to stack their servers - no rails except the > bottom most one - using 2x4's between each server. Up until then I > hadn't imagined anyone would want to fill their cabinet with wood, so I > made a rule to ba

Re: Famous operational issues

2021-02-18 Thread Erik Sundberg
e hot screw didn't ground out. When I gave it that good old twist while plugging in the APC, I grounded the hot screw to the side of the electrical box. From: NANOG on behalf of Seth Mattinen Sent: Thursday, February 18, 2021 10:23 AM To: nanog@nano

Re: Famous operational issues

2021-02-18 Thread Seth Mattinen
On 2/18/21 1:07 AM, Eric Kuhnke wrote: On that note, I'd be very interested in hearing stories of actual incidents that are the cause of why cardboard boxes are banned in many facilities, due to loose particulate matter getting into the air and setting off very sensitive fire detection systems.

Re: Famous operational issues

2021-02-18 Thread Jared Mauch
On Thu, Feb 18, 2021 at 01:07:01AM -0800, Eric Kuhnke wrote: > On that note, I'd be very interested in hearing stories of actual incidents > that are the cause of why cardboard boxes are banned in many facilities, > due to loose particulate matter getting into the air and setting off very > sensiti

Re: Famous operational issues

2021-02-18 Thread Eric Kuhnke
On that note, I'd be very interested in hearing stories of actual incidents that are the cause of why cardboard boxes are banned in many facilities, due to loose particulate matter getting into the air and setting off very sensitive fire detection systems. Or maybe it's more mundane and 99% of the

Re: Famous operational issues

2021-02-17 Thread Owen DeLong
Stolen isn’t nearly as exciting as what happens when your (used) 6509 arrives and gets installed and operational before anyone realizes that the conductive packing peanuts that it was packed in have managed to work their way into various midplane connectors. Several hours later someone notices t

Re: Famous operational issues

2021-02-17 Thread Rogier van Eeten via NANOG
Ahh, war stories. I like the one where I got a wake up call that our IRC server was on fire,  together with the rest of the DC. Not that widespread, but we reached Slashdot. :) November 2002, University of Twente, The Netherlands. Some idiot wanted to be a hero. He deflated peoples tires, to

Re: Famous operational issues

2021-02-17 Thread John Curran
(resent - to list this time) On 16 Feb 2021, at 2:37 PM, John Kristoff <j...@dataplane.org> wrote: > > Friends, > > I'd like to start a thread about the most famous and widespread Internet > operational issues, outages or implementation incompatibilities you > have seen. > > Which example

Re: Famous operational issues

2021-02-17 Thread John Kristoff
On Wed, 17 Feb 2021 14:07:54 -0500 John Curran wrote: > I have no idea what outages were most memorable for others, but the > Stanford transfer switch explosion in October 1996 resulted in much > of the Internet in the Bay Area simply not being reachable for > several days. Thanks John. Th

Re: Famous operational issues

2021-02-17 Thread Jared Mauch
ehalf of Justin > Wilson (Lists) > Sent: Thursday, February 18, 2021 00:53 > To: Miles Fidelman > Cc: nanog@nanog.org > Subject: Re: Famous operational issues > > I remember when the big carriers de-peered with Cogent in the early 2000s. > The underestimated the amount of

Re: Famous operational issues

2021-02-17 Thread David Guo via NANOG
Cogentco still did not peer with Google and HE over IPv6 I guess. From: NANOG on behalf of Justin Wilson (Lists) Sent: Thursday, February 18, 2021 00:53 To: Miles Fidelman Cc: nanog@nanog.org Subject: Re: Famous operational issues I remember when the big

Re: Famous operational issues

2021-02-17 Thread Justin Wilson (Lists)
I remember when the big carriers de-peered with Cogent in the early 2000s. They underestimated the number of websites being hosted by people using Cogent exclusively. Justin Wilson j...@j2sw.com - https://j2sw.com - All things jsw (AS209109) https://blog.j2sw.com - Podcast and Blog > On Feb

Re: Famous operational issues

2021-02-17 Thread Miles Fidelman
John Kristoff wrote: Friends, I'd like to start a thread about the most famous and widespread Internet operational issues, outages or implementation incompatibilities you have seen. Well... pre-Internet, but the great Northeast fiber cut comes to mind (backhoe vs. fiber, backhoe won). Miles

Re: Famous operational issues

2021-02-16 Thread bzs
> On Tue, 16 Feb 2021, John Kristoff wrote: > > > Friends, > > > > I'd like to start a thread about the most famous and widespread Internet > > operational issues, outages or implementation incompatibilities you > > have seen. > > When Boston University joined the internet proper ca 198

Re: Famous operational issues

2021-02-16 Thread Rich Kulawiec
On Tue, Feb 16, 2021 at 01:37:35PM -0600, John Kristoff wrote: > Which examples would make up your top three? Morris worm, November 1988. Much confusion and eventually the realization that John Brunner had called it from 13 years out ("The Shockwave Rider", 1975). But sloppy coding meant it could

Re: Famous operational issues

2021-02-16 Thread Richard Golodner
That was the one with the most severe impact for my company. Seven frame circuits (UUNET), and we all saw what an update can do. On 2/16/21 3:28 PM, Sean Donelan wrote: Since you said operational issues, instead of just outage... How about MCI Worldcom's 10-day operational disaster in 1999. htt

Re: Famous operational issues

2021-02-16 Thread Mark Andrews
> On 17 Feb 2021, at 09:51, Sean Donelan wrote: > > > Biggest internet operational SUCCESS > > 1. Secure Shell (SSH) replaced TELNET. Nearly eliminated an entire class of > security problems on the Internet. But then HTTP took over everything, so a > good news/bad news. > > 2. Internet w

Re: Famous operational issues

2021-02-16 Thread Joe
If we're just talking about outages historically, I recall the 1996 AOL email debacle, not really anything to do with network mishaps but more so DNS configuration. As well, I believe the Northeast 2003 blackout was a great DR test that no one was expecting. Of course we also have the big non-ev

Re: Famous operational issues

2021-02-16 Thread Simon Lockhart
On Tue Feb 16, 2021 at 09:33:20PM +0100, Jörg Kost wrote: > I don't want to classify and rate it, but would name 9/11. > > You can read about the impacts on the list archives and there is also a > presentation from NANOG '23 online. For an operational perspective, I was part of the team trying to

Re: Famous operational issues

2021-02-16 Thread Paul Ebersman
jlewis> This reminds me of one of the Sprint CO's we were colo'd in. Ah, Sprint. Nothing like using your railroad to run phone lines... Our routers in San Jose colo were black from the soot of the trains. Fondly remember a major Sprint outage in the early 90s. All our data circuits in the southea

Re: Famous operational issues

2021-02-16 Thread scott
On 2/16/2021 9:37 AM, John Kristoff wrote: I'd suggest the AS 7007 event is perhaps the most notorious and likely to top many lists including mine. AS7007 is how I found NANOG.  We (Digital Island; first job out of college) were in 10

Re: Famous operational issues

2021-02-16 Thread Pierre Emeriaud
On Tue, 16 Feb 2021 at 21:03, Job Snijders via NANOG wrote: > > https://labs.ripe.net/Members/erik/ripe-ncc-and-duke-university-bgp-experiment/ > > The experiment triggered a bug in some Cisco router models: affected > Ciscos would corrupt this specific BGP announcement ** ON OUTBOUND **. > An

Re: [EXTERNAL] Re: Famous operational issues

2021-02-16 Thread Compton, Rich A
There was the outage in 2014 when we got to 512K routes. http://www.bgpmon.net/what-caused-todays-internet-hiccup/ On 2/16/21, 1:04 PM, "NANOG on behalf of Job Snijders via NANOG" wrote:

Re: Famous operational issues

2021-02-16 Thread Sean Donelan
Biggest internet operational SUCCESS 1. Secure Shell (SSH) replaced TELNET. Nearly eliminated an entire class of security problems on the Internet. But then HTTP took over everything, so a good news/bad news. 2. Internet worms massively reduced by changed default configurations and defaul

Re: Famous operational issues

2021-02-16 Thread Jon Lewis
On Tue, 16 Feb 2021, Sabri Berisha wrote: - On Feb 16, 2021, at 2:08 PM, Jared Mauch ja...@puck.nether.net wrote: Hi, I was thinking about how we need a war stories nanog track. My favorite was being on call when the router was stolen. Wait... what? I would love to listen to that call b

Re: Famous operational issues

2021-02-16 Thread Sabri Berisha
- On Feb 16, 2021, at 2:08 PM, Jared Mauch ja...@puck.nether.net wrote: Hi, > I was thinking about how we need a war stories nanog track. My favorite was > being on call when the router was stolen. Wait... what? I would love to listen to that call between you and your manager. But, here is

Re: Famous operational issues

2021-02-16 Thread Jared Mauch
I was thinking about how we need a war stories nanog track. My favorite was being on call when the router was stolen. Sent from my TI-99/4a > On Feb 16, 2021, at 2:40 PM, John Kristoff wrote: > > Friends, > > I'd like to start a thread about the most famous and widespread Internet > operati

Re: Famous operational issues

2021-02-16 Thread Justin Streiner
Would this also extend to intentional actions that may have had unintended consequences, such as provider A intentionally de-peering provider B, or the monopoly telco for $country cutting itself off from the rest of the global Internet for various reasons (technical, political, or otherwise)? That

Re: Famous operational issues

2021-02-16 Thread Jörg Kost
Oh well, MCI in 1999 was all about… https://www.youtube.com/watch?v=7iM5nFNUG4U On 16 Feb 2021, at 22:28, Sean Donelan wrote: Since you said operational issues, instead of just outage... How about MCI Worldcom's 10-day operational disaster in 1999. http://www.cnn.com/TECH/computing/9908/23/n

Re: Famous operational issues

2021-02-16 Thread Todd Underwood
There are all the hilarious leaks and blocks. Pakistan blocks YouTube and the announcement leaks internet-wide. Turk Telekom (AS9121 IIRC) leaks a full table out one of their providers. So many routing-level incidents they're probably not even interesting any more, I suppose. The huge power out

Re: Famous operational issues

2021-02-16 Thread Sean Donelan
Since you said operational issues, instead of just outage... How about MCI Worldcom's 10-day operational disaster in 1999. http://www.cnn.com/TECH/computing/9908/23/network.nono.idg/ How not to handle a network outage [...] MCI WorldCom issued an alert to its sales force, which was given the

Re: Famous operational issues

2021-02-16 Thread Damian Menscher via NANOG
https://en.wikipedia.org/wiki/SQL_Slammer was interesting in that it was an application-layer issue that affected the network layer. Damian On Tue, Feb 16, 2021 at 11:37 AM John Kristoff wrote: > Friends, > > I'd like to start a thread about the most famous and widespread Internet > operational

Re: Famous operational issues

2021-02-16 Thread Randy Bush
> actually, the 129/8 incident a friend pointed out that it was the 128/9 incident > but folk tend not to remember it qed, eh? :)
