On Sun, Mar 03, 2013 at 07:16:10PM +0100, Frederic Dhieux <frede...@syn.fr> wrote a message of 51 lines which said:
> And while centralization is a good thing for keeping coherence and
> cleanliness in your architecture, it seems necessary to me to at least
> keep two totally independent, unlinked sets running in redundancy, so
> that you can fail over.

That seems like a good idea to me (at least for the big shops, which can afford it). I am also attaching an analysis found on the outages list, which I find excellent:

Date: Mon, 4 Mar 2013 09:31:13 +0200
From: Saku Ytti <s...@ytti.fi>
To: na...@nanog.org
Subject: Re: Cloudflare is down

On (2013-03-03 12:46 -0800), Constantine A. Murenin wrote:

> Definitely smart to be delegating your DNS to the web-accelerator
> company and a single point of failure, especially if you are not just
> running a web-site, but have some other independent infrastructure,
> too.

To be fair, most of us probably have a harmonized peering edge, running one vendor with one or two software releases, and as such are just as susceptible to a BGP update taking down the whole edge.

Personally, I'm not comfortable pointing at Cloudflare and saying this was easily avoidable and should not have happened (not implying you are either). If fuzzing BGP were easy, vendors would provide us with working software, and we wouldn't lose a good portion of the Internet every few years due to a mangled UPDATE. I know a lot of vendors are fuzzing with 'codenomicon', and they appear not to have a flowspec fuzzer.

A lot of things had to go wrong for this to cause an outage:

1. their traffic analyzer had to have a bug which could claim a packet size of 90k
2. their NOC people had to accept it as legit data
   (2.5. their internal software where the filter is updated had to accept this data; unsure if it was an internal system or JunOS directly)
3. the JunOS CLI had to accept this data
4. flowspec had to accept it and generate an NLRI carrying it
5. the NLRI -> ACL abstraction engine had to accept it and try to program it into hardware

Even if Cloudflare had been running outsourced anycast DNS with a multi-vendor edge, the records would still have been pointing to a network you couldn't reach. Probably the only thing you could have done to plan against this would have been a solid dual-vendor strategy, presuming that sooner or later a software defect will take one vendor completely out. And maybe they did plan for it, but decided dual-vendor costs more than the rare outages.

--
  ++ytti
---------------------------
FRnOG mailing list
http://www.frnog.org/
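The failure chain in Ytti's analysis amounts to a missing sanity check repeated at every layer: an IPv4 packet can never be larger than 65535 bytes (the Total Length field is 16 bits), so a flowspec rule matching a 90k packet length is impossible on its face and could have been rejected at any of the five steps. A minimal sketch of such a check, using a toy dict-based rule representation (this is illustrative only, not Cloudflare's tooling or Juniper's actual validation code):

```python
# Toy sanity check for a flowspec-like "packet-length" match component.
# IPv4's Total Length field is 16 bits, so 65535 is a hard upper bound
# on any real packet size; anything larger (e.g. 90000) is bogus input.

MAX_IP_PACKET_LEN = 65535  # 16-bit IPv4 Total Length field


def plausible_packet_length(length: int) -> bool:
    """Return True if `length` could describe a real IP packet."""
    return 0 <= length <= MAX_IP_PACKET_LEN


def accept_flowspec_rule(rule: dict) -> dict:
    """Validate a toy flowspec rule before passing it downstream.

    `rule` is a hypothetical representation such as
    {"dst-prefix": "192.0.2.0/24", "packet-length": 1500}.
    Raises ValueError rather than silently forwarding impossible data
    (the step that was missing at layers 1 through 5 above).
    """
    plen = rule.get("packet-length")
    if plen is not None and not plausible_packet_length(plen):
        raise ValueError(f"impossible packet length: {plen}")
    return rule
```

With a check like this anywhere in the pipeline, the 90k rule would have been refused before it ever became an NLRI, let alone a hardware ACL.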