On Tue, Aug 21, 2018 at 05:16:47PM +0200, Remi Locherer wrote:
> Hi tech,
>
> recently we had a short outage in our network. A script started an additional
> ospfd instance because the -n flag for config test was missing.
>
> What then happened was not nice:
> - The new ospfd unlinked the control socket of the first ospfd
> - The new ospfd removed all routes from the first ospfd
> - The new ospfd was not able to build up an adjacency and therefore could
>   not install the routes needed for a recovery.
> - Both ospfd instances were running but non-functional.
>
> Of course the faulty script is fixed by now. ;-)
>
> It would be nice if ospfd could prevent such a situation.
>
> Below diff does these things:
> - Detect a running ospfd by first doing a connect on the control socket.
> - Do not delete the control socket on exit.
>   - This could delete the socket of another instance.
>   - Unlinking the socket on shutdown will be in the way once we add pledge
>     to the main process. It was removed recently from various daemons.
> - Do not delete routes added by another process even if they have
>   prio RTP_OSPF. Without this the new ospfd will remove all the routes
>   of the first one.
>
> A side effect of this is that alien OSPF routes are now only logged but
> not removed anymore. Should a crashed ospfd leave some routes behind the
> next ospfd does not clean them up anymore. The admin would need to check
> the logs and remove them manually with the route command.
>
> Does this make sense?
>
Manually removing routes does not :)
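
For reference, the connect-based detection from the quoted mail could be sketched roughly as below. This is not the code from the diff: control_socket_in_use() is a hypothetical helper name, and the caller is assumed to pass in the control socket path. It only relies on the standard socket(2) and connect(2) calls.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>

#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and return convention are assumptions):
 * returns 1 if some process accepts connections on the UNIX socket
 * at path, 0 if nothing is listening there, -1 on any other error.
 */
int
control_socket_in_use(const char *path)
{
	struct sockaddr_un sa;
	int fd, saved_errno;

	/* Refuse paths that do not fit into sun_path, including the NUL. */
	if (strlen(path) >= sizeof(sa.sun_path)) {
		errno = ENAMETOOLONG;
		return (-1);
	}

	if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
		return (-1);

	memset(&sa, 0, sizeof(sa));
	sa.sun_family = AF_UNIX;
	memcpy(sa.sun_path, path, strlen(path) + 1);

	/* A successful connect means another instance is listening. */
	if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0) {
		close(fd);
		return (1);
	}

	/* Preserve the connect error across close(). */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	/* No socket file, or a stale one that nobody listens on. */
	if (errno == ENOENT || errno == ECONNREFUSED)
		return (0);

	return (-1);
}
```

A daemon using such a check would refuse to start when it returns 1. When it returns 0 because of a stale socket file, the file still has to be unlinked before bind(2) can succeed. The check and the later bind are not atomic, so two instances started at the same moment could in principle still race.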