On Tue, Aug 21, 2018 at 05:16:47PM +0200, Remi Locherer wrote:
> Hi tech,
> 
> recently we had a short outage in our network. A script started an additional
> ospfd instance because the -n flag for config test was missing.
> 
> What then happened was not nice:
> - The new ospfd unlinked the control socket of the first ospfd
> - The new ospfd removed all routes from the first ospfd
> - The new ospfd was not able to build up an adjacency and therefore could
>   not install the routes needed for a recovery.
> - Both ospfd instances were running but non-functional.
> 
> Of course the faulty script is fixed by now. ;-)
> 
> It would be nice if ospfd could prevent such a situation.
> 
> The diff below does these things:
> - Detect a running ospfd by first doing a connect on the control socket.
> - Do not delete the control socket on exit.
>   - This could delete the socket of another instance.
>   - Unlinking the socket on shutdown will be in the way once we add pledge
>     to the main process. It was removed recently from various daemons.
> - Do not delete routes added by another process even if they have
>   prio RTP_OSPF. Without this the new ospfd will remove all the routes
>   of the first one.
> 
> A side effect of this is that alien OSPF routes are now only logged, not
> removed. If a crashed ospfd leaves some routes behind, the next ospfd no
> longer cleans them up. The admin would need to check the logs and remove
> them manually with the route command.
> 
> Does this make sense?
> 

Manually removing routes does not :)
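
For reference, the first item in the quoted list (detecting a running
instance by doing a connect on the control socket) could look roughly
like the sketch below. This is not the diff from the thread. The helper
name control_socket_in_use() is hypothetical, and the sketch assumes the
control socket is a SOCK_STREAM UNIX-domain socket.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from ospfd.
 * Returns 1 if a process accepts connections on the UNIX socket at
 * path, 0 if nothing answers (no such file, or a stale socket file
 * left behind), and -1 on any other error.
 */
int
control_socket_in_use(const char *path)
{
	struct sockaddr_un sun;
	size_t len;
	int fd, saved_errno;

	assert(path != NULL);

	len = strlen(path);
	if (len >= sizeof(sun.sun_path)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	memset(&sun, 0, sizeof(sun));
	sun.sun_family = AF_UNIX;
	memcpy(sun.sun_path, path, len + 1);

	if ((fd = socket(AF_UNIX, SOCK_STREAM, 0)) == -1)
		return -1;

	if (connect(fd, (struct sockaddr *)&sun, sizeof(sun)) == 0) {
		/* Somebody is listening: another instance is alive. */
		close(fd);
		return 1;
	}

	/* Preserve the connect() error across close(). */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	/* No socket file, or a socket file with no listener behind it. */
	if (errno == ENOENT || errno == ECONNREFUSED)
		return 0;
	return -1;
}
```

A daemon would call such a check before unlinking and re-binding the
socket path, and refuse to start when it returns 1. Note that there is
still a window between the check and the bind, so this only guards
against accidental double starts, not against a deliberate race.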
