On Mon, 03 Nov 2014 13:13:58 +0100 Martin Pieuchot <mpieuc...@nolizard.org> wrote: > On 2014-11-03 12:43, Gerhard Roth wrote: > > CVSROOT: /cvs > > Module name: src > > Changes by: gerh...@cvs.openbsd.org 2014/11/03 04:43:47 > > > > Modified files: > > sys/netinet : ip_carp.c > > > > Log message: > > Fix kernel stack overflow by preventing carp_send_ad_all() from > > re-entrant > > calls. Also, when adjusting demote counts, don't call > > carp_send_ad_all() > > for every ifgroup with a demote count of 1 but rather call it only once > > after adjusting the demote counts of all ifgroups. > > While this definitely fixes a problem it seems like a horrible > workaround to me. > I understand that carp(4) is a mess, but can we try to fix this in a > proper way? > Do you have an easy way to reproduce this recursion? Did you consider > improving > the error handling of carp_send_ad() & friends? Returning a proper > error code > and dealing with it in carp_group_demote_adj() for example, would also > fixes the problem and makes the code easier to understand. >
You're right that passing back error codes could also be a fix. But I didn't want to make a larger change to ip_carp. If you want to reproduce the recursion, try the following: 1) Define lots of interface groups and carp interfaces. Put every carp interface in a group of its own. 2) Set pf.conf to "block all" Then during boot, this is what will happen without the above patch: softclock() triggers a carp_send_ad(). But carp_send_ad() gets an EHOSTUNREACH error from ip_output() (due to pf dropping the packet). After the error count reaches CARP_SENDAD_MAX_ERRORS(), carp_send_ad() calls carp_group_demote_adj() which now adjusts the demote count and calls carp_send_ad_all() -> carp_vhe_send_ad_all() -> carp_send_ad(). So we're already entered carp_send_ad() a second time and still ip_output() will return EHOSTUNREACH. This recursion continues once for every interface group. Define too much groups and your kernel stack will overflow. Gerhard