On Mon, 03 Nov 2014 13:13:58 +0100 Martin Pieuchot <[email protected]>
wrote:
> On 2014-11-03 12:43, Gerhard Roth wrote:
> > CVSROOT: /cvs
> > Module name: src
> > Changes by: [email protected] 2014/11/03 04:43:47
> >
> > Modified files:
> > sys/netinet : ip_carp.c
> >
> > Log message:
> > Fix kernel stack overflow by preventing carp_send_ad_all() from
> > re-entrant
> > calls. Also, when adjusting demote counts, don't call
> > carp_send_ad_all()
> > for every ifgroup with a demote count of 1 but rather call it only once
> > after adjusting the demote counts of all ifgroups.
>
> While this definitely fixes a problem it seems like a horrible
> workaround to me.
> I understand that carp(4) is a mess, but can we try to fix this in a
> proper way?
> Do you have an easy way to reproduce this recursion? Did you consider
> improving
> the error handling of carp_send_ad() & friends? Returning a proper
> error code
> and dealing with it in carp_group_demote_adj() for example, would also
> fixes the problem and makes the code easier to understand.
>
You're right that passing back error codes could also be a fix.
But I didn't want to make a larger change to ip_carp.
If you want to reproduce the recursion, try the following:
1) Define lots of interface groups and carp interfaces. Put every carp
interface in a group of its own.
2) Set pf.conf to "block all"
Then during boot, this is what will happen without the above patch:
softclock() triggers a carp_send_ad(). But carp_send_ad() gets an
EHOSTUNREACH error from ip_output() (due to pf dropping the packet).
After the error count reaches CARP_SENDAD_MAX_ERRORS(), carp_send_ad()
calls carp_group_demote_adj() which now adjusts the demote count and
calls carp_send_ad_all() -> carp_vhe_send_ad_all() -> carp_send_ad().
So we're already entered carp_send_ad() a second time and still
ip_output() will return EHOSTUNREACH. This recursion continues once
for every interface group. Define too much groups and your kernel stack
will overflow.
Gerhard