Re: More bgpd problems

2012-05-30 Thread James Shupe
On 05/30/2012 04:27 AM, Matt Hamilton wrote: > James Shupe writes: > >> I've been running it to peer with 3 IPv4 peers and 3 IPv6 peers (full >> views) and another partial IPv4 view with 12k routes (actually: varying >> amounts of peers over the years, but that's the current setup) s

Re: More bgpd problems

2012-05-30 Thread Patrick Lamaiziere
On Wed, 30 May 2012 09:27:23 + (UTC), Matt Hamilton wrote: Hello, > I'd be very interested to see your ifstated config and how you use > that to verify peers being up as we could do with some better > monitoring here. Here we use "bgpctl show summary terse" with a grep on the peer name a
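The check described above (filtering "bgpctl show summary terse" by peer name) could look roughly like the sketch below. It is not Patrick's actual script. It assumes each terse line has the form "description AS state" with a single-word description, and that a working session reports "Established". The function name peer_is_up and the peer name transit1 are made up for illustration, and awk is used in place of grep so the peer name is matched exactly.

```shell
#!/bin/sh
# Sketch only. Reads `bgpctl show summary terse`-style lines on stdin.
# Assumed line format (not confirmed in the thread): "<description> <AS> <state>".

# peer_is_up NAME: exit 0 if a line for NAME ends in "Established", else exit 1.
peer_is_up() {
    awk -v peer="$1" '
        $1 == peer && $NF == "Established" { up = 1 }
        END { exit !up }
    '
}

# Hypothetical usage from cron or a monitoring wrapper:
#   bgpctl show summary terse | peer_is_up transit1 || echo "transit1 is down"
```

The exit status makes it easy to plug into whatever alerting is already in place.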

Re: More bgpd problems

2012-05-30 Thread Matt Hamilton
James Shupe writes: > I've been running it to peer with 3 IPv4 peers and 3 IPv6 peers (full > views) and another partial IPv4 view with 12k routes (actually: varying > amounts of peers over the years, but that's the current setup) since 4.5 > without needing any cron jobs to watch o

Re: More bgpd problems

2012-05-30 Thread Stuart Henderson
On 2012-05-29, Matt Hamilton wrote: > Otto Moerbeek writes: > >> >> On Tue, May 29, 2012 at 08:57:54AM +, Matt Hamilton wrote: >> >> > Hi all, >> > >> > More bgpd problems last night :( This happened last night on two of o

Re: More bgpd problems

2012-05-29 Thread Matt Hamilton
Philip Guenther writes: > Roger. To paraphrase: in order for such a process to be able to dump > core, do the following: > > Create /var/empty/var/crash/ and chown it to the user that the > [chroot'ed priv-sep'ed process] runs > as, then set the kern.nosuidcoredump sysctl to 2.
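The paraphrased steps above can be sketched as a small script. The path and the sysctl value come from the quoted text. The user name _bgpd and the variable and function names are assumptions for illustration, so substitute the account your daemon actually runs as. By default the script only prints the commands; set DRY_RUN=0 (as root) to run them.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0.
# Assumptions (not stated in the thread): the daemon runs as _bgpd
# and is chrooted to /var/empty.
CHROOT=${CHROOT:-/var/empty}
DAEMON_USER=${DAEMON_USER:-_bgpd}
DRY_RUN=${DRY_RUN:-1}

# run CMD...: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

enable_chroot_coredumps() {
    # Crash directory inside the chroot, owned by the unprivileged user.
    run mkdir -p "$CHROOT/var/crash"
    run chown "$DAEMON_USER" "$CHROOT/var/crash"
    # Value 2 as given in the quoted message.
    run sysctl kern.nosuidcoredump=2
}

enable_chroot_coredumps
```

Printing first gives a chance to review the commands before changing a production router.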

Re: More bgpd problems

2012-05-29 Thread Jiri B
On Tue, May 29, 2012 at 09:25:16PM +0200, Peter J. Philipp wrote: > Recompile the bgpd with debugging symbols (CFLAGS+=-g, LDFLAGS+=-g). And > install that. I thought -current was compiled with debug symbols, isn't it? jirib

Re: More bgpd problems

2012-05-29 Thread James Shupe
On 05/29/2012 05:41 AM, Garry Dolley wrote: > On Tue, May 29, 2012 at 08:57:54AM +, Matt Hamilton wrote: >> Hi all, >> >> More bgpd problems last night :( This happened last night on two of our >> routers. One running an old version of OpenBSD (4.3) and one running

Re: More bgpd problems

2012-05-29 Thread Philip Guenther
On Tue, May 29, 2012 at 12:30 PM, Henning Brauer wrote: > * Peter J. Philipp [2012-05-29 21:26]: >> 1. Make BGPD dump core > > it doesn't work that way due to bgpd dropping privs and chrooting. > the way involves setting kern.nosuidcoredump to 2, but since we have > all that already written down

Re: More bgpd problems

2012-05-29 Thread Henning Brauer
* Peter J. Philipp [2012-05-29 21:26]: > 1. Make BGPD dump core it doesn't work that way due to bgpd dropping privs and chrooting. the way involves setting kern.nosuidcoredump to 2, but since we have all that already written down in an email to a non-public list, it'll be easiest to make that ava

Re: More bgpd problems

2012-05-29 Thread Peter J. Philipp
On Tue, May 29, 2012 at 04:21:12PM +, Matt Hamilton wrote: > I will happily supply what I can. Just let me know how. Hello, I've never used BGPd personally but perhaps I can help you get a backtrace. There are quite possibly two ways to get a backtrace. 1. Make BGPD dump core Recompile the

Re: More bgpd problems

2012-05-29 Thread Matt Hamilton
Henning Brauer writes: > > OpenBSD 5.1/amd64: > > May 29 05:55:09 fw1 bgpd[21316]: Lost child: route decision engine > > terminated; signal 11 > > now that is bad. sig11 = segfault, Must Not Happen (tm). > can you get us a backtrace? stuart, can we document the steps to do so > somewher

Re: More bgpd problems

2012-05-29 Thread Matt Hamilton
Otto Moerbeek writes: > According to your previous message, you are getting a different > behaviour on the 5.1 box. A segfault is not the same as running out of mem. I agree. It seems strangely coincidental though that bgpd on both versions of OpenBSD died within minutes of each other

Re: More bgpd problems

2012-05-29 Thread Patrick Coleman
On 29/05/2012, at 6:08 PM, Matt Hamilton wrote: > Stuart Henderson writes: > >> cron job to restart it, with a random delay to avoid two machines >> coming back up at the same time when all the routers at a site >> fail together... > > So you just check it every minute to see if

Re: More bgpd problems

2012-05-29 Thread Otto Moerbeek
On Tue, May 29, 2012 at 10:06:37AM +, Matt Hamilton wrote: > Otto Moerbeek writes: > > > > > On Tue, May 29, 2012 at 08:57:54AM +, Matt Hamilton wrote: > > > > > Hi all, > > > > > > More bgpd problems last night :( Thi

Re: More bgpd problems

2012-05-29 Thread Garry Dolley
On Tue, May 29, 2012 at 08:57:54AM +, Matt Hamilton wrote: > Hi all, > > More bgpd problems last night :( This happened last night on two of our > routers. One running an old version of OpenBSD (4.3) and one running > 5.1. Is there anyone out there actually using bgpd in produ

Re: More bgpd problems

2012-05-29 Thread Henning Brauer
* Matt Hamilton [2012-05-29 12:02]: > Stuart Henderson writes: > > cron job to restart it, with a random delay to avoid two machines > > coming back up at the same time when all the routers at a site > > fail together... > So you just check it every minute to see if it is alive?

Re: More bgpd problems

2012-05-29 Thread Henning Brauer
* Matt Hamilton [2012-05-29 10:59]: > OpenBSD 4.3/amd64: > > May 29 05:53:43 firewall1 bgpd[5090]: imsg_create: buf_open: Cannot > allocate memory out of memory. others have said enuff about running 4.3. > OpenBSD 5.1/amd64: > May 29 05:55:09 fw1 bgpd[21316]: Lost child: route decision engine

Re: More bgpd problems

2012-05-29 Thread Otto Moerbeek
On Tue, May 29, 2012 at 10:00:53AM +, Matt Hamilton wrote: > Stuart Henderson writes: > > > cron job to restart it, with a random delay to avoid two machines > > coming back up at the same time when all the routers at a site > > fail together... > > So you just check it eve

Re: More bgpd problems

2012-05-29 Thread Matt Hamilton
Otto Moerbeek writes: > > On Tue, May 29, 2012 at 08:57:54AM +, Matt Hamilton wrote: > > > Hi all, > > > > More bgpd problems last night :( This happened last night on two of our > > routers. One running an old version of OpenBSD (4.3) and

Re: More bgpd problems

2012-05-29 Thread Matt Hamilton
Stuart Henderson writes: > cron job to restart it, with a random delay to avoid two machines > coming back up at the same time when all the routers at a site > fail together... So you just check it every minute to see if it is alive? It seems to me to be a pretty fundamental de
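A cron-driven restart with a random delay, as in the quoted suggestion, might look something like the sketch below. It is not Stuart's actual script. The function names, the 60-second ceiling in the usage line and the choice of /dev/urandom as the randomness source are all assumptions. A time-seeded generator is avoided on purpose, since two routers firing cron in the same second could otherwise pick the same delay.

```shell
#!/bin/sh
# Sketch only; names and defaults are hypothetical.

# random_delay MAX: print a pseudo-random integer in [0, MAX).
# Reads two bytes from /dev/urandom so that machines started by cron
# in the same second are unlikely to pick the same value.
random_delay() {
    n=$(od -An -N2 -tu2 /dev/urandom | tr -dc '0-9')
    echo $(( n % $1 ))
}

# restart_if_dead MAX CMD...: if no process named exactly "bgpd" exists,
# sleep a random number of seconds below MAX, then run CMD.
restart_if_dead() {
    max_delay=$1
    shift
    if pgrep -x bgpd >/dev/null 2>&1; then
        return 0
    fi
    sleep "$(random_delay "$max_delay")"
    "$@"
}

# Hypothetical crontab entry, checking every minute (script path made up):
#   * * * * * /usr/local/sbin/bgpd-watch
# where the script ends with:
#   restart_if_dead 60 /usr/sbin/bgpd
```

This only covers the case where bgpd has exited. A parent process that is still running with a dead child would need a different check.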

Re: More bgpd problems

2012-05-29 Thread Otto Moerbeek
On Tue, May 29, 2012 at 08:57:54AM +, Matt Hamilton wrote: > Hi all, > > More bgpd problems last night :( This happened last night on two of our > routers. One running an old version of OpenBSD (4.3) and one running > 5.1. Is there anyone out there actually using bgpd in produ

Re: More bgpd problems

2012-05-29 Thread Stuart Henderson
On 2012-05-29, Matt Hamilton wrote: > More bgpd problems last night :( This happened last night on two of our > routers. One running an old version of OpenBSD (4.3) and one running > 5.1. Is there anyone out there actually using bgpd in production? Yes. > How > do you deal w

More bgpd problems

2012-05-29 Thread Matt Hamilton
Hi all, More bgpd problems last night :( This happened last night on two of our routers. One running an old version of OpenBSD (4.3) and one running 5.1. Is there anyone out there actually using bgpd in production? How do you deal with it quitting every time something unexpected happens on the