Hi Mark,

Thanks for the helpful reply.  Comments inline:

On Tue, Aug 10, 2010 at 2:54 AM, Mark Bergsma <m...@wikimedia.org> wrote:
> As already stated elsewhere, we didn't really saturate any NICs, just
> some socket buffers. Because of the large number of configured log
> pipes, the software (udp2log) could not empty the socket buffers fast
> enough.

Based on this and IRC conversations with Tim and Domas, here's my
understanding of things now (restating to make sure that I
understand):

The current system is a single-threaded application that takes packets
in synchronously and spits them out to several places based on the
configuration file described here:
http://wikitech.wikimedia.org/view/Squid_logging

One problem we're hitting is that when this
daemon^H^H^H^H^H^Hlistener gets bogged down by a complex
configuration, it doesn't get around to emptying the socket buffer.
Since it's single-threaded, it handles every configured logging
destination before reading the next packet.  We're not CPU-bound at
this point: the existing setup starts flaking out at 40% CPU with a
complicated configuration, and hums along at 20% with the current
simplified config.  The problem is that we block while we fire up awk
or whatever on the logging side, and the socket buffer overflows in
the meantime.
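To make the failure mode concrete, here is a minimal Python sketch of a synchronous fan-out loop like the one described above. It is hypothetical and not udp2log's actual code; the `handle_packets` name, the `max_packets` limit and the use of plain callables as destinations are assumptions for illustration. While any destination call is running, nothing calls `recvfrom()`, so incoming datagrams pile up in the kernel socket buffer until it overflows and the kernel drops them.

```python
import socket
from typing import Callable, Iterable


def handle_packets(sock: socket.socket,
                   destinations: Iterable[Callable[[bytes], None]],
                   max_packets: int) -> int:
    """Hypothetical single-threaded listener loop (illustration only).

    Each datagram is handed to every destination in turn before the next
    recvfrom(); a slow destination (e.g. a pipe into awk) therefore stalls
    reading, and the kernel receive buffer fills up behind it.
    """
    destinations = list(destinations)
    handled = 0
    while handled < max_packets:  # a real listener would loop forever
        packet, _addr = sock.recvfrom(65535)
        for write in destinations:
            write(packet)  # blocking call: the next recvfrom() waits on this
        handled += 1
    return handled
```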

A solution that Tim and others are kicking around is reworking the
listener in one or more of the following ways:
1.  Move to some non-blocking networking library (e.g. Boost.Asio, libevent)
2.  Go multi-threaded
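A rough sketch of option 2, in Python purely to show the structure. It is hypothetical: `run`, `reader`, `writer`, `queue_size` and the drop-when-full policy are assumptions, not anything Tim has proposed. One thread does nothing but drain the socket into a bounded queue per destination, and each destination gets its own writer thread, so a slow awk pipe can only back up its own queue instead of stalling the socket reads. CPython releases the GIL during blocking socket and pipe I/O, so this shape works for an I/O-bound relay.

```python
import queue
import socket
import threading
from typing import Callable, List


def reader(sock: socket.socket, queues: List[queue.Queue], max_packets: int) -> int:
    """Drain the socket as fast as possible; never wait on a destination."""
    dropped = 0
    for _ in range(max_packets):  # a real listener would loop forever
        packet, _addr = sock.recvfrom(65535)
        for q in queues:
            try:
                q.put_nowait(packet)
            except queue.Full:
                dropped += 1  # this destination is too slow: drop for it only
    for q in queues:
        q.put(None)  # shutdown sentinel; reading is already finished here
    return dropped


def writer(q: queue.Queue, write: Callable[[bytes], None]) -> None:
    """Feed one destination from its own queue until the sentinel arrives."""
    while True:
        packet = q.get()
        if packet is None:
            break
        write(packet)  # may block, but only this destination's queue backs up


def run(sock: socket.socket, destinations: List[Callable[[bytes], None]],
        max_packets: int, queue_size: int = 10000) -> int:
    """Start one writer thread per destination, then read on this thread."""
    queues = [queue.Queue(maxsize=queue_size) for _ in destinations]
    workers = [threading.Thread(target=writer, args=(q, w))
               for q, w in zip(queues, destinations)]
    for t in workers:
        t.start()
    dropped = reader(sock, queues, max_packets)
    for t in workers:
        t.join()  # let each destination finish its backlog
    return dropped
```

The design choice here is to drop packets for a lagging destination rather than block; option 1 reaches a similar decoupling by never blocking on a destination in the first place.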

Mark, as you point out, we could go with some multicast solution if we
need to split it up among boxes.  As Domas points out, we could even
go multi-process on the same box without really maxing it out.

The solutions we're talking about seem to solve the socket buffer
problem, but we may also need clearer requirements for any new
functionality.  We should be able to get more mileage out of the
existing solution with some of the reworking described above; what
isn't clear yet is whether that buys us enough capacity and capability
for the increased requirements.  I'll check in with Tomasz and others
working on fundraiser stuff to find out more.

Rob

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
