On Tue, Sep 18, 2007 at 09:52:33PM +0200, Thierry wrote:
> Basically, my load moved from 0 to 1 ( tcpdump using 80% of a cpu )  with,
> if I remember right when I looked,   80k requests being processed ( ntpd was
> using 0.2 %of the cpu at this time ).
> I did not find such overhead with iptables yet.

I ran some monitoring overnight and, probably because of the new
DNS stuff, I only got up to 93 packets per second (when averaged
over a minute).  I've got 9 hours of data so far and I'm not seeing
any correlation between userspace CPU usage (vmstat with 2 second
intervals) and the packets that tcpdump was handling.  The tcpdump
process currently has about eight seconds of CPU time associated with
it.  Tcpdump recorded 286178 NTP packets going out, which would imply
that the user side of tcpdump expends about 35 microseconds per packet.
I've got no reasonable way of determining the CPU time used by the
pcap code running inside the kernel.

If your tcpdump process was indeed seeing 80000 packets, then your CPU
usage is probably about right.  Multiplying my 35 microseconds by
80000 packets gives a CPU time of about 2.8 seconds.  I would guess
that your box has a newer version of my chip, with a clock speed almost
4 times as high, so that 2.8 seconds could easily account for the 80%
CPU usage you saw.
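The back-of-envelope arithmetic above is easy to check.  A minimal
sketch (the ~90-byte on-wire size per NTP packet is my assumption, a
round number for a 48-byte payload plus UDP/IP/Ethernet headers, not a
measured figure):

```python
# Back-of-envelope check of the tcpdump overhead estimate above.
# The 35 us/packet figure is the one measured in this mail; the
# ~90-byte on-wire NTP packet size is an assumed round number.
per_packet_us = 35          # user-side tcpdump cost, microseconds
packets_per_sec = 80_000    # the load Thierry reported

# CPU time spent by tcpdump per second of traffic at that rate.
cpu_seconds = per_packet_us * 1e-6 * packets_per_sec
print(f"CPU time per second of traffic: {cpu_seconds:.1f} s")

# Sanity check on "several megabytes of data" at 80K packets/s.
bytes_per_sec = packets_per_sec * 90
print(f"Bandwidth: {bytes_per_sec / 1e6:.1f} MB/s")
```

At ~2.8 CPU-seconds per second on my box, dividing by a clock speed
almost 4 times higher lands in the neighbourhood of the 80% you saw.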

In real life, I don't think you'll ever see 80K NTP packets per second
(that's several megabytes of data per second!) from the Internet, so
running tcpdump with suitable filtering shouldn't add any noticeable
CPU usage.

I run a mail server on the same box and I regularly get people trying to
break in through SSH which tends to cane the CPU.  I would attribute my
visible CPU usage to these processes.

> Then I have a concern with iptables regarding IO, it s generating some
> IO, and as the only other option we have with ulogd is DB ( I have few
> experiences that tell me mysql is "less usable" with > 20 Millions
> entries, but maybe pgsql would be better. The problem is that I would
> like to avoid inserting something into the actual "generate stats
> process").

I use Postgres at work, generally on pretty small datasets (tables up
to 100 thousand rows), though some of the stuff gets kind of big (200
million+ rows).  Postgres is better for concurrent access to the data,
and its query planner (i.e. the bit of code that rewrites your SQL
query into something reasonably efficient) was, the last time I looked,
much better than MySQL's.  MySQL always tended to be rule based, which
behaved badly when your data didn't look like what the rules were
designed for, whereas Postgres generates statistical summaries of your
data and uses them to pick good ways to run your query.  Neither
approach is perfect, but in my experience the statistical one is *much*
better.  MySQL may have improved since I last used it, though.
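To make the rule-based vs. statistics-based distinction concrete, here
is a deliberately toy sketch (nothing like either planner's real
internals, and the 10% selectivity cut-off is just an assumed rule of
thumb): a rule-based planner always takes an index when one exists,
while a cost-based planner consults table statistics first.

```python
# Toy illustration of rule-based vs. cost-based query planning.
# Deliberately simplified; not how Postgres or MySQL actually work.

def rule_based_plan(has_index: bool) -> str:
    # Rule: if an index exists on the filtered column, always use it,
    # regardless of what the data looks like.
    return "index scan" if has_index else "seq scan"

def cost_based_plan(has_index: bool, total_rows: int,
                    matching_rows: int) -> str:
    # Estimate selectivity from (hypothetical) table statistics and
    # only use the index when the predicate is selective enough.
    if not has_index:
        return "seq scan"
    selectivity = matching_rows / total_rows
    # Assumed rule of thumb: an index only wins when the query touches
    # a small fraction of the table.
    return "index scan" if selectivity < 0.1 else "seq scan"

# A predicate matching half the table: the rule-based planner still
# insists on the index; the cost-based one prefers a sequential read.
print(rule_based_plan(True))                            # index scan
print(cost_based_plan(True, 200_000_000, 100_000_000))  # seq scan
print(cost_based_plan(True, 200_000_000, 1_000))        # index scan
```

The point is only that the second planner's answer changes with the
data, which is exactly what the rules can't do.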

> I think I'm going to gather all the infos I need and look around. I
> already saw pycap ( python module ), perhaps writting something in C (
> i agree with you I think we could have something fast ).
>
>  I just would like trying to have something fast that can handle a
> possible high load.

There are lots of bindings for Postgres.  If you're really interested
in speed, though, don't use a database: databases are good for keeping
your data safe and for ad-hoc queries, not for raw throughput.  If
this is your first time doing something like this, you'd probably
still be better off with a database, because most of the time you're
trying to figure out what you want to know, and writing little bits of
SQL is *much* easier than writing lots of anything else.
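As an illustration of how little SQL those ad-hoc questions need, here
is a sketch using Python's built-in sqlite3 module (the table layout
and the sample per-second packet counts are invented for the example):

```python
# Sketch: ad-hoc querying of packet-rate data with Python's built-in
# sqlite3 module.  The table layout (second, packets) is made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ntp_traffic (second INTEGER, packets INTEGER)")
conn.executemany(
    "INSERT INTO ntp_traffic VALUES (?, ?)",
    [(0, 87), (1, 93), (2, 91), (3, 78)],
)

# Two "little bits of SQL": busiest second and average rate.
peak, = conn.execute("SELECT MAX(packets) FROM ntp_traffic").fetchone()
avg, = conn.execute("SELECT AVG(packets) FROM ntp_traffic").fetchone()
print(f"peak {peak} packets/s, average {avg:.2f} packets/s")
```

Each new question is one more SELECT, which is the real argument for
starting with a database even if you later rewrite the hot path in C.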


  Sam
_______________________________________________
timekeepers mailing list
[email protected]
https://fortytwo.ch/mailman/cgi-bin/listinfo/timekeepers