From: "Ron Smith" <[EMAIL PROTECTED]>
Sent: Sunday, 2008, July 27 06:16
My results for time spamc < (test email):
real 0m0.354s
user 0m0.001s
sys 0m0.005s
My results for time spamassassin < (test email):
real 0m5.310s
user 0m1.969s
sys 0m0.521s
I clearly see slowdowns in spam on the weekend so these numbers
represent likely the lower end of the scale for incoming, so a valid
estimate would certainly be that there are 150,000 to 200,000 spam
connections under a weekday load.
These are important results. They show that your machine is plenty fast
enough for 100,000 emails per day, although, given that daytime traffic
is much heavier than nighttime, you'd probably see significant slowdowns
in throughput during the day as the machine overloads. You'd be in
deadly trouble with 200,000 messages filtered.
The first result is pure throughput for spamd. Figure most of that
.354 seconds is CPU time for spamd. That gives you an upper limit on
what the machine is likely to be able to handle, 86400/.354 messages
per day. Of course you have other things running as well. So the
nearly 250,000 message capacity isn't really there. Figure it's maybe
100,000 max if the MTA and other utilities are running.
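A minimal sketch of that arithmetic, for anyone who wants to plug in their own timings. The function name and the 0.4 "headroom" factor are my own assumptions (a stand-in for "the MTA and other utilities are running"), not measured values, and it treats the scanner as strictly one message at a time:

```python
SECONDS_PER_DAY = 86_400

def daily_capacity(seconds_per_message: float, headroom: float = 1.0) -> int:
    """Rough upper bound on messages per day for one serial scanner.

    headroom is the assumed fraction of the machine left over for spamd
    (hypothetical knob, not something measured in this thread).
    """
    if seconds_per_message <= 0:
        raise ValueError("seconds_per_message must be positive")
    return int(SECONDS_PER_DAY / seconds_per_message * headroom)

if __name__ == "__main__":
    print(daily_capacity(0.354))        # spamc/spamd with the whole machine
    print(daily_capacity(0.354, 0.4))   # assumed 40% of the machine for spamd
    print(daily_capacity(5.310))        # standalone spamassassin per message
```

The first figure is the "nearly 250,000" above; the second lands near the 100,000 guess only because I picked 0.4 to match it.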
The second result shows you what'd happen if you ran spamassassin
rather than spamc/spamd. (It also makes me wonder where the extra
calendar time went compared to the CPU time measures. But that's not
particularly important, I suspect.)
One thing I know is that doing this sort of analysis is worthwhile and
often quite discouraging. (Pushing multiple sets of uncompressed frames
of 1080i across a PCI-e bus can do strange things to a machine's
throughput. <sigh> Heck, composing four 1080i frames to provide an
output frame can do strange things to taskmanager's eight processor
display. It's disheartening when they all show 50% to 80% processor
utilization. <double-sigh> At least I know WHY some of what I want
to do won't work. {o.o})
Earlier results you produced showed that spamd is using a fairly
nominal amount of memory for your installation, 50 megs a pop. Four
children is 200 megs. The mdworker threads, at least two of which
were shown, each took 50 megs or so if my wetware memory is still
valid. So that's 300 megs. The CommuniGate server seemed to be another
in that "price range". So you'd have been easily pushing half the
machine's memory with four spamd children running. Unless you had
tried to up the number of children or had enough other "little things"
to add up, you'd not push a gigabyte. A half gigabyte would have been
swamped, most likely.
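The same tally as a small sketch. The 50 meg figures come from the earlier readings as I remember them; the 300 megs for the mail server is only my reading of "that price range" and is an assumption, as are the function and parameter names:

```python
def memory_budget_mb(spamd_children: int = 4, spamd_mb: int = 50,
                     mdworkers: int = 2, mdworker_mb: int = 50,
                     mail_server_mb: int = 300) -> int:
    """Sum the resident-memory estimates, in megabytes.

    mail_server_mb is an assumed figure, not a measurement.
    """
    return (spamd_children * spamd_mb
            + mdworkers * mdworker_mb
            + mail_server_mb)

if __name__ == "__main__":
    total = memory_budget_mb()
    print(total, "MB; over half a gig:", total > 512, "; under a gig:", total < 1024)
```

Raising `spamd_children` shows how quickly extra children eat into the remaining headroom.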
Now my question is this. Could the apparent memory issue under the
crush of the above load actually have been simply a combination of not
having the blocklists on, having the DNS for the mail machine set to
reference another DNS host server on my LAN, and perhaps a less than
optimal spamd process?
See above. YES - sort of.
You NOW have sufficient memory to run a solid dns cache plus, perhaps,
your own BL copies. (Somebody will correct me if I am wrong, please; but,
I believe you're at a high enough query rate that many of the BL people
will allow you to periodically download their database so that you can
run your own mirror of their servers. It won't be as immediately up to
date. But it'd reduce communications problems.)
And it comes to me that "communications problems" are a whole new measure
in this picture. Suppose at peak traffic you're running maybe 3/4 of the
day's traffic in half a day or maybe a rate of 12,500 an hour, or almost
4 a second. Let's say 10k of various transfers per message; that's about
half a 768k pipe. If that pipe is shared with WWW service and other such
things, make sure it is sized big enough that it does not become a bottleneck.
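Here is that back-of-the-envelope estimate as a sketch, assuming the 200,000 a day figure, 10 kilobytes per message, and a 768 kilobit pipe. The function name and parameters are mine, purely for illustration:

```python
def peak_pipe_fraction(daily_messages: int,
                       peak_share: float = 0.75,
                       peak_hours: float = 12.0,
                       kbytes_per_message: float = 10.0,
                       pipe_kbits: float = 768.0) -> tuple[float, float]:
    """Return (messages per second at peak, fraction of the pipe used)."""
    per_second = daily_messages * peak_share / (peak_hours * 3600)
    kbits_per_second = per_second * kbytes_per_message * 8  # bytes to bits
    return per_second, kbits_per_second / pipe_kbits

if __name__ == "__main__":
    rate, fraction = peak_pipe_fraction(200_000)
    print(f"{rate:.2f} msgs/sec, {fraction:.0%} of the pipe")
```

Worked exactly, the rate is closer to 3.5 a second and a bit over a third of the pipe; rounding up to 4 a second pushes it toward the "about half" figure.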
I'm not sure how the statistics shake out here. How many pieces
of email would a spamd child process be able to process per second
given the numbers above? If I have 3 child processes, would they be
overwhelmed by this load so that I have just misinterpreted a memory
leak issue?
I still say your readings on the Activewhazzit that Apple provides are
outrageously out of line. Something has it displaying numbers an order
of magnitude bigger than seem realistic compared to 'top' readings. I run
tons of rules and don't seem to ever get over about 70megs in use on a
spamc run. Yet it says 600 meg numbers. So I suspect it needs to get
recalibrated as to what its numbers really mean.
I suspect it misled you into thinking SpamAssassin used an utterly
outrageous amount of memory. (side note - as someone who used to think
64k was hog heaven 64 megabytes is utterly outrageous on its own
account. <sigh>) The real problem was probably simply in overloading the
machine with requests when you were not using the prefiltering.
Note that if a "time spamc" run starts taking materially more real time
than CPU time (sum of system and user times) it may be time to use more
children since DNS lookups have slowed for one reason or another. "Use
those cycles"; but, don't overdo it. Leave time for the other stuff to
run. {^_-}
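A sketch of that comparison using the "time" figures from the top of this message. The function name is hypothetical. One caution: spamc is a thin client, so its own user and sys times are always tiny and nearly all of its real time is spent waiting on spamd; the number is most useful watched against a known-good baseline rather than read on its own.

```python
def wait_fraction(real: float, user: float, sys_: float) -> float:
    """Fraction of wall-clock time not accounted for by this process's CPU."""
    if real <= 0:
        raise ValueError("real time must be positive")
    cpu = user + sys_
    return max(0.0, real - cpu) / real

if __name__ == "__main__":
    # Figures copied from the two "time" runs quoted above.
    print(f"spamassassin: {wait_fraction(5.310, 1.969, 0.521):.0%} waiting")
    print(f"spamc:        {wait_fraction(0.354, 0.001, 0.005):.0%} waiting")
```

If the spamc real time climbs well past its usual value while the machine's CPU sits idle, that points at DNS or network waits, which is the case where more children may help.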
In the advanced techniques section there are people, I understand, who
choose to stress the network a bit to get more time for spamd to run.
They move spamd to another machine on the network and run it there with
100% of a machine to work with. You are at a high enough traffic level
that it would be a good idea to get someone more experienced than I am
to help you run through a system design for the data flow you have to
deal with. I touched on only a few of the things I can think of in my
relative lack of experience. (I chiefly write software. I "perforce"
administer the system on which Loren and I work because somebody has
to do it; and, I had some fun learning Linux during a time work was real
slow. So I work it in bursts - make it work smoothly then enter a period
of benign neglect punctuated by frequent moments of "yum update" and
bitching at the slow updates for critical items from the Fedora Core
people. They are at least a rev behind on clamav, for example.)
{^_^} Joanne - a crazed technophile.