Here are the headers from your message:

Server-1.1.2c-csav (Processed in 0.468303 secs); 10 Jun 2003 12:25:35 -0000
X-AmikaGuardian-Id: mail10.speakeasy.net105524793523629339
X-AmikaGuardian-Category: AN:Obvious Clues : 0.3
X-AmikaGuardian-Category: AN:Spam Headers : 0.3
X-AmikaGuardian-Category: AN:Exception : 1.7
X-AmikaGuardian-Category: AN:Urgency : 0.09
X-AmikaGuardian-Category: AN:Junk Mail : 0.42
X-AmikaGuardian-Category: AN:Spam Language : 0.12
X-AmikaGuardian-Category: AN:Spam : 0.42
X-AmikaGuardian-Category: AN:Vectored : 1.7
X-AmikaGuardian-Category: AN:Override : 1.7
X-AmikaGuardian-Category: AN:Free : 0.03
X-AmikaGuardian-Category: AN:Spam Structure : 0.3
X-AmikaGuardian-Category: AN:Forwarded Mail : 1.7
X-AmikaGuardian-Action: Do Nothing()
---
        As near as I can tell, no network requests are being made.  The
precision of the seconds figure suggests they are measuring CPU usage --
not wall-clock time, since wall-clock time would be meaningless on a
heavily loaded machine.  The time spent on an email is proportional to
the size of the email, though definitely not linearly -- a 700KB message
only took .96 seconds (compared to your 6K message).

> -----Original Message-----
> From: Simon Byrnand [mailto:[EMAIL PROTECTED]


> Well its clear that you don't run a mail server yourself
---
        The only thing that is really clear here is that I am mightily
upset that my email is being filtered against my wishes on a $200+/month
connection that I pay for out of my own pocket.

        At SGI, about 3 years ago, they were talking about implementing a
new email-filtering program -- something called SpamAssassin, at the
time.  As a company we had about 7000 people left then (I think they are
down to 4000 now).  The network people, with *extensive* Unix
backgrounds, ran tests -- and with only 7000 people, and only some
fraction of them using email for anything outside of work, they figured
there was no configuration of SGI servers that could handle the load if
they tried to do it at the mail gateway, as they wanted.  The
alternative was to do it at the department gateways, but many people
skipped the departmental gateways, because some, like "engr", were
constantly overwhelmed -- running with only 16 CPUs (one system image)
and maybe 8G of memory at the time.  Our maximum-sized machines then, I
believe, were only 128-CPU machines, but those only went to
supercomputer customers -- no way IS was going to get one.  I think it
had about 8-12 hard disk volumes, all based on Fibre Channel or SCSI
technology.  The bus bandwidth was probably about 10x what PCs were
capable of.  But this is a company of only 7000 people, not an ISP with
100,000 customers (or millions, as the larger ISPs serve).  The kernel
on the machine at the time was fully preemptible, with real-time
processing available (remember, SGI had to have fast real-time
processing for live-video editing, as it was used to manipulate
'live-feed' TV images).

        I'm getting about half the email volume now that I did then.  For
the past two weeks:

   1120 Tue May 27
    925 Wed May 28
    787 Thu May 29
    638 Fri May 30
    450 Sat May 31
    386 Sun Jun  1
   1063 Mon Jun  2
    909 Tue Jun  3
   1028 Wed Jun  4
    702 Thu Jun  5
    721 Fri Jun  6
    394 Sat Jun  7
    328 Sun Jun  8
    734 Mon Jun  9
    349 Tue Jun 10
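        A minimal sketch of the kind of shell one-liner that produces a
tally like the one above, assuming syslog-style "Mon DD HH:MM:SS"
timestamps -- the heredoc is made-up sample data standing in for a real
mail log:

```shell
# Tally messages per day: key on the month and day fields of the
# syslog timestamp.  Replace the heredoc with your actual log file.
cat <<'EOF' |
Jun  9 08:14:02 mail sendmail[123]: stat=Sent
Jun  9 09:30:11 mail sendmail[124]: stat=Sent
Jun 10 07:02:45 mail sendmail[125]: stat=Sent
EOF
awk '{ n[$1 " " $2]++ }
     END { for (d in n) printf "%7d %s\n", n[d], d }' | sort -k3n
```

Point the awk/sort pipeline at /var/log/maillog (or wherever your MTA
logs) for real numbers.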
---
        Email doesn't arrive in a constant flow, but in bunches -- it
depends on what lists you subscribe to.  Here is my daily flow, broken
down by active hours in June:
Sun Jun  1 00 ***********
Sun Jun  1 01 ********
Sun Jun  1 02 ***************
Sun Jun  1 03 **************
Sun Jun  1 04 ***************
Sun Jun  1 05 ***********
Sun Jun  1 06 ************
Sun Jun  1 07 ******************
Sun Jun  1 08 ***************************
Sun Jun  1 09 *******************
Sun Jun  1 10 *****************************
Sun Jun  1 11 *******************
Sun Jun  1 12 ***********************
Sun Jun  1 13 ******************
Sun Jun  1 14 *************************
Sun Jun  1 15 *************
Sun Jun  1 16 ******************************
Sun Jun  1 17 *******
Sun Jun  1 18 *************
Sun Jun  1 19 **************
Sun Jun  1 20 *********
Sun Jun  1 21 *****************
Sun Jun  1 22 *********
Sun Jun  1 23 **********
Mon Jun  2 00 ****************
Mon Jun  2 01 ***************************
Mon Jun  2 02 ***************************
Mon Jun  2 03 ********************************
Mon Jun  2 04 ************
Mon Jun  2 05 ************************************
Mon Jun  2 06 **************************************
Mon Jun  2 07 ********************
Mon Jun  2 08 ***************************************
Mon Jun  2 09 ****************************************
Mon Jun  2 10 *************************************
Mon Jun  2 11 **********************************************
Mon Jun  2 12 ***********************************************************************************
Mon Jun  2 13 ******************************************************************
Mon Jun  2 14 ************************************
Mon Jun  2 15 **********************************************
Mon Jun  2 16 *************************************
Mon Jun  2 17 *****************************************************
Mon Jun  2 18 *****************************************************************************************************************
Mon Jun  2 19 ***********************************************************************************
Mon Jun  2 20 ******************************************************************
Mon Jun  2 21 *****************************
Mon Jun  2 22 ********************************
Mon Jun  2 23 ******************************************
Tue Jun  3 00 *****************
Tue Jun  3 01 *********************************
Tue Jun  3 02 ******************************
Tue Jun  3 03 **************************
Tue Jun  3 04 ******************
Tue Jun  3 05 **********************************************
Tue Jun  3 06 *****************************************
Tue Jun  3 07 *****************************************
Tue Jun  3 08 **********************************************************************
Tue Jun  3 09 ************************************************************************
Tue Jun  3 10 *****************************************************
Tue Jun  3 11 ****************************************
Tue Jun  3 12 *****************************************
Tue Jun  3 13 ***********************************************
Tue Jun  3 14 ********************************************
Tue Jun  3 15 *****************************************
Tue Jun  3 16 **********************************
Tue Jun  3 17 *******************************************
Tue Jun  3 18 ********************************
Tue Jun  3 19 *****************************
Tue Jun  3 20 ******************************
Tue Jun  3 21 **************************
Tue Jun  3 22 *********************
Tue Jun  3 23 **********************************
Wed Jun  4 00 ******************************
Wed Jun  4 01 **********************************
Wed Jun  4 02 *****************************
Wed Jun  4 03 ******************
Wed Jun  4 04 *******************************
Wed Jun  4 05 *************************************
Wed Jun  4 06 **********************************
Wed Jun  4 07 ***********************************
Wed Jun  4 08 **************************************************************************************************************************************
Wed Jun  4 09 *******************************************
Wed Jun  4 10 ***************************************************
Wed Jun  4 11 ************************************************************
Wed Jun  4 12 *********************************************************
Wed Jun  4 13 **********************************************
Wed Jun  4 14 *****************************************************************
Wed Jun  4 15 ***********************************************************
Wed Jun  4 16 *********************************************
Wed Jun  4 17 ********************************************
Wed Jun  4 18 ************************
Wed Jun  4 19 *************************************
Wed Jun  4 20 **********************************
Wed Jun  4 21 *****************
Wed Jun  4 22 *************************
Wed Jun  4 23 ************************
Thu Jun  5 00 *********************************
Thu Jun  5 01 ***************
Thu Jun  5 02 ******************************************
Thu Jun  5 03 ***************************
Thu Jun  5 04 *******************************
Thu Jun  5 05 ***********************************************************
Thu Jun  5 06 *******************************
Thu Jun  5 07 ******************************************
Thu Jun  5 08 *********************************
Thu Jun  5 09 ***************************
Thu Jun  5 10 **************************************
Thu Jun  5 11 *****************************************
Thu Jun  5 12 ***********************************
Thu Jun  5 13 *******************************************
Thu Jun  5 14 *************************
Thu Jun  5 15 ****************************
Thu Jun  5 16 ************************
Thu Jun  5 17 *******************
Thu Jun  5 18 ******
Thu Jun  5 19 ***********************
Thu Jun  5 20 ****************
Thu Jun  5 21 ****************
Thu Jun  5 22 ***********************
Thu Jun  5 23 *************************
Fri Jun  6 00 ******************************************
Fri Jun  6 01 ****************************
Fri Jun  6 02 ***********************
Fri Jun  6 03 **********************
Fri Jun  6 04 ***************
Fri Jun  6 05 *********************
Fri Jun  6 06 **********************************
Fri Jun  6 07 *********************************************************
Fri Jun  6 08 **********************************************************
Fri Jun  6 09 *****************************************************
Fri Jun  6 10 ********************************
Fri Jun  6 11 ****************************************************
Fri Jun  6 12 **********************************************
Fri Jun  6 13 ******************************
Fri Jun  6 14 ******************************************
Fri Jun  6 15 *****************************
Fri Jun  6 16 **************************
Fri Jun  6 17 *************************************
Fri Jun  6 18 **********************
Fri Jun  6 19 *****************
Fri Jun  6 20 ******
Fri Jun  6 21 **************
Fri Jun  6 22 *******
Fri Jun  6 23 ********
Sat Jun  7 00 ********************
Sat Jun  7 01 *************
Sat Jun  7 02 **************
Sat Jun  7 03 **********
Sat Jun  7 04 ***************
Sat Jun  7 05 **************
Sat Jun  7 06 ************
Sat Jun  7 07 *************
Sat Jun  7 08 *************
Sat Jun  7 09 ********************
Sat Jun  7 10 ****************
Sat Jun  7 11 **********************
Sat Jun  7 12 *********************
Sat Jun  7 13 ************************************
Sat Jun  7 14 *******************
Sat Jun  7 15 *****************
Sat Jun  7 16 ***********************
Sat Jun  7 17 ****************
Sat Jun  7 18 *****************
Sat Jun  7 19 *****************
Sat Jun  7 20 **************
Sat Jun  7 21 **********
Sat Jun  7 22 ********
Sat Jun  7 23 **************
Sun Jun  8 00 *******
Sun Jun  8 01 ***
Sun Jun  8 02 **********
Sun Jun  8 03 *******************
Sun Jun  8 04 ***************
Sun Jun  8 05 *************
Sun Jun  8 06 ***********
Sun Jun  8 07 ************
Sun Jun  8 08 *************
Sun Jun  8 09 *****************
Sun Jun  8 10 ***********
Sun Jun  8 11 *************
Sun Jun  8 12 ***********
Sun Jun  8 13 **************************
Sun Jun  8 14 *******************
Sun Jun  8 15 ******************
Sun Jun  8 16 ***************
Sun Jun  8 17 *******************
Sun Jun  8 18 *************
Sun Jun  8 19 **************
Sun Jun  8 20 ****************
Sun Jun  8 21 ***********
Sun Jun  8 22 **************
Sun Jun  8 23 ********
Mon Jun  9 00 ******************
Mon Jun  9 01 *****************
Mon Jun  9 02 *************
Mon Jun  9 03 **********************
Mon Jun  9 04 *************************
Mon Jun  9 05 *************************
Mon Jun  9 06 **********************************
Mon Jun  9 07 ***************************
Mon Jun  9 08 ***************************************
Mon Jun  9 09 *****************************************
Mon Jun  9 10 ****************************************************
Mon Jun  9 11 ************************************************
Mon Jun  9 12 ******************************************
Mon Jun  9 13 **************************************
Mon Jun  9 14 ******************************************************
Mon Jun  9 15 ***********************************
Mon Jun  9 16 *********************************************
Mon Jun  9 17 *******************************
Mon Jun  9 18 **********************************
Mon Jun  9 19 ********************
Mon Jun  9 20 **********************
Mon Jun  9 21 ********************
Mon Jun  9 22 ******************
Mon Jun  9 23 **************
Tue Jun 10 00 ***************************
Tue Jun 10 01 ******************************
Tue Jun 10 02 *********************************
Tue Jun 10 03 *********************************************
Tue Jun 10 04 *********************
Tue Jun 10 05 ********************************
Tue Jun 10 06 **************************************
Tue Jun 10 07 ****************************************************
Tue Jun 10 08 **********************************
Tue Jun 10 09 **************************************************************
Tue Jun 10 10 ******
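        The hourly bars above can be generated with a similar sketch,
assuming log lines carrying a "Day Mon DD HH:MM:SS" date -- again the
sample lines are invented stand-ins for a real log:

```shell
# One '*' per message in each hour, keyed on "Day Mon DD HH".
cat <<'EOF' |
Mon Jun  2 12:01:44
Mon Jun  2 12:15:09
Mon Jun  2 12:40:31
Mon Jun  2 13:05:12
EOF
awk '{ split($4, t, ":"); key = $1 " " $2 " " $3 " " t[1]; n[key]++ }
     END { for (k in n) { printf "%s ", k
                          for (i = 0; i < n[k]; i++) printf "*"
                          print "" } }' | sort
```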

        You can see that email distribution is anything but constant, and
during the week the highest-usage periods are in the morning, about the
time people start arriving at work on the East Coast.  Note that these
numbers are not geographically sorted, but US email appears to dominate
the English-speaking lists I'm on (no great surprise).

> First of all I dare you to find *any* ISP that is processing 100,000
> emails *simultaneously* (eg at the very same instant) on one
> server.
---
        But it could happen over a 1-hour period, no?

        My email alone can easily top 100 messages in a 1-hour period.
Clearly there are hourly distribution trends.  I strongly doubt that I'm
the only one who sees the bulk of weekday email coming through at
business opening times (05-09 on my graph) and a second, smaller wave
when people get home (17-20) -- I'm in PST.

        Do you have any statistics to back up your email-flow assertions?
I've been tracking email flow for myself since the early '90s, when I
wrote the above shell scripts.


> Can't
> happen. In fact its so ludicrously impossible and exagerated that your
> example is nonsensical. Even across a dozen machines 100,000 at one
> instant isn't practical simply due to process and memory limits.
---
        I think you're missing the point because you are hung up on the
details given in an extreme case to make the point.  If an ISP has
already determined the need for "X" email servers to handle incoming
email (Earthlink has about 15; my ISP, Speakeasy, has about 5; it looks
like AOL has about 28), and they then add filtering where each mail is
inspected character by character rather than simply transferred around,
it is likely (depending on CPU loading) that they'd better double or
triple the number of servers.  Note that I know nothing about the
configuration of the mail servers at the ISPs above -- they could be
anything from 1-CPU Pentiums to 32-64 CPU machines at each address.  I
actually don't even know how many machines are behind each IP.

        But let's assume a vendor operating efficiently -- with as few
mail servers as possible to handle 90% of peak load with <1 minute delay.

        Add an extra .5-1.5 seconds of CPU usage per message to such a
configuration, and with only 1000 customers like me, some hours will
easily top 100,000 messages -- approaching 1 million messages/hour.  If
Amika adds, let's say, a conservative average of .5 seconds to each
message, that's 500,000 extra seconds of CPU time needed during a peak
hour.
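        As a sanity check on that arithmetic -- the message rate and
per-message cost here are the assumed figures above, not measurements:

```shell
# Back-of-envelope CPU demand from per-message filtering.
msgs_per_hour=1000000     # assumed peak rate across ~1000 heavy users
cpu_ms_per_msg=500        # assumed 0.5 s of filter CPU per message
extra=$((msgs_per_hour * cpu_ms_per_msg / 1000))
cpus=$(( (extra + 3599) / 3600 ))  # whole CPUs needed for filtering alone
echo "$extra extra CPU-seconds/hour -- at least $cpus CPUs"
```

Since one CPU delivers only 3600 CPU-seconds per wall-clock hour, that
peak hour needs well over a hundred CPUs for the filtering alone.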

        The IS people at SGI couldn't do it for their 6000 users without
noticeable and significant mail propagation delays.  You are telling me
that ISPs are immune from these effects?

>
> Secondly your assumption that because a single message takes
> 0.5 seconds
> to process means that ten simultaneous emails would take 5
> seconds and a
> hundred would take 50 seconds is just that - an assumption, and an
> incorrect to boot.
-----
        Yup -- with context switching, on a single-CPU machine it will
take longer (note -- there are no network waits in the Amika system that
I can see; it's all locally done processing).  I've seen times go up
exponentially as system load went past the physical memory limit -- the
system (IRIX) did dig itself out of the mess and did process everything
in the queue, but it took hours for 300 jobs that singly would have
taken < 1-5 seconds each.  On Linux, when I get close to running out of
memory, it doesn't block -- it calls in the magic process killer, which
uses some exotic formula to decide which process to kill -- resulting in
redundant work that has to be redone (since, in the case of emails, they
wouldn't have been delivered or processed yet).

>
> It assumes that the scanning process over the total time it
> takes is CPU
> bound and in fact on a fast machine its not, the majority of
> that time is
> taken up by network tests - the RBL checks and razor checks
> in particular.
===
        You made the wrong assumption that Amika uses network tests the
way SA does.  There is a term for this type of blindness -- common to
most people -- where they assume their context can be applied uniformly
to other situations.  But if there are no network lookups, RBL checks,
etc., then the delays are based entirely on CPU and disk-access cycles.


> I find on our server that the scanning time when using spamc with all
> network tests turned off is under 0.1 seconds, and it
> increases to between
> 0.6 and 2 seconds with network tests enabled. This means that the CPU
> bound part of the scanning was completed in less than 0.1
> seconds and for
> the rest of the time the process just slept waiting for the
> network test
> responses to come back.
===
        I know my server is asleep most of the time.  There is no hour --
unless I'm doing development -- in which CPU usage even reaches 10%.
But I'm not a 1-million-customer ISP.

> CPU time != real time.
---
        Really?....so I've noticed:
mail:user> time ls -R /usr >/dev/null 2>&1

real    0m21.356s
user    0m1.320s
sys     0m11.060s

        CPU time was 12.4 seconds.  Real time was 21.4 seconds.  So your
point is what?  That for each half second of CPU used by Amika, possibly
5-10 real seconds pass, because even with RAID they are still disk
bound?  In Linux, much work was spent getting the networking and disk
code to work with "zero [unnecessary] memory-to-memory copies", because
PCs (or any computer, for that matter) are still memory-bandwidth
limited -- the limiting factor in most of today's computers is how fast
you can get the data to the CPU, not the CPU speed.  Even with Xeons
with 2MB of secondary cache, the first-level cache is still under 100K,
and 2MB isn't a lot of messages -- especially on a non-NUMA
architecture, where the memory caches of all the CPUs have to be kept
synchronized.


>
> This means that if the actual CPU utilization was 0.1 seconds then 10
> simultaneous scans started at the same time would take about
> 1.5 seconds -
> eg 0.1 * 10 plus the typical latency of the network tests of about 0.5
> seconds. Not 5 seconds as your theory suggests.
---
        The actual CPU utilization is the most likely figure given (.5) --
any other figure is meaningless, given outside variables like disk and
memory latency, context switching, and system load.  It's not a theory --
in small numbers, on a 4P machine, you might get 3x single-CPU
performance on Linux; the older the kernel, the worse it was, since in
old kernels essentially only one process could be in the kernel at any
given time (the big-lock days).

> And if your server is going into swap then you're in serious
> trouble, this
> is not a normal situation, so it doesn't really count....
---
        You hope it isn't normal.  If an ISP just added filtering and
didn't add sufficient additional CPU power, it could easily happen.
>
> With most setups if the total scanning time takes longer than 5 to 10
> seconds the machine can get into serious trouble as the number of
> processes for a given incomming message rate builds up to dangerous
> levels, as some on this list (including me) have found out
> the hard way.
> (The secret to achieving maximum throughput is limiting the
> concurancy to
> a level that matches the server's CPU speed and physical memory, and
> starting a new job as each one finishes)
---
        Sendmail has done that automatically for years -- a load limit at
which to stop accepting connections, and a lower load limit at which to
just queue messages locally rather than try to deliver them.  I don't
think any of my machines have ever hit those limits.  But I'm quite
aware of optimizing for parallel execution -- it's not really a secret,
just second-year college computer science for those who bothered with a
CS degree.  (It amazes me how many programmers don't have a CS degree --
and people really think that doesn't affect program design or
reliability?)  The vast majority of .com-era code was written by non-CS
types who just saw the $$$ signs and were able to use MS point-and-click
design interfaces that allowed "C++ for Dummies" to become a best
seller.
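        The concurrency cap the quoted text describes -- run at most N
scans at once, starting a new one as each finishes -- is what xargs -P
gives you for free (a GNU/BSD extension, not strict POSIX).  The echo
here is a stub standing in for a real per-message scan command:

```shell
# Scan at most 4 messages concurrently; xargs starts a new job as
# each one exits.  Replace the echo stub with the real scanner.
printf '%s\n' msg1 msg2 msg3 msg4 msg5 msg6 |
xargs -n1 -P4 sh -c 'echo "scanned $0"'
```

Matching -P to the machine's CPU count and memory keeps the process
count from building up the way the quoted text warns about.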


>
> When most people on the list here are concerned about scans
> taking as long
> as 5 seconds to complete, your suggestion that people could be waiting
> hours for the messages to arrive due to spam filtering is laughable.
----
        Most people here are not running ISPs with 100,000+ users.  In
fact, I'd like to ask: how many people here are managing mail servers
that serve 100,000 different clients?  I'd be surprised if there is even
one.

        I understand your points -- but you have to realize that I also
understand my points, and I have real-world stats and experience to back
up what I am saying.  I've worked for Intel, Sun, and SGI (as well as
startups), and know a little bit about CPUs, disks, memory, latencies,
OSes, etc.

        But you still missed the original point.  I don't want mandatory
email filtering!  It's the camel's nose in the tent.  What's next --
government requirements for all ISPs to do the same?  There are legal
implications here as well.  ISPs enjoy common-carrier status, where they
are not liable for the content that travels over their wires (much as
telephone companies are not liable for illegal conversations).  The more
ISPs step in and perform content monitoring, the more they will lose
that common-carrier status.

        It's like the stupid sidewalk-shoveling cases when it snows.  In
many places -- and the court cases have happened -- if you shovel your
sidewalk and someone slips and falls, they can sue you; but if you
didn't shovel the snow, they can't -- because by shoveling your
sidewalk, you are "taking responsibility" for its condition.  It's a
really *stupid* legal mistake for an ISP to pretend to filter viruses,
because in doing so they are "taking responsibility" (no matter how many
disclaimers they issue to the contrary -- common sense doesn't apply in
American legal matters).  So if a virus *does* get through, someone
could have a case against Speakeasy for creating the belief that they
were providing safety when their safety wasn't 'good enough'.  (I've
already seen viruses that have gotten through.)  Having run MS Outhouse
since it came out in '96-'97, I've never caught a virus via *any*
vector.  I can count the number of spam mails I *see* on one hand, and I
don't have a spam filter installed, *yet*.  It was the fact that I was
seeing up to 6 a day on a bad day that prompted me to start
investigating SA.

        I thought I'd try that before going to an explicit permit/deny
system with tokens for unknown email -- a very old, yet reliable, way of
filtering out a lot of spam -- even list spam.

        I don't know *A LOT* of things... in fact, the more I learn, the
more I realize I don't know, but please don't assume I'm a complete
idiot (even though at times I may emulate one! :-)).  It's amazing how
even little things like setting up different email accounts for
different companies and lists really cut down on spam -- since it's
often more traceable.

Linda




_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
