Re: Bayes & Ham

spamassassin 2 Feb 2004 22:26:41 -0000

Matt Kettler wrote to Mike Samba and [EMAIL PROTECTED]:

> FWIW I use a combination of two sources for HAM training:
>
> 1) some selected chunks of my own email (ie: mailing lists not
> involving SA, personal email, etc)
>
> 2) I set up a "nonspamtrap" account, and I've subscribed this to a few
> of the newsletters my users commonly subscribe to.


Good sources. We provide "spam" and "nonspam" accounts so that our more
proactive clients can forward spam and ham, particularly messages that
were incorrectly classified. As long as they're instructed to forward
such messages as attachments, the original messages (headers included)
come through unmolested.
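
For anyone automating the attachment step, here is a minimal sketch in
Python of pulling the original messages back out of a forward. It
assumes the client's mailer attaches each original as a message/rfc822
part; the function name extract_forwarded is made up for illustration
and is not part of SpamAssassin.

```python
from email import message_from_bytes, policy


def extract_forwarded(raw: bytes) -> list[bytes]:
    """Return the raw bytes of every message/rfc822 attachment in `raw`."""
    outer = message_from_bytes(raw, policy=policy.default)
    originals = []
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            # A message/rfc822 part carries exactly one embedded message.
            inner = part.get_payload(0)
            originals.append(inner.as_bytes())
    return originals
```

Each returned item is a complete message, original headers and all, so
it can be handed to the trainer instead of the forwarding wrapper.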

I'm fortunate enough to personally own a domain that is now very close
in spelling (same name, different TLD) to a domain used by a large ISP
in our region. After seeing the postmaster logs on our email server, I
set up an account to catch all of the incoming email on my domain. There
are enough mistyped addresses that I get several hundred messages per day
for different recipients, including ham, spam, and viruses. It's the closest
thing to broadly varied user email that we can get without violating our
own privacy policy.

I have a staff member (otherwise known as our Resident SpamQueen) go
through that, as well as our shared email boxes (sales, support, etc),
and train the filter. She has no problem finding 1000+ SPAM and HAM
messages weekly. It's done wonders for our filtering.
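
As an illustration of the training step, here is a rough Python sketch
that feeds sorted mail to sa-learn, the Bayes training tool that ships
with SpamAssassin. It assumes the sorted messages are saved as mbox
files; the file names and the helper names (build_cmd, train) are
hypothetical, and this is not necessarily how the training described
above is actually run.

```python
import subprocess


def build_cmd(kind: str, mbox_path: str) -> list[str]:
    """Build the sa-learn command line for one mbox of 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    # --spam / --ham pick the class; --mbox says the input is an mbox file.
    return ["sa-learn", f"--{kind}", "--mbox", mbox_path]


def train(kind: str, mbox_path: str) -> None:
    """Run sa-learn and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_cmd(kind, mbox_path), check=True)
```

Usage would look like train("spam", "sorted-spam.mbox") followed by
train("ham", "sorted-ham.mbox"), run as the user that owns the Bayes
database.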

If we didn't have such a good source of email, I guess I'd ask a small
percentage of our customers to *voluntarily* allow us to use their
accounts to train the filter... at which point we could just have the
server FCC all of their messages to another shared mailbox on our system
for our bodacious SpamQueen to traverse. That's trivial to implement on
most systems.

Yes, filtering can be configured on a per-user basis, but we chose to
make it as simple for our clients (and as simple for us) as possible,
and go site-wide. So, the filtering may not be quite as precise, but at
least *we* control the QoS, and we err on the side of caution.

It's worked remarkably well. We've been sustaining about 95% correctly
filtered, with no false positives. Server-wide, our HAM:SPAM ratio is
about 1.5:1. With many personal accounts, though, it's more like 1:15
(90-95% SPAM), after viruses are taken out of the equation (but that's
another tangent). We'd be sunk without SpamAssassin.
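
To put those ratios in percentage terms, here is a quick Python sketch
of the arithmetic. The function name spam_share is just for
illustration.

```python
def spam_share(ham: float, spam: float) -> float:
    """Fraction of mail that is spam, given a ham:spam ratio."""
    return spam / (ham + spam)


# Server-wide ratio of 1.5:1 (ham:spam) -> 40% spam.
server_wide = spam_share(1.5, 1)

# Personal-account ratio of 1:15 (ham:spam) -> 93.75% spam,
# which sits inside the 90-95% range quoted above.
personal = spam_share(1, 15)
```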

- Ryan

-- 
  Ryan Thompson <[EMAIL PROTECTED]>

  SaskNow Technologies - http://www.sasknow.com
  901-1st Avenue North - Saskatoon, SK - S7K 1Y4

        Tel: 306-664-3600   Fax: 306-244-7037   Saskatoon
  Toll-Free: 877-727-5669     (877-SASKNOW)     North America
