Re: SA Problem: spam with random words to defeat Baysian filtering ...

jdow 12 Feb 2004 14:35:15 -0000

From: "Bob George" <[EMAIL PROTECTED]>

> > It takes my poor little 466 only a few seconds to scan for
> > viruses and then for SA to do its work.  I'd be swamped by
> > spam if it weren't for the extra rulesets ... as far as I can
> > tell from all the spam that's caught. My partner downloaded
> > from our server 137 spam messages yesterday, all tagged, and
> > two false negatives ... which I fed to sa-learn.
>
> That's the model we've discussed for the "low-end gateway" for users.
> Have a "smarter" machine capable of running tools such as SA do the
> work, then just poll for the cleaned up messages using whatever
> software the users want.


Bob, my trick here is a simple procmail rule to clone the messages into
a junk mailbox on the Linux mailserver machine:
--8<--
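# "c" delivers a copy and lets the original continue; the trailing ":"
# asks procmail to use a local lockfile while writing the mailbox.
:0c:
$HOME/mail/rawmbox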
--8<--

Then I use "mail" as a tool for performing the quick sort into spam and
ham. It took two days to generate my current spam database. Actual time
spent doing it was about an hour or two. Now that the database is trained,
I look for any emails that slip through, find them in the raw mailbox,
and toss them into the spam training file. That takes maybe 10 minutes
every few days, and only when I get worked up because more than a couple
of percent escape the scanning process. The Bayesian analysis has made me
lazy about writing new explicit rules here. It builds the rules for me.
That's what a computer should do for me, isn't it?
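
To make "it builds the rules for me" concrete, here is a minimal sketch of
how training on sorted spam and ham turns into per-word evidence. It is a
toy naive Bayes scorer with add-one smoothing, not SpamAssassin's actual
classifier; the names TinyBayes, learn and spam_probability, and the
sample messages, are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class TinyBayes:
    """Toy naive Bayes spam scorer; illustrative only, not SpamAssassin's algorithm."""

    def __init__(self):
        # Number of training messages of each class that contained a given token.
        self.counts = {"spam": Counter(), "ham": Counter()}
        # Number of training messages seen for each class.
        self.messages = {"spam": 0, "ham": 0}

    def learn(self, text, label):
        """Record the tokens of one message already sorted as 'spam' or 'ham'."""
        self.messages[label] += 1
        self.counts[label].update(set(tokenize(text)))

    def spam_probability(self, text):
        """Estimate P(spam | tokens) by summing smoothed per-token log odds."""
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for token in set(tokenize(text)):
            p_spam = (self.counts["spam"][token] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][token] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


# Hypothetical training data standing in for the hand-sorted mailboxes.
bayes = TinyBayes()
bayes.learn("cheap pills buy now", "spam")
bayes.learn("buy cheap watches now", "spam")
bayes.learn("procmail rule for the mailserver", "ham")
bayes.learn("feed the mailbox to the learner", "ham")

print(bayes.spam_probability("buy cheap pills"))
print(bayes.spam_probability("procmail rule for the mailbox"))
```

Each word seen mostly in spam pushes the score up and each word seen
mostly in ham pushes it down, so every learned message effectively adds
or adjusts a small rule without anyone writing it by hand. Words never
seen in training are neutral in this sketch.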

(I'm worried about when the spammers figure out how to defeat the
simple Bayesian analysis. But by then they might have learned that
a trick to survive is to make the advertising interesting. TV was
a LONG time learning this. The current spammers haven't a clue on
this one, yet. But then, it'd take real creative work on their
part. I read that as well beyond them.)

{^_^}   Joanne
