> -----Original Message-----
> From: Sean C Truman [mailto:[EMAIL PROTECTED]]
[snip]
> The CRC idea sounded good, But the possibility of it
> working is fading.
[snip]

For another simple approach, something that appears to work very well
on one of the mail servers I run is checking the subject headers
against a known list of spam subjects, and then tagging the message as
spam.

You can't just check the subject header for equality against the ones
in a known spam subject list, as spammers tend to put random
words/numbers into the mail they're sending to foil this (this also
foils the CRC approach).

What you can do, with a small overhead, is use a pattern-matching
approach similar to the Unix diff command to compute a similarity
score against each known spam subject. A score over a certain
threshold gets the email tagged as spam by adding a header to the
message, which the user can then filter on. The overhead is probably
too high to run the same check against the body of the message.

I haven't used this method long enough to trust it to delete
messages!

If you're a Python fan you can use the difflib library
(http://www.python.org/doc/current/lib/module-difflib.html) - I'm sure
there are equivalents in other languages (Perl especially).
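
A minimal sketch of the idea using difflib.SequenceMatcher follows.
The subject list, the 0.8 threshold and the X-Subject-Spam-Score
header name are all made up for illustration; they are assumptions,
not values from a real setup, and would need tuning against real mail.

```python
import difflib
from email.message import Message

# Hypothetical examples; a real list would be built from known spam.
KNOWN_SPAM_SUBJECTS = [
    "Make money fast working from home",
    "Lose weight now with no diet",
]

THRESHOLD = 0.8  # assumed cut-off, tune against your own mail


def _normalise(subject):
    """Lower-case and collapse whitespace so trivial changes do not matter."""
    return " ".join(subject.lower().split())


def spam_score(subject, known=KNOWN_SPAM_SUBJECTS):
    """Return the highest similarity ratio (0.0 to 1.0) against the list."""
    subject = _normalise(subject)
    best = 0.0
    for spam in known:
        matcher = difflib.SequenceMatcher(None, subject, _normalise(spam))
        best = max(best, matcher.ratio())
    return best


def is_spam(subject, threshold=THRESHOLD):
    """True if the subject is close enough to any known spam subject."""
    return spam_score(subject) >= threshold


def tag_message(msg, threshold=THRESHOLD):
    """Add a (hypothetical) score header when the subject looks like spam."""
    score = spam_score(msg.get("Subject", ""))
    if score >= threshold:
        msg["X-Subject-Spam-Score"] = "%.2f" % score
    return msg
```

A subject such as "Make money fast working from home 48213" still
scores above the threshold despite the random number on the end, which
is exactly the case that defeats a plain equality or CRC check. The
message is only tagged, never deleted, so the user decides what to do
with the header.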

Marcus

--
Marcus Williams - http://www.onq2.com
Quintic Ltd, 39 Newnham Rd, Cambridge, CB3 9EY
