At 08:34 PM 3/12/2003 -0500, you wrote:
> I've been reading a little about existing spam tools, and a program
> called "Send-Safe" seems to be a popular one.  It takes various measures
> to get around filters and bulk mail detectors, but the authors are kind
> enough to tell us how in http://www.send-safe.com/manual/ .  Sounds like
> that might be a good basis for a Tokenizer - it could detect things like
> long blocks of whitespace in the subject line, suspiciously encoded
> URLs, etc.
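
A rough sketch of what such checks could look like, in Python for brevity. This is not TarProxy's actual Tokenizer interface: the function name, the token strings, and the whitespace threshold of 5 are all made up for illustration, and the URL checks are only a guess at what "suspiciously encoded" might cover.

```python
import re
from urllib.parse import urlsplit

# Assumption: five or more consecutive whitespace characters counts as "long".
WHITESPACE_RUN = re.compile(r"\s{5,}")
# Deliberately loose URL matcher; good enough for a sketch.
URL = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def extra_tokens(subject, body):
    """Return hypothetical synthetic tokens describing suspicious features."""
    tokens = []
    if WHITESPACE_RUN.search(subject):
        tokens.append("subject:long-whitespace")
    for url in URL.findall(body):
        parts = urlsplit(url)
        host = parts.hostname or ""
        # Percent-escapes in the host part are rarely needed in honest links.
        if PERCENT_ESCAPE.search(parts.netloc):
            tokens.append("url:escaped-host")
        # user@host form, often used to disguise the real destination.
        if "@" in parts.netloc:
            tokens.append("url:userinfo")
        # Host written as a single decimal number instead of a name.
        if host.isdigit():
            tokens.append("url:numeric-host")
    return tokens
```

The idea would be to feed these tokens to the classifier alongside the ordinary word tokens, so that it learns how much weight each one deserves rather than having a hard-coded rule.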

Here's one I've been seeing lately... subject lines and message bodies that look like this:
"S'end you'r Ad's to 3.5 M'illio'n De'sktops E'very'day."


I've done a fair amount of natural language processing work. Let me think about some clever ways to check for this sort of thing... perhaps a dictionary lookup against punctuation-stripped words? Would a lookup via ispell be too much overhead?
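
To make the dictionary idea concrete, here is a minimal sketch in Python. The tiny word set is a stand-in for a real word list, and the function name is hypothetical. A token is flagged when it is not a known word as written but becomes one once every non-letter is removed.

```python
import re
import string

# Stand-in dictionary (assumption); a real check would load a full word list.
WORDS = {"send", "your", "ads", "to", "million", "desktops",
         "every", "day", "everyday"}

NON_LETTERS = re.compile(r"[^a-z]")


def obfuscated_words(text, words=WORDS):
    """Return tokens that only match the dictionary after stripping punctuation."""
    hits = []
    for raw in text.split():
        # Drop ordinary leading/trailing punctuation such as a final period.
        core = raw.strip(string.punctuation)
        lowered = core.lower()
        if lowered in words:
            continue  # already a normal word, nothing suspicious
        stripped = NON_LETTERS.sub("", lowered)
        # Flag only if removing characters is what turned it into a word.
        if stripped != lowered and stripped in words:
            hits.append(core)
    return hits
```

On the subject line above this flags six tokens (everything except "to" and "3.5"). With the word list held in an in-memory set, each lookup is a hash probe, so a per-word call out to ispell may not be necessary. Contractions would need care: "it's" strips to "its", so the word list should contain the contracted forms as written.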

----
: The tarproxy-list mailing list is archived at
:   http://www.mail-archive.com/tarproxy-list%40martiansoftware.com/
:
: To unsubscribe from this list, follow the instructions at
:   http://www.martiansoftware.com/contact.html
:
: TarProxy's project page can be found at
:   http://www.martiansoftware.com/tarproxy
