Re: [SAtalk] [RD] raw/rare/folded/plain/alphed body/subject rendering streams

2003-12-10 Thread Matt Kettler
At 03:48 PM 12/10/2003, SpamTalk wrote: FOLDED set all lowercase Remove HTML punctuation to be underscore, Why on earth do you want to set all lowercase? Every regex in the ruleset can be set to case sensitive or insensitive on its own, so this adjustment only
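
To illustrate the point about per-rule case sensitivity: in SpamAssassin rules this is the Perl /i modifier on an individual regex. The sketch below shows the same idea in Python rather than in actual rule syntax. The rule names and patterns are hypothetical and are not taken from any real ruleset.

```python
import re

# Hypothetical rules: one opts in to case-insensitive matching,
# the other deliberately stays case sensitive.
RULES = {
    "HYPOTHETICAL_ANYCASE": re.compile(r"\bfree money\b", re.IGNORECASE),
    "HYPOTHETICAL_SHOUTING": re.compile(r"\bFREE MONEY\b"),
}


def matching_rules(text):
    """Return the sorted names of all rules whose pattern matches text."""
    return sorted(name for name, pattern in RULES.items() if pattern.search(text))


if __name__ == "__main__":
    print(matching_rules("Get free money now"))
    print(matching_rules("Get FREE MONEY now"))
```

If the text were lowercased before matching, the second rule could never fire, which seems to be the objection raised in the message.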

RE: [SAtalk] [RD] raw/rare/folded/plain/alphed body/subject rendering streams

2003-12-10 Thread Gary Funck
-Original Message- From: SpamTalk Sent: Wednesday, December 10, 2003 12:49 PM It would seem to me that, for purposes of rule simplification, the subject and body of messages to be scanned should be available in pre-processed flavors, some of which are currently available.

RE: [SAtalk] [RD] raw/rare/folded/plain/alphed body/subject rendering streams

2003-12-10 Thread Gary Funck
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Gary Funck Sent: Wednesday, December 10, 2003 1:09 PM To: [EMAIL PROTECTED] Subject: RE: [SAtalk] [RD] raw/rare/folded/plain/alphed body/subject rendering streams -Original Message

RE: [SAtalk] [RD] raw/rare/folded/plain/alphed body/subject rendering streams

2003-12-10 Thread David B Funk
On Wed, 10 Dec 2003, Gary Funck wrote: It might be convenient to view each of these transformations as operating on the output of the previous. I think you were. By doing so, it avoids replicating the description of the previous phase. I meant to add the following suggested additional
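
A minimal sketch of transformations that each operate on the output of the previous one, keeping every intermediate stream available for rules to match against. The stage names and the exact transformations here are assumptions made for illustration; the thread does not fully define the raw/rare/folded/plain/alphed streams, so this should not be read as the proposed design.

```python
import re


def strip_tags(text):
    """Assumed stage: drop anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)


def fold_case(text):
    """Assumed stage: set everything to lowercase."""
    return text.lower()


def punct_to_underscore(text):
    """Assumed stage: replace each punctuation character with an underscore."""
    return re.sub(r"[^\w\s]", "_", text)


# Hypothetical stage names; each stage consumes the previous stage's output.
STAGES = [
    ("no_html", strip_tags),
    ("folded", fold_case),
    ("underscored", punct_to_underscore),
]


def render_streams(raw):
    """Return a dict mapping each stream name to its rendered text."""
    streams = {"raw": raw}
    current = raw
    for name, stage in STAGES:
        current = stage(current)
        streams[name] = current
    return streams


if __name__ == "__main__":
    for name, text in render_streams("<b>Buy NOW!</b>").items():
        print(name, repr(text))
```

Because each stage only describes what it adds to the previous one, a new stream can be defined by appending one entry to the list.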

RE: [SAtalk] [RD] raw/rare/folded/plain/alphed body/subject rendering streams

2003-12-10 Thread Gary Funck
Soundex might be a practical solution. Perhaps a manageable approach is to first apply a spelling check using both a regular dictionary and augmenting it with a set of spammer mis-spellings. Then, send the output of that step into Soundex. The Soundex is a heuristic for catching the creative
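
The two steps described above (spelling correction, then Soundex) can be sketched roughly as follows. The Soundex function follows the common American Soundex rules; the misspelling table and the function names are hypothetical stand-ins, and a real spelling check against a dictionary is reduced here to a simple lookup.

```python
import string

# Letter-to-digit table for American Soundex.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)

# Hypothetical table of spammer misspellings (illustration only).
SPAMMER_SPELLINGS = {"v1agra": "viagra", "m0rtgage": "mortgage"}


def soundex(word):
    """Return the 4-character Soundex code of word, or "" if it has no letters."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and y have no code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


def soundex_stream(text):
    """Correct known misspellings, then encode each token with Soundex."""
    codes = []
    for token in text.lower().split():
        token = SPAMMER_SPELLINGS.get(token, token)
        code = soundex(token)
        if code:
            codes.append(code)
    return codes


if __name__ == "__main__":
    print(soundex_stream("Cheap V1agra and vyagra"))
```

In this sketch "v1agra" is caught by the misspelling table, while "vyagra" is not in the table but still receives the same Soundex code as "viagra".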

RE: [SAtalk] [RD] raw/rare/folded/plain/alphed body/subject rendering streams

2003-12-10 Thread Bob Apthorpe
On Wed, 10 Dec 2003, Gary Funck wrote: Soundex might be a practical solution. Perhaps a manageable approach is to first apply a spelling check using both a regular dictionary and augmenting it with a set of spammer mis-spellings. Then, send the output of that step into Soundex. The Soundex is