At 03:48 PM 12/10/2003, SpamTalk wrote:
FOLDED set all lowercase
Remove HTML
punctuation to be underscore,
Why on earth do you want to set all lowercase? Every regex in the ruleset
can be set to case sensitive or insensitive on its own, so this adjustment
only
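As a rough illustration of the per-rule case sensitivity mentioned above, here is a minimal Python sketch. The rule names and patterns are made up for the example and are not taken from any real ruleset. It shows that a case-sensitive rule stops firing once the text has been lowercased, while a case-insensitive rule is unaffected.

```python
import re

# Hypothetical rules: one case-sensitive, one case-insensitive.
RULES = {
    "SHOUTED_FREE": re.compile(r"\bFREE\b"),                 # case-sensitive
    "ANY_VIAGRA": re.compile(r"\bviagra\b", re.IGNORECASE),  # case-insensitive
}


def hits(text):
    """Return the sorted names of the rules that match the text."""
    return sorted(name for name, rx in RULES.items() if rx.search(text))


original = "Get FREE ViAgRa"
print(hits(original))          # both rules can match the original text
print(hits(original.lower()))  # the case-sensitive rule no longer matches
```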
-Original Message-
From: SpamTalk
Sent: Wednesday, December 10, 2003 12:49 PM
It would seem to me that, for purposes of rule simplification, the
subject and body of messages to be scanned should be available in
pre-processed flavors, some of which are currently available.
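One way to picture such pre-processed flavors is the Python sketch below. It is only an assumption about what the renderings might look like, based on the steps named in this thread (lowercase, remove HTML, punctuation to underscore). The function names and dictionary keys are hypothetical, and the regex-based tag stripping is deliberately crude.

```python
import html
import re
import string

# Character class matching any ASCII punctuation character.
_PUNCT = re.compile("[" + re.escape(string.punctuation) + "]")


def strip_html(text):
    """Drop anything that looks like a tag, then decode entities like &amp;."""
    return html.unescape(re.sub(r"<[^>]*>", "", text))


def renderings(raw):
    """Return several views of the same text, keyed by a hypothetical name."""
    no_html = strip_html(raw)
    folded = no_html.lower()
    return {
        "raw": raw,
        "no_html": no_html,
        "folded": folded,
        "punct_underscored": _PUNCT.sub("_", folded),
    }


for name, text in renderings("<b>Buy V.I.A.G.R.A now!</b>").items():
    print(name, "=>", text)
```

A rule could then be written against whichever rendering keeps it simplest, while the raw text stays available for rules that depend on case or markup.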
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Gary Funck
Sent: Wednesday, December 10, 2003 1:09 PM
To: [EMAIL PROTECTED]
Subject: RE: [SAtalk] [RD] raw/rare/folded/plain/alphed body/subject rendering streams
-Original Message-
On Wed, 10 Dec 2003, Gary Funck wrote:
It might be convenient to view each of these transformations as
operating on the output of the previous. I think you were.
By doing so, it avoids replicating the description of the
previous phase.
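The chaining idea can be sketched in Python as follows. This is only an illustration: the stage names and the stand-in transformations are hypothetical. Each stage receives the output of the one before it, so a stage only has to describe what it adds.

```python
def run_stages(text, stages):
    """Apply the stages in order; return [(name, output)] for every stage."""
    results = []
    current = text
    for name, fn in stages:
        current = fn(current)  # each stage consumes the previous output
        results.append((name, current))
    return results


# Stand-in stages, chosen only to keep the example short.
STAGES = [
    ("trimmed", str.strip),
    ("folded", str.lower),
    ("single_spaced", lambda s: " ".join(s.split())),
]

for name, text in run_stages("  FREE   Offer  ", STAGES):
    print(name, "=>", repr(text))
```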
I meant to add the following suggested additional
Soundex might be a practical solution. Perhaps a manageable approach
is to first apply a spelling check using both a regular dictionary
and augmenting it with a set of spammer mis-spellings. Then, send the
output of that step into Soundex. The Soundex is a heuristic for catching
the creative
On Wed, 10 Dec 2003, Gary Funck wrote:
Soundex might be a practical solution. Perhaps a manageable approach
is to first apply a spelling check using both a regular dictionary
and augmenting it with a set of spammer mis-spellings. Then, send the
output of that step into Soundex. The Soundex is
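For what it is worth, the "spelling check first, then Soundex" idea could look roughly like this in Python. The misspelling table is a made-up stand-in for a real dictionary plus a list of spammer misspellings, and `sound_key` is a hypothetical helper name. The `soundex` function follows the usual American Soundex rules.

```python
import string

# Soundex digit for each coded consonant; vowels, h, w and y have no code.
CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)

# Hypothetical table of known spammer misspellings (an assumption for this sketch).
MISSPELLINGS = {"vlagra": "viagra"}


def soundex(word):
    """Return the 4-character American Soundex code, or '' if there are no letters."""
    letters = [c for c in word.lower() if c in string.ascii_lowercase]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (first.upper() + "".join(digits) + "000")[:4]


def sound_key(token):
    """Correct known misspellings first, then reduce the token to its Soundex code."""
    return soundex(MISSPELLINGS.get(token.lower(), token))


for token in ("viagra", "viaggra", "vlagra"):
    print(token, soundex(token), sound_key(token))
```

In this sketch Soundex alone already folds a doubled letter ("viaggra") onto the same code as "viagra", while a letter substitution such as "vlagra" only lines up once the misspelling table has been applied, which is why the two steps are combined.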