> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of Gary
> Funck
> Sent: Wednesday, December 10, 2003 1:09 PM
> To: [EMAIL PROTECTED]
> Subject: RE: [SAtalk] [RD] raw/rare/folded/plain/alphed body/subject
> rendering streams
> 
> 
> 
> 
> > -----Original Message-----
> > From: SpamTalk
> > Sent: Wednesday, December 10, 2003 12:49 PM
> > 
> > It would seem to me that, for purposes of rule simplification, the
> > subject and body of messages to be scanned should be available in
> > pre-processed flavors, some of which are currently available.
> > Assume the spam
> > key is something like that Vuhee drug, V=P i=o e=a n=g s=r u=a (i.e.
> > Poensu)
> > 
> [abbreviated description follows]
> > RAW (untouched), RARE (de-mimed), FOLDED (all lowercase),
> > PLAIN (lc, no punctuation), ALPHED (no numbers).
> >
> 
> It might be convenient to view each of these transformations as
> operating on the output of the previous one. I think you were doing so.
> That avoids replicating the description of the
> previous phase.
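
To make the chaining concrete, here is a minimal sketch in which each stream is derived from the previous one. The function names and the render_streams helper are hypothetical and are not part of SpamAssassin; the input is assumed to be already de-MIMEd (RARE) text, so the RAW-to-RARE step is not shown.

```python
import re
import string


def folded(text: str) -> str:
    """FOLDED: the RARE text, all lowercase."""
    return text.lower()


def plain(text: str) -> str:
    """PLAIN: the FOLDED text with ASCII punctuation removed."""
    return text.translate(str.maketrans("", "", string.punctuation))


def alphed(text: str) -> str:
    """ALPHED: the PLAIN text with digits removed."""
    return re.sub(r"\d", "", text)


def render_streams(rare: str) -> dict:
    """Build every stream from the one before it (hypothetical helper)."""
    streams = {"RARE": rare}
    streams["FOLDED"] = folded(streams["RARE"])
    streams["PLAIN"] = plain(streams["FOLDED"])
    streams["ALPHED"] = alphed(streams["PLAIN"])
    return streams


if __name__ == "__main__":
    for name, value in render_streams("Buy V1agra, NOW!!!").items():
        print(f"{name:7} {value}")
```

Because each step only takes the previous stream as input, a new transformation can be slotted in anywhere in the chain without redefining the others.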

I meant to add the following suggested additional
transformation:

PHONEMED: in this form, the words are converted into their
phoneme form and/or spell-checked (perhaps augmented by a custom
dictionary of "popular" spammer spellings). The words would be
de-rooted as well.

The paragraph quoted below suggests that the spelling transformation
should precede the ALPHED transformation.

> 
> Note that numbers are sometimes substituted for letters, as in
> Gr8t, zer0, any1, me2, all41 and 14all. This argues for
> phoneming and/or spell-checking before applying ALPHED.
> 
> 
> 
> 
> 
> -------------------------------------------------------
> This SF.net email is sponsored by: IBM Linux Tutorials.
> Become an expert in LINUX or just sharpen your skills.  Sign up for IBM's
> Free Linux Tutorials.  Learn everything from the bash shell to sys admin.
> Click now! http://ads.osdn.com/?ad_id=1278&alloc_id=3371&op=click
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
> 


