On Wed, 10 Dec 2003, Gary Funck wrote:

> > It might be convenient to view each of these transformations as
> > operating on the output of the previous. I think you were.
> > By doing so, it avoids replicating the description of the
> > previous phase.
>
> I meant to add the following suggested additional
> transformation:
>
> PHONEMED: in this form, the words are either converted into their
> phoneme form and/or spell-checked (perhaps augmented by a custom
> dictionary of "popular" spammer spellings). The words would be
> de-rooted as well.
>
> This paragraph suggests that the spelling transformation would
> precede the ALPHED transformation.
>
> >
> > Note that numbers are sometimes substituted for letters, such
> > as Gr8t and zer0, any1, me2, all41 and 14all. This argues for
> > phoneming and/or spell-checking before ALPHA-ing.

What might be easier to implement would be an enhanced version of
the "soundex" transformation (see Text::Soundex module).

The El337 version of soundex would know about the various
graphical character-to-sound mappings and return appropriate
results.
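
To make the idea concrete, here is a rough sketch in Python (not the
Text::Soundex module itself). The LEET table and the leet_soundex name
are made up for illustration. It only covers look-alike glyphs, so a
phonetic substitution such as the "8" in Gr8t is not handled.

```python
# Hypothetical glyph-to-letter table; the entries are illustrative
# assumptions, not a vetted list of spammer substitutions.
LEET = {"0": "o", "1": "l", "3": "e", "4": "a",
        "5": "s", "7": "t", "@": "a", "$": "s"}

# Standard American Soundex digit classes.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the classic 4-character Soundex code ('' if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            out += code
        if c not in "hw":  # h and w do not separate equal codes
            prev = code
    return (out + "000")[:4]


def leet_soundex(token):
    """Map look-alike glyphs to letters, then apply plain Soundex."""
    normalized = "".join(LEET.get(c, c) for c in token.lower())
    return soundex(normalized)
```

With this table, leet_soundex("zer0") and soundex("zero") give the
same code, so a rule keyed on the code would match both spellings.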

The only difficulty I can see would be dealing with ambiguity
(e.g., is '14all' -> "one-for-all" or "Laall"?).
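
One possible way to cope with the ambiguity, sketched in Python under
my own assumptions: enumerate every reading of a token and let a later
step (a dictionary lookup, say) pick the plausible ones. The READINGS
table and the candidate_readings name are hypothetical.

```python
from itertools import product

# Hypothetical readings per character: look-alike letters plus spoken
# forms. The entries are illustrative assumptions only.
READINGS = {
    "0": ["o"],
    "1": ["l", "i", "one"],
    "2": ["to"],
    "4": ["a", "for"],
    "8": ["ate"],
}


def candidate_readings(token):
    """Return every combination of per-character readings for token."""
    options = [READINGS.get(c, [c]) for c in token.lower()]
    return ["".join(parts) for parts in product(*options)]
```

For '14all' this yields six candidates, including both "laall" and
"oneforall". The number of candidates grows multiplicatively with each
ambiguous character, so some pruning would be needed on longer tokens.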


-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{


