On Wed, 10 Dec 2003, Gary Funck wrote:

> > It might be convenient to view each of these transformations as
> > operating on the output of the previous. I think you were.
> > By doing so, it avoids replicating the description of the
> > previous phase.
>
> I meant to add the following suggested additional
> transformation:
>
> PHONEMED in this form, the words are either converted into their
> phoneme form and/or spell-checked (perhaps augmented by a custom
> dictionary of "popular" spammer spellings). The words would be
> de-rooted as well.
>
> This paragraph suggests that the spelling transformation would
> precede the ALPHED transformation.
>
> > Note that numbers are sometimes substituted for letters. Such
> > as Gr8t and zer0, any1, me2, all41 and 14all. This argues for
> > phoneming and/or spell-checking before ALPHA-ing.
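
To make the ordering argument in the quoted text concrete, here is a rough Python sketch. It assumes the ALPHA step simply drops every non-letter (that step is not defined in this message), and the digit-to-sound table and the function names are made up for illustration only.

```python
import re

# Hypothetical digit-to-sound table; illustrative entries, not a vetted list.
DIGIT_SOUNDS = {"0": "o", "1": "one", "2": "to", "4": "for", "8": "ate"}

def phoneme_expand(word):
    """Replace each digit with a rough spelling of how it is read aloud."""
    return "".join(DIGIT_SOUNDS.get(ch, ch) for ch in word.lower())

def alpha_only(word):
    """Assumed meaning of the ALPHA step: drop every non-letter."""
    return re.sub(r"[^a-z]", "", word.lower())

if __name__ == "__main__":
    for w in ["zer0", "any1", "all41", "14all"]:
        alpha_first = alpha_only(w)
        expand_first = alpha_only(phoneme_expand(w))
        print(f"{w}: ALPHA first -> {alpha_first!r}, expand first -> {expand_first!r}")
```

Stripping non-letters first reduces both "all41" and "14all" to "all", so the digits are gone before any phoneme or spelling step can read them. Expanding first keeps them apart as "allforone" and "oneforall".
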

What might be easier to implement would be an enhanced version of the
"soundex" transformation (see the Text::Soundex module). The El337
version of soundex would know about the various graphical-character-to-sound
mappings and return appropriate results. The only difficulty I can see
would be dealing with the ambiguity factor (e.g., is '14all' ->
"one-for-all" or "Laall"?).

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
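
One way to live with the ambiguity is to not pick a reading at all: expand every plausible reading of a token and keep the set of soundex codes. The Python sketch below does that. It is only a sketch: the `LEET` table is hypothetical, `soundex` is a simplified take on the classic American Soundex rules and not the Text::Soundex implementation, and `leet_soundex` is an invented name.

```python
from itertools import product

# Classic Soundex consonant groups.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Simplified American Soundex: first letter plus up to three digits."""
    word = "".join(ch for ch in word.lower() if "a" <= ch <= "z")
    if not word:
        return ""
    result = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate equal codes
            prev = code
    return (result + "000")[:4]

# Hypothetical table: each character maps to every reading we want to try.
LEET = {"0": ["o"], "1": ["one", "l", "i"], "3": ["e"],
        "4": ["for", "a"], "8": ["ate"]}

def leet_soundex(token):
    """Return the set of soundex codes over all readings of the token."""
    options = [LEET.get(ch, [ch]) for ch in token.lower()]
    readings = {"".join(parts) for parts in product(*options)}
    return {soundex(reading) for reading in readings}

if __name__ == "__main__":
    print(sorted(leet_soundex("14all")))
```

A token would then match a dictionary word if the word's code appears anywhere in the token's set, so '14all' can match both "oneforall" and "Laall" without choosing between them. The cost is that the number of readings multiplies with each ambiguous character, so a real version would probably need a cap on how many it expands.
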