NFN Smith wrote:
> I'm working on a series of rules to find obfuscated words in subject
> lines that have been misspelled by adding an extra character (often a
> repeated letter) to a word.  For certain words, it seems to be
> appropriate to assume that if they're misspelled in that way, it's
> deliberate.
> 
> I've got the syntax for a regular expression mostly working (including
> words with trailing punctuation), but I don't have it identifying
> words where the last letter is doubled.  Thus if I have a regexp that
> looks like: 
> 
>   /\b(?!badword)(?:b.?a.?d.?w.?o.?r.?d.?)(\b|\!|\.|\,|\;|\:|\?)/i
> 
> I'm getting hits on things like 'baddword' and 'badwoord', and even
> 'badworrd!', but I'm not getting a hit on 'badwordd'
> 
> I've tried a number of variants, but still am not quite getting it.
> What am I missing?

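I think the negative lookahead is biting you.  (?!badword) rejects
anything that starts with 'badword', and that includes 'badwordd'.
Putting a \b inside the lookahead makes it reject only the exact word.
Try this: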

  /\b(?!badword\b)(?:b.?a.?d.?w.?o.?r.?d.?)(\b|\!|\.|\,|\;|\:|\?)/i
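
If you want to check the difference outside of SpamAssassin, here is a
rough sketch in Python.  It assumes Python's re module handles \b and
(?!...) the same way Perl does for this pattern, which as far as I know
it does.  The names ORIGINAL, FIXED, SUBJECTS and hits() are made up for
this illustration, and the subjects are just the examples from your
message:

```python
import re

# Original pattern: the lookahead rejects anything that STARTS with "badword",
# so "badwordd" is thrown out before the main pattern is even tried.
ORIGINAL = re.compile(
    r"\b(?!badword)(?:b.?a.?d.?w.?o.?r.?d.?)(\b|\!|\.|\,|\;|\:|\?)", re.I
)

# Suggested pattern: \b inside the lookahead means only the exact word
# "badword" (followed by a word boundary) is rejected.
FIXED = re.compile(
    r"\b(?!badword\b)(?:b.?a.?d.?w.?o.?r.?d.?)(\b|\!|\.|\,|\;|\:|\?)", re.I
)

# Sample subjects: the correctly spelled word plus the misspellings.
SUBJECTS = ["badword", "badword!", "baddword", "badwoord", "badworrd!", "badwordd"]


def hits(pattern, subjects):
    """Return the subjects in which the pattern finds a match."""
    return [s for s in subjects if pattern.search(s)]


print("original:", hits(ORIGINAL, SUBJECTS))
print("fixed:   ", hits(FIXED, SUBJECTS))
```

With the original pattern 'badwordd' should be missing from the list;
with the \b added it should show up, while plain 'badword' and
'badword!' are still left alone.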

-- 
Bowie
