NFN Smith wrote:
> I'm working on a series of rules to find obfuscated words in subject
> lines that have been misspelled by adding an extra character (often a
> repeated letter) to a word. For certain words, it seems to be
> appropriate to assume that if they're misspelled in that way, it's
> deliberate.
>
> I've got the syntax for a regular expression mostly working (including
> words with trailing punctuation), but I don't have it identifying
> words where the last letter is doubled. Thus if I have a regexp that
> looks like:
>
> /\b(?!badword)(?:b.?a.?d.?w.?o.?r.?d.?)(\b|\!|\.|\,|\;|\:|\?)/i
>
> I'm getting hits on things like 'baddword' and 'badwoord', and even
> 'badworrd!', but I'm not getting a hit on 'badwordd'
>
> I've tried a number of variants, but still am not quite getting it.
> What am I missing?
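I think the negative lookahead is biting you. (?!badword) rejects anything that starts with "badword", and 'badwordd' does, so the match is thrown out before the rest of the pattern is even tried. Adding \b inside the lookahead makes it reject only the exact word. Try this:

/\b(?!badword\b)(?:b.?a.?d.?w.?o.?r.?d.?)(\b|\!|\.|\,|\;|\:|\?)/i

-- 
Bowie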
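
For anyone who wants to check the difference between the two patterns outside of a live rule, here is a rough sketch. It uses Python's re module as a stand-in for the Perl regex engine (an assumption on my part; lookahead and \b should behave the same way for this pattern), and the names ORIGINAL, FIXED, SUBJECTS and hits() are made up for the illustration.

```python
import re

# Pattern from the original question: the lookahead rejects anything
# that merely *starts with* "badword".
ORIGINAL = re.compile(
    r"\b(?!badword)(?:b.?a.?d.?w.?o.?r.?d.?)(\b|\!|\.|\,|\;|\:|\?)",
    re.IGNORECASE,
)

# Suggested pattern: the \b inside the lookahead means only the exact
# word "badword" is rejected.
FIXED = re.compile(
    r"\b(?!badword\b)(?:b.?a.?d.?w.?o.?r.?d.?)(\b|\!|\.|\,|\;|\:|\?)",
    re.IGNORECASE,
)

# Hypothetical sample subjects, taken from the examples in the question.
SUBJECTS = ["badword", "badword!", "baddword", "badwoord", "badworrd!", "badwordd"]


def hits(pattern, subject):
    """Return True if the pattern matches anywhere in the subject."""
    return pattern.search(subject) is not None


for subject in SUBJECTS:
    print(f"{subject!r}: original={hits(ORIGINAL, subject)} fixed={hits(FIXED, subject)}")
```

With the \b in place, 'badwordd' should hit while plain 'badword' and 'badword!' still do not. One side effect to be aware of: any word that extends 'badword' by a single character, such as 'badwords', will hit as well, so this is only suitable for words where that is acceptable.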