http://bugzilla.spamassassin.org/show_bug.cgi?id=3191
------- Additional Comments From [EMAIL PROTECTED]  2004-03-18 11:40 -------

This one is easy to explain: a word boundary (\b) is any word char [\w] followed by a non-word char [\W], or the other way around, so \w\W or \W\w. An accented character is NOT part of the \w class, so an accented letter followed by a space doesn't count as "\w\W" and \b does not match there.

I worked around this in my obfu rule generator (http://sandgnat.com/cmos/) by using an "or grouping" when matching word boundaries. See how the rules generated by http://sandgnat.com/cmos/cmos.jsp?words=foo (which is based on the regexp /\bfoo\b/) have this pattern embedded at the very end:

    (?:[o0]\b|(?:[\*\xB0\xBA\xD8\xF8\xD2-\xD6\xF2-\xF6]|\(\)|\[\]|\xC5[\x8C-\x91]|\xC6[\xA0-\xA1]|\xC7[\x91-\x92]|\xC7[\xBE-\xBF]|\xCE\x8C|\xCE\x98|\xCE\x9F|\xCE\xB8|\xCE\xBF|\xCF\x8C|\xD0\x9E|\xD0\xBE|\xD5\x95)\B)

That regexp snippet matches the "o\b" part of /\bfoo\b/, including all the accented versions and also multi-byte characters (which HTML::Entities generates from &xxx; entities). A plain "o" or "0" must be followed by \b; an accented or look-alike variant, which is itself a non-word character, must be followed by \B instead.

By default my script prints the escaped versions, such as "\xB0", rather than the literal accented values, because some browsers have issues with copying and pasting them.
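The boundary behaviour described above can be reproduced with a short sketch. It is written in Python rather than Perl purely for illustration, on the assumption that Python's bytes-mode regexes (where \w is ASCII-only) approximate how a rule sees raw Latin-1 bytes. The names NAIVE, WORKAROUND and check are hypothetical, and the look-alike class is cut down to the range \xF2-\xF6 instead of the full generated pattern.

```python
import re

# Bytes patterns keep \w ASCII-only, so a Latin-1 accented byte such as
# \xF3 (o-acute) is a non-word character. This is assumed to mirror the
# byte-oriented matching discussed in the comment.

# Naive rule: accept an accented final letter but still demand \b after it.
NAIVE = re.compile(rb"\bfo[o0\xF2-\xF6]\b")

# Simplified "or grouping": an ASCII letter must be followed by \b,
# a non-word look-alike byte must be followed by \B.
WORKAROUND = re.compile(rb"\bfo(?:[o0]\b|[\xF2-\xF6]\B)")


def check(pattern, data):
    """Return True if the compiled bytes pattern matches anywhere in data."""
    return pattern.search(data) is not None


if __name__ == "__main__":
    samples = [b"buy foo now", b"buy fo\xf3 now", b"buy food now"]
    for data in samples:
        print(data, check(NAIVE, data), check(WORKAROUND, data))
```

The second sample is the interesting one: the accented byte and the following space are both non-word characters, so there is no \b between them and only the \B branch can match.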
