[Bug 26332] Spam-blacklist does not support unicode characters in regex, needed to filter internationalized domain names

2011-05-03 bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

Mark A. Hershberger  changed:

       What|Removed |Added
---------------------------------
     Status|NEW     |RESOLVED
 Resolution|        |FIXED

--- Comment #7 from Mark A. Hershberger 2011-05-03 20:25:40 UTC ---
r87352

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.

___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 26332] Spam-blacklist does not support unicode characters in regex, needed to filter internationalized domain names

2011-04-26 bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

p858snake  changed:

       What|Removed |Added
---------------------------------
   Keywords|        |need-review, patch
         CC|        |p858sn...@gmail.com



[Bug 26332] Spam-blacklist does not support unicode characters in regex, needed to filter internationalized domain names

2011-04-26 bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

Mark A. Hershberger  changed:

       What|Removed                        |Added
-------------------------------------------------------------
 AssignedTo|wikibugs-l@lists.wikimedia.org |m...@everybody.org



[Bug 26332] Spam-blacklist does not support unicode characters in regex, needed to filter internationalized domain names

2011-04-26 bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

--- Comment #6 from Mark A. Hershberger 2011-04-27 05:02:29 UTC ---
Created attachment 8465
  --> https://bugzilla.wikimedia.org/attachment.cgi?id=8465
Suggested patch

Could you verify that the attached patch is where you think the /u should go to
fix this?




[Bug 26332] Spam-blacklist does not support unicode characters in regex, needed to filter internationalized domain names

2011-02-13 bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

--- Comment #5 from Brion Vibber  2011-02-13 22:39:48 UTC ---
I haven't tried profiling, but adding a /u modifier in
SpamRegexBatch::buildRegexes() doesn't seem to break anything, at least. It
should, however, be double-checked with the full-size blacklists.

However, this isn't necessarily sufficient for handling IDN domain spam, as
it won't match the punycode form of the name if the link uses that form. Doing
this right may require some normalization.
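One possible form of that normalization, sketched in Python under stated assumptions: decode any punycode ("xn--") labels in the link's host back to Unicode before applying the Unicode blacklist pattern, so the Cyrillic and punycode spellings of the same domain both match. It uses Python's built-in "idna" codec, which implements IDNA 2003 (not IDNA 2008); normalize_host and host_matches are hypothetical names, not part of SpamBlacklist.

```python
import re
from urllib.parse import urlsplit


def normalize_host(host: str) -> str:
    """Convert punycode ("xn--") labels to Unicode with Python's IDNA 2003 codec."""
    try:
        return host.encode("ascii").decode("idna")
    except UnicodeError:
        # Host is already non-ASCII, or is not valid IDNA: leave it unchanged.
        return host


def host_matches(pattern: str, url: str) -> bool:
    """Apply a Unicode blacklist pattern to the normalized host of a URL."""
    host = urlsplit(url).hostname or ""
    return re.search(pattern, normalize_host(host)) is not None


unicode_url = "http://макросъемка.рф/page"
# Build the punycode spelling of the same URL rather than hard-coding it.
punycode_url = "http://" + "макросъемка.рф".encode("idna").decode("ascii") + "/page"

print(punycode_url)
print(host_matches(r"макросъемка\.рф", unicode_url))
print(host_matches(r"макросъемка\.рф", punycode_url))
```

Matching on the normalized host only is a simplification; a real filter would also have to cover patterns that extend into the path.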



[Bug 26332] Spam-blacklist does not support unicode characters in regex, needed to filter internationalized domain names

2011-01-31 bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

Mark A. Hershberger  changed:

       What|Removed |Added
---------------------------------
   Priority|Normal  |High
         CC|        |m...@everybody.org
   Severity|normal  |major



[Bug 26332] Spam-blacklist does not support unicode characters in regex, needed to filter internationalized domain names

2011-01-04 bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

--- Comment #4 from Alex Lazovsky 2011-01-04 23:25:58 UTC ---
This workaround works fine, thanks!

Alex



[Bug 26332] Spam-blacklist does not support unicode characters in regex, needed to filter internationalized domain names

2010-12-26 bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

--- Comment #3 from Bawolff  2010-12-27 03:36:18 UTC ---
Sorry, the workaround should not include the \b (presumably because bytes
like \xD0 aren't word characters in non-UTF-8 mode).

\bмакросъемка\.рф  becomes
\xD0\xBC\xD0\xB0\xD0\xBA\xD1\x80\xD0\xBE\xD1\x81\xD1\x8A\xD0\xB5\xD0\xBC\xD0\xBA\xD0\xB0\.\xD1\x80\xD1\x84

\bпример\.испытание becomes
\xD0\xBF\xD1\x80\xD0\xB8\xD0\xBC\xD0\xB5\xD1\x80\.\xD0\xB8\xD1\x81\xD0\xBF\xD1\x8B\xD1\x82\xD0\xB0\xD0\xBD\xD0\xB8\xD0\xB5
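The \b explanation above can be reproduced with Python's re module, used here only as an analogue of PCRE: a bytes pattern behaves like a regex without /u (\w covers only ASCII [A-Za-z0-9_]), while a str pattern behaves like one with /u. The URL text is a made-up example.

```python
import re

url_text = "see http://макросъемка.рф/page"
url_bytes = url_text.encode("utf-8")

# Byte-oriented pattern (analogous to PCRE without /u): the domain is matched
# as its raw UTF-8 bytes. re.escape() only backslash-escapes regex
# metacharacters such as "."; the non-ASCII bytes pass through unchanged.
domain_bytes = re.escape("макросъемка.рф".encode("utf-8"))

# In bytes mode, \w covers only ASCII [A-Za-z0-9_], so the byte 0xD0 is not a
# "word" byte and there is no \b boundary between "/" and the domain.
with_boundary = re.search(rb"\b" + domain_bytes, url_bytes)
without_boundary = re.search(domain_bytes, url_bytes)

# A str pattern is Unicode-aware (analogous to /u): Cyrillic letters are word
# characters, so \b does match between "/" and "м".
unicode_boundary = re.search(r"\bмакросъемка\.рф", url_text)

print(with_boundary is None, without_boundary is not None, unicode_boundary is not None)
```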

-

Would someone who knows about such things be able to comment on whether adding
the /u flag to the generated regexes would have any adverse performance effects?



[Bug 26332] Spam-blacklist does not support unicode characters in regex, needed to filter internationalized domain names

2010-12-16 bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

--- Comment #2 from Alex Lazovsky 2010-12-16 11:36:21 UTC ---
At first glance, this workaround does not work:
http://ru.wikipedia.org/w/index.php?diff=30229518
http://ru.wikipedia.org/w/index.php?diff=30229527

Now I use AbuseFilter http://ru.wikipedia.org/wiki/Special:AbuseFilter/117 to
block such links, but this approach has some drawbacks.



[Bug 26332] Spam-blacklist does not support unicode characters in regex, needed to filter internationalized domain names

2010-12-15 bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

Bawolff  changed:

       What|Removed                    |Added
----------------------------------------------------------
         CC|                           |bawolff...@gmail.com
    Summary|Spam-blacklist does not    |Spam-blacklist does not
           |handle Cyrillic domains    |support unicode characters
           |                           |in regex, needed to filter
           |                           |internationalized domain
           |                           |names

--- Comment #1 from Bawolff  2010-12-15 19:42:23 UTC ---
Presumably the SpamBlacklist extension needs to be modified to use the u flag
on the regexes it builds, so that they are interpreted as UTF-8.
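What "interpreted as UTF-8" changes can be illustrated with Python's re, used as a stand-in for PCRE: a bytes pattern works byte by byte (like a regex without /u), while a str pattern works character by character (like one with /u). The label "рф" is two characters but four UTF-8 bytes.

```python
import re

label = "рф"
label_bytes = label.encode("utf-8")  # 4 bytes: D1 80 D1 84

# Bytes pattern (analogous to PCRE without /u): "." consumes one byte,
# so two dots cannot cover a two-letter Cyrillic label.
print(re.fullmatch(rb"..", label_bytes))
print(re.fullmatch(rb"....", label_bytes) is not None)

# str pattern (analogous to /u): "." consumes one character.
print(re.fullmatch(r"..", label) is not None)

# Character ranges are also interpreted as code points in str mode.
print(re.fullmatch(r"[а-я]+", "макросъемка") is not None)
```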

As a temporary workaround, you can escape Unicode characters as their UTF-8
bytes using \xHH (replace HH with each byte's hex code). For example:

\bмакросъемка\.рф  becomes
\b\xD0\xBC\xD0\xB0\xD0\xBA\xD1\x80\xD0\xBE\xD1\x81\xD1\x8A\xD0\xB5\xD0\xBC\xD0\xBA\xD0\xB0\.\xD1\x80\xD1\x84

\bпример\.испытание becomes
\b\xD0\xBF\xD1\x80\xD0\xB8\xD0\xBC\xD0\xB5\xD1\x80\.\xD0\xB8\xD1\x81\xD0\xBF\xD1\x8B\xD1\x82\xD0\xB0\xD0\xBD\xD0\xB8\xD0\xB5
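Producing these escapes by hand is error-prone, so here is a small Python sketch that generates them: every non-ASCII character is replaced by \xHH escapes for its UTF-8 bytes, and ASCII regex syntax such as \. or \b is copied through unchanged. escape_non_ascii_as_utf8 is a hypothetical helper name, not part of SpamBlacklist; the final match uses a Python bytes pattern as a stand-in for a PCRE regex without /u.

```python
import re


def escape_non_ascii_as_utf8(pattern: str) -> str:
    """Replace each non-ASCII character with \\xHH escapes for its UTF-8 bytes.

    ASCII characters (including existing regex syntax such as \\. or \\b)
    are copied through unchanged.
    """
    parts = []
    for ch in pattern:
        if ord(ch) < 0x80:
            parts.append(ch)
        else:
            parts.append("".join(f"\\x{b:02X}" for b in ch.encode("utf-8")))
    return "".join(parts)


escaped = escape_non_ascii_as_utf8(r"макросъемка\.рф")
print(escaped)

# The escaped pattern is pure ASCII, so it can be compiled as a bytes
# pattern and matched against the raw UTF-8 bytes of the page text.
page_bytes = "link: http://макросъемка.рф/".encode("utf-8")
print(re.search(escaped.encode("ascii"), page_bytes) is not None)
```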
