[Bug 26332] Spam-blacklist does not support unicode characters in regex, needed to filter internationalized domain names
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

Mark A. Hershberger changed:

What       |Removed |Added
-----------+--------+---------
Status     |NEW     |RESOLVED
Resolution |        |FIXED

--- Comment #7 from Mark A. Hershberger 2011-05-03 20:25:40 UTC ---
r87352

--
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

p858snake changed:

What     |Removed |Added
---------+--------+--------------------
Keywords |        |need-review, patch
CC       |        |p858sn...@gmail.com
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

Mark A. Hershberger changed:

What       |Removed                        |Added
-----------+-------------------------------+-------------------
AssignedTo |wikibugs-l@lists.wikimedia.org |m...@everybody.org
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

--- Comment #6 from Mark A. Hershberger 2011-04-27 05:02:29 UTC ---
Created attachment 8465 --> https://bugzilla.wikimedia.org/attachment.cgi?id=8465
Suggested patch

Could you verify that the attached patch is where you think the /u should go to fix this?
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

--- Comment #5 from Brion Vibber 2011-02-13 22:39:48 UTC ---
I haven't tried profiling, but adding a /u in SpamRegexBatch::buildRegexes() doesn't seem to break anything, at least. It should still be double-checked against the full-size blacklists.

However, this isn't necessarily sufficient for handling IDN domain spam, as it won't match the punycode form of the name if it's linked that way. It may require some normalization to really do this right.
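A minimal Python sketch of the kind of normalization suggested above (Python rather than the extension's PHP; this is not SpamBlacklist code). It assumes one possible approach: convert the link's host name to its Unicode form with Python's built-in `idna` codec (IDNA 2003), so that a link written in punycode (`xn--...`) and one written in Cyrillic are caught by the same Unicode-aware pattern. `BLACKLIST`, `normalize_host()` and `is_blacklisted()` are hypothetical names, and only the host part of the URL is checked.

```python
import re
from urllib.parse import urlsplit

# Hypothetical blacklist entry written with literal Cyrillic characters.
# Python str patterns are Unicode-aware, roughly what PCRE's /u modifier provides.
BLACKLIST = [re.compile(r"\bмакросъемка\.рф\b")]


def normalize_host(url: str) -> str:
    """Return the URL's host in Unicode form, whether it was linked as Unicode or punycode."""
    host = urlsplit(url).hostname or ""
    # Encoding to IDNA turns Unicode labels into xn-- labels (ASCII labels pass through);
    # decoding turns every xn-- label back into Unicode, so both spellings converge.
    return host.encode("idna").decode("idna")


def is_blacklisted(url: str) -> bool:
    """Check the normalized host against every (hypothetical) blacklist pattern."""
    host = normalize_host(url)
    return any(pattern.search(host) for pattern in BLACKLIST)


if __name__ == "__main__":
    unicode_link = "http://макросъемка.рф/page"
    # The same domain, linked in its punycode (ACE) form.
    punycode_link = "http://" + "макросъемка.рф".encode("idna").decode("ascii") + "/page"
    print(punycode_link)
    print(is_blacklisted(unicode_link), is_blacklisted(punycode_link))
```

Matching only the normalized host is a simplification; a real blacklist also matches paths, and malformed labels (empty or over-long) make the `idna` codec raise `UnicodeError`, which would need handling.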
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

Mark A. Hershberger changed:

What     |Removed |Added
---------+--------+-------------------
Priority |Normal  |High
CC       |        |m...@everybody.org
Severity |normal  |major
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

--- Comment #4 from Alex Lazovsky 2011-01-04 23:25:58 UTC ---
This workaround works fine, thanks!

Alex
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

--- Comment #3 from Bawolff 2010-12-27 03:36:18 UTC ---
Sorry, the workaround should not have the \b in it (presumably because bytes like \xD0 aren't word characters when the regex is not in UTF-8 mode).

\bмакросъемка\.рф becomes
\xD0\xBC\xD0\xB0\xD0\xBA\xD1\x80\xD0\xBE\xD1\x81\xD1\x8A\xD0\xB5\xD0\xBC\xD0\xBA\xD0\xB0\.\xD1\x80\xD1\x84

\bпример\.испытание becomes
\xD0\xBF\xD1\x80\xD0\xB8\xD0\xBC\xD0\xB5\xD1\x80\.\xD0\xB8\xD1\x81\xD0\xBF\xD1\x8B\xD1\x82\xD0\xB0\xD0\xBD\xD0\xB8\xD0\xB5

Would someone who knows about such things be able to comment on whether adding the /u flag to the generated regexes would have any adverse performance effects?
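The \b behaviour described above can be reproduced with Python's `re` module, whose bytes patterns resemble a PCRE pattern without /u: there, `\b` and `\w` only recognise ASCII word characters, so the position between `//` and the byte 0xD0 is not a word boundary. This is an illustration in Python, not PHP's `preg_*` functions, and the sample link is made up.

```python
import re

link = "http://макросъемка.рф/page"
link_bytes = link.encode("utf-8")  # what a byte-oriented (non-UTF-8) regex engine sees

# Escaped workaround pattern as a bytes pattern (no Unicode mode), with the leading \b.
with_b = rb"\b\xD0\xBC\xD0\xB0\xD0\xBA\xD1\x80\xD0\xBE\xD1\x81\xD1\x8A\xD0\xB5\xD0\xBC\xD0\xBA\xD0\xB0\.\xD1\x80\xD1\x84"
without_b = with_b[2:]  # the same pattern with the two characters of \b removed

# Unicode-aware pattern, roughly what adding /u would allow in PCRE.
unicode_pattern = r"\bмакросъемка\.рф"

results = {
    "bytes, with \\b": re.search(with_b, link_bytes) is not None,
    "bytes, without \\b": re.search(without_b, link_bytes) is not None,
    "unicode, with \\b": re.search(unicode_pattern, link) is not None,
}

for name, matched in results.items():
    print(f"{name}: {matched}")
```

With the leading \b the bytes pattern finds nothing, because neither "/" nor 0xD0 counts as a word character in that mode; dropping the \b, or using a Unicode pattern where Cyrillic letters are word characters, makes the link match.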
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

--- Comment #2 from Alex Lazovsky 2010-12-16 11:36:21 UTC ---
At first glance, this workaround does not work:
http://ru.wikipedia.org/w/index.php?diff=30229518
http://ru.wikipedia.org/w/index.php?diff=30229527

For now, I am using AbuseFilter (http://ru.wikipedia.org/wiki/Special:AbuseFilter/117) to block such links, but this approach has some drawbacks.
https://bugzilla.wikimedia.org/show_bug.cgi?id=26332

Bawolff changed:

What    |Removed                    |Added
--------+---------------------------+-----------------------------
CC      |                           |bawolff...@gmail.com
Summary |Spam-blacklist does not    |Spam-blacklist does not
        |handle Cyrillic domains    |support unicode characters
        |                           |in regex, needed to filter
        |                           |internationalized domain
        |                           |names

--- Comment #1 from Bawolff 2010-12-15 19:42:23 UTC ---
Presumably the SpamBlacklist extension needs to be modified to use the u flag for the regexes it builds, so that they are interpreted as UTF-8.

As a temporary workaround, you can escape Unicode characters using \xHH (replace HH with the hex code of each of the character's UTF-8 bytes). For example:

\bмакросъемка\.рф becomes
\b\xD0\xBC\xD0\xB0\xD0\xBA\xD1\x80\xD0\xBE\xD1\x81\xD1\x8A\xD0\xB5\xD0\xBC\xD0\xBA\xD0\xB0\.\xD1\x80\xD1\x84

\bпример\.испытание becomes
\b\xD0\xBF\xD1\x80\xD0\xB8\xD0\xBC\xD0\xB5\xD1\x80\.\xD0\xB8\xD1\x81\xD0\xBF\xD1\x8B\xD1\x82\xD0\xB0\xD0\xBD\xD0\xB8\xD0\xB5
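The manual \xHH conversion described in this comment can be automated. A minimal Python sketch, assuming blacklist entries are available as Unicode strings (the function name `to_utf8_hex_escapes` is hypothetical and not part of SpamBlacklist): ASCII characters, including regex syntax such as \b and \., are copied unchanged, and every non-ASCII character is replaced by the \xHH escapes of its UTF-8 bytes.

```python
def to_utf8_hex_escapes(pattern: str) -> str:
    """Rewrite non-ASCII characters in a regex as \\xHH escapes of their UTF-8 bytes."""
    parts = []
    for ch in pattern:
        if ord(ch) < 128:
            # ASCII, including regex syntax like \b or \., is kept as written.
            parts.append(ch)
        else:
            # One \xHH escape per UTF-8 byte, upper-case hex as in the examples.
            parts.append("".join(f"\\x{byte:02X}" for byte in ch.encode("utf-8")))
    return "".join(parts)


if __name__ == "__main__":
    for entry in (r"\bмакросъемка\.рф", r"\bпример\.испытание"):
        print(entry, "becomes", to_utf8_hex_escapes(entry))
```

As Bawolff's later comment points out, the leading \b should be removed from entries converted this way, since it cannot match before a non-ASCII byte when the regex is not in UTF-8 mode.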