Ariel T. Glenn, 08/01/2013 09:26:
The issue is that the bad character was added in 2004, see
https://zh.wikipedia.org/w/index.php?title=Wikipedia:%E6%96%B0%E9%97%BB%E7%A8%BF/2004%E5%B9%B42%E6%9C%88_%28%E7%AE%80%29&action=edit&oldid=386385
before there were aggressive checks for that sort of thing. Garbage in,
garbage out...
I've requested removal and revdeletion:
https://zh.w
All,
I've been trying to track this down for a few hours. This file is a SQL dump,
and its header says it's UTF-8:
http://dumps.wikimedia.org/zhwiki/20130102/zhwiki-20130102-langlinks.sql.gz
but:
$ isutf8 zh-langlinks.sql
zh-langlinks.sql: line 204, char 2361, byte offset 520707: invalid UTF-8 cod
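If isutf8 (from moreutils) isn't available, the first bad byte can be located with a short script. What follows is a minimal Python sketch, not a tool used in this thread. It assumes the decompressed dump fits in memory. It also assumes 1-based line and character numbers and a 0-based byte offset, which may not match isutf8's numbering exactly. The name `find_invalid_utf8` is hypothetical.

```python
import gzip
import sys


def find_invalid_utf8(data: bytes):
    """Return (line, char, byte_offset) of the first invalid UTF-8 byte, or None.

    Assumed conventions (may differ from isutf8): line and char are 1-based,
    byte_offset is the 0-based index of the first offending byte in data.
    """
    try:
        data.decode("utf-8")  # strict decoding raises on the first bad sequence
        return None
    except UnicodeDecodeError as err:
        offset = err.start
    # Everything before the error position is valid UTF-8.
    prefix = data[:offset]
    line = prefix.count(b"\n") + 1
    line_start = prefix.rfind(b"\n") + 1  # 0 when there is no newline yet
    char = len(prefix[line_start:].decode("utf-8")) + 1
    return line, char, offset


def main(path: str) -> int:
    # Assumption: a .gz suffix means the file is gzip-compressed, as in the dump URL.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        data = f.read()  # reads the whole (decompressed) file into memory
    result = find_invalid_utf8(data)
    if result is None:
        print(f"{path}: valid UTF-8")
        return 0
    line, char, offset = result
    print(f"{path}: line {line}, char {char}, byte offset {offset}: invalid UTF-8")
    return 1


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Run it as `python find_bad_utf8.py zhwiki-20130102-langlinks.sql.gz`, where the script filename is only an example. Strict decoding is used because it stops at exactly the byte that a validator would reject.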