On 03/05/17 03:10, Mark Clements (HappyDog) wrote:
> Can anyone confirm that MediaWiki used to behave in this manner, and
> if so why?

MySQL 4.0 didn't really have character sets; it only had
collations. Text was stored as 8-bit clean binary, and was only
interpreted as a character sequence when compared to other text fields
for collation purposes. There was no UTF-8 collation, so we stored
UTF-8 text in text fields with the default (latin1) collation.
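A rough Python sketch of why this worked. It assumes Python's "latin-1" codec as a stand-in for MySQL's latin1, which is really closer to Windows cp1252, so treat it as an approximation. Every byte value maps to some latin1 character, so UTF-8 bytes pass through a latin1 column unchanged. A latin1-aware client would still display them as gibberish. The function names are hypothetical.

```python
# Hypothetical illustration: Python's "latin-1" codec stands in for
# MySQL's latin1 (which is really closer to cp1252).

def store_in_latin1_column(text: str) -> str:
    """Simulate writing UTF-8 bytes into a latin1 column: the server
    sees each byte as one latin1 character and keeps it as-is."""
    return text.encode("utf-8").decode("latin-1")

def read_from_latin1_column(stored: str) -> str:
    """Simulate an application that reads the raw bytes back and
    decodes them as UTF-8 itself."""
    return stored.encode("latin-1").decode("utf-8")

original = "M\u00fcnchen"                # "München"
stored = store_in_latin1_column(original)
print(stored)                            # MÃ¼nchen (what a latin1 client shows)
print(read_from_latin1_column(stored))   # München (the application's view)
```

The round trip is lossless because latin1 assigns a character to every byte value. Collation, however, operates on those latin1 "characters", so one multi-byte UTF-8 character is compared as several unrelated latin1 ones.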

> If it was due to MySQL bugs, does anyone know in what version these
> were fixed?

IIRC it was fixed in MySQL 4.1 with the introduction of proper
character sets.

To migrate such a database, you need to do an ALTER TABLE to switch
the relevant fields from latin1 to the "binary" character set. If you
ALTER TABLE directly to utf8, you'll end up with "mojibake", since the
text will be incorrectly interpreted as latin1 and converted to
Unicode, double-encoding every non-ASCII character. This is
unrecoverable; you have to restore from a backup if this happens.

I think it is possible to then do an ALTER TABLE to switch from binary
to utf8, but it's been a while since I tested that.
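A hypothetical Python simulation contrasting the two migration paths. Python's "latin-1" codec again stands in for MySQL's latin1. The functions only model what each ALTER TABLE does to the stored bytes and are not MySQL APIs. The binary-to-utf8 step is modelled as relabelling the bytes without conversion. That is an assumption matching the untested suggestion above.

```python
# Hypothetical simulation of the stored bytes under each ALTER TABLE path.
# Python's "latin-1" codec stands in for MySQL's latin1.

def column_bytes(text: str) -> bytes:
    """Raw bytes as stored: UTF-8 text written into a latin1 column."""
    return text.encode("utf-8")

def alter_latin1_to_utf8(raw: bytes) -> bytes:
    """Direct conversion: bytes are decoded as latin1, then each
    resulting character is re-encoded as UTF-8 (double encoding)."""
    return raw.decode("latin-1").encode("utf-8")

def alter_latin1_to_binary(raw: bytes) -> bytes:
    """Switching to binary changes only the label; bytes are untouched."""
    return raw

def alter_binary_to_utf8(raw: bytes) -> bytes:
    """Assumed: from binary, the bytes are relabelled as utf8 unchanged."""
    return raw

raw = column_bytes("M\u00fcnchen")       # "München"
direct = alter_latin1_to_utf8(raw)
two_step = alter_binary_to_utf8(alter_latin1_to_binary(raw))

print(direct.decode("utf-8"))            # MÃ¼nchen (mojibake)
print(two_step.decode("utf-8"))          # München
```

The detour through binary works because a binary column has no character set to convert from. The original UTF-8 bytes therefore survive intact until they are given the correct label.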

-- Tim Starling


_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
