After doing much research I am at a loss how really utf8_bin and utf8_general_ci differ, besides the obvious -- sort order.
What I inferred from MySQL docs and some online articles is the following. utf8_bin is pretty much what it suggests, utf8 strings are stored as bytes and MySQL server compares/sorts these on single byte basis, where the value of each byte determines sort order. Make sense to me as utf8 is multi-byte encoding anyway. So what is utf8_general_ci collation all about, since it is DEFAULT collation for utf8 character set. But here is the kicker, it does not implement all utf8 characters, especially for languages like jp, ko, arabic, and other right to left languages. I know for fact ucs2_general_uca (default collation for ucs2 Unicode encoding) has only partial Unicode support, so my guess is, the same may be true for utf8_general_ci collation. The only way I can get full Unicode support is to use utf8_bin collation as utf8_general_ci messes some characters? (chars ?? are returned from the db) My question is, has anyone else run into similar issues with collations? Especially, how can utf8_general_ci not support all characters but be the default collation for utf8? The only way to get things to work is to set charset to utf8 for everything, and utf8_bin collation for everything as well. (connection, db, tables) Cheers, -Daniel. -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]