Bruce We're sort of in the same boat but if you persevere you can eradicate all the problems.
Now we don't run the server as utf8 which does cause a number of problems - we don't want all the databases to be utf8 just some of them, and just some tables on some of them. For a database that needs to be utf8 you need to ensure that its default character set is utf8 - do a show database create and then an alter database as necessary. Any connections to that database must specify their default character set to be utf8 when the connection is opened otherwise you get problems. In addition you need to ensure that all the character sets on the tables default to utf8 (show create table) and that all 'text' type columns (char,varchar, text, etc) have a default type of utf8. On a non-utf8 database where the table is utf8, the default character set on the table and all the 'text' type columns must be utf8. You can't use this table in a join to a non-utf8 table, and all transactions which use this table must explicitly wrap the transaction in as pair of set character set statements - unless you create the connection on a per table/per query basis We've had lots of fun over the last few months integrating web from ends with backend processing. We're currently running 4.1.4 so 4.1.9 may be better. When you're importing/dumping remember to set the default character set on the command line of mysqldump/mysqlimport to the appropriate value otherwise you'll garbage your data on import/export. Kevin Cowley R&D Tel: 0118 902 9099 (direct line) Email: [EMAIL PROTECTED] Web: http://www.alchemetrics.co.uk > -----Original Message----- > From: Bruce Dembecki [mailto:[EMAIL PROTECTED] > Sent: 04 February 2005 14:55 > To: mysql@lists.mysql.com > Subject: Character Sets, 4.0 and 4.1 > > So today for the second time in six weeks we are faced with rolling back > to > mysql 4.0 because of dramas with character sets. I don't know about anyone > else but this supposedly wonderful feature has been nothing but a > nightmare > for us. > > So our Application servers use Unicode for our non US English products, > and > they talk to MySQL through Connector J with a flag set to use Unicode in > the > JDBC config. > > First time around we just dumped the data and then imported it into the > 4.1 > instance. Everything looked good, but it wasn't. The German folks were > complaining their various umlauts and so on were missing, and there was > more. Of course we're told to just bring the data over to mysql 4.1 and > we'll have no problems, so we do that, and because we didn't specify a > character set for the import, we got latin1, and our German and Chinese > and... All broke. > > So six weeks of trial and experimentation later and we try for another > update. This time in our create database statement when we begin to import > the database, we set the default character set to utf8 for everything. Now > after the import our Germans and Chinese folks still get the results they > expect. > > A day later and we are getting complaints from Hong Kong that there are a > whole bunch of messages appearing on their discussions with no message > body. > We look at the backend and right there in the database the messages are > sitting and the body consists of exactly one space. Whatever content was > sent to us, was turned into one space. We look at it and we see that there > a > more than a few messages that got migrated from 4.0 to 4.1 and their > message > bodies are also one space. Not all messages, just some. Not all messages > from any individual user, just some... The 4.0 version of the data has > content that consists of more than a single space... Can't quite tell what > it is, but there's content there in 4.0 that disappears in 4.1. > > So I understand that having multiple character sets is a good thing, but > to > be honest, I pretty much thought we had it in 4.0.. We told the JDBC to us > Unicode and away we went... Clearly someone was using something that > wasn't > unicode (some of the comments suggest that there is some Japanese in the > missing messages, but I can't tell), and for whatever reason mysql 4.1 > decided it should be repalced with a space character. > > I'm probably missing the point of the character set support along the way > somewhere... But I need to know how to fix this (I understand that's > difficult when all I have left is one blank space and don't know how to > reproduce the problematic data). What did I miss in the simple "open your > data files with 4.1 and it's good to go" instructions... What character > set > performs the same as MySQL 4.0, where it didn't care what character set > you > gave it, it would accept it? Can we have a character set that will give us > this functionality? > > And why are we taking input data on an import and by the looks of it an > insert, and turning it into a single space, can't we do something better > with the data? > > 4.0 worked for us with products in 20+ languages. It worked with no great > effort and no problems... Now we have the new enhanced version which > provides "better" support for international character sets, and we find > ourselves with lost data from the moment we import, and user posts > disappearing as they come in. What do we do to not have this problem? > > Best Regards, Bruce > > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: > http://lists.mysql.com/[EMAIL PROTECTED] ************************************************************************************** ALCHEMETRICS LIMITED (ALCHEMETRICS) Mulberry Park, Fishponds Road, Wokingham, Berkshire, RG41 2GX Tel: +44 (0) 118 902 9000 Fax: +44 (0) 118 902 9001 This e-mail is confidential and is intended for the use of the addressee only. If you are not the intended recipient, you are hereby notified that you must not use, copy, disclose, otherwise disseminate or take any action based on this e-mail or any information herein. If you receive this transmission in error, please notify the sender immediately by reply e-mail or by using the contact details above and then delete this e-mail. Please note that e-mail may be susceptible to data corruption, interception and unauthorised amendment. Alchemetrics does not accept any liability for any such corruption, interception, amendment or the consequences thereof. ************************************************************************************** -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]