Bruce

We're sort of in the same boat but if you persevere you can eradicate
all the problems.

Now we don't run the server as utf8 which does cause a number of
problems - we don't want all the databases to be utf8 just some of them,
and just some tables on some of them.

For a database that needs to be utf8 you need to ensure that its default
character set is utf8 - do a show database create and then an alter
database as necessary.
Any connections to that database must specify their default character
set to be utf8 when the connection is opened otherwise you get problems.
In addition you need to ensure that all the character sets on the tables
default to utf8 (show create table) and that all 'text' type columns
(char,varchar, text, etc) have a default type of utf8.

On a non-utf8 database where the table is utf8, the default character
set on the table and all the 'text' type columns must be utf8. You can't
use this table in a join to a non-utf8 table, and all transactions which
use this table must explicitly wrap the transaction in as pair of set
character set statements - unless you create the connection on a per
table/per query basis

We've had lots of fun over the last few months integrating web from ends
with backend processing. We're currently running 4.1.4 so 4.1.9 may be
better.

When you're importing/dumping remember to set the default character set
on the command line of mysqldump/mysqlimport to the appropriate value
otherwise you'll garbage your data on import/export.

Kevin Cowley
R&D
 
Tel: 0118 902 9099 (direct line)
Email: [EMAIL PROTECTED]
Web: http://www.alchemetrics.co.uk

> -----Original Message-----
> From: Bruce Dembecki [mailto:[EMAIL PROTECTED]
> Sent: 04 February 2005 14:55
> To: mysql@lists.mysql.com
> Subject: Character Sets, 4.0 and 4.1
> 
> So today for the second time in six weeks we are faced with rolling
back
> to
> mysql 4.0 because of dramas with character sets. I don't know about
anyone
> else but this supposedly wonderful feature has been nothing but a
> nightmare
> for us.
> 
> So our Application servers use Unicode for our non US English
products,
> and
> they talk to MySQL through Connector J with a flag set to use Unicode
in
> the
> JDBC config.
> 
> First time around we just dumped the data and then imported it into
the
> 4.1
> instance. Everything looked good, but it wasn't. The German folks were
> complaining their various umlauts and so on were missing, and there
was
> more. Of course we're told to just bring the data over to mysql 4.1
and
> we'll have no problems, so we do that, and because we didn't specify a
> character set for the import, we got latin1, and our German and
Chinese
> and... All broke.
> 
> So six weeks of trial and experimentation later and we try for another
> update. This time in our create database statement when we begin to
import
> the database, we set the default character set to utf8 for everything.
Now
> after the import our Germans and Chinese folks still get the results
they
> expect.
> 
> A day later and we are getting complaints from Hong Kong that there
are a
> whole bunch of messages appearing on their discussions with no message
> body.
> We look at the backend and right there in the database the messages
are
> sitting and the body consists of exactly one space. Whatever content
was
> sent to us, was turned into one space. We look at it and we see that
there
> a
> more than a few messages that got migrated from 4.0 to 4.1 and their
> message
> bodies are also one space. Not all messages, just some. Not all
messages
> from any individual user, just some... The 4.0 version of the data has
> content that consists of more than a single space... Can't quite tell
what
> it is, but there's content there in 4.0 that disappears in 4.1.
> 
> So I understand that having multiple character sets is a good thing,
but
> to
> be honest, I pretty much thought we had it in 4.0.. We told the JDBC
to us
> Unicode and away we went... Clearly someone was using something that
> wasn't
> unicode (some of the comments suggest that there is some Japanese in
the
> missing messages, but I can't tell), and for whatever reason mysql 4.1
> decided it should be repalced with a space character.
> 
> I'm probably missing the point of the character set support along the
way
> somewhere... But I need to know how to fix this (I understand that's
> difficult when all I have left is one blank space and don't know how
to
> reproduce the problematic data). What did I miss in the simple "open
your
> data files with 4.1 and it's good to go" instructions... What
character
> set
> performs the same as MySQL 4.0, where it didn't care what character
set
> you
> gave it, it would accept it? Can we have a character set that will
give us
> this functionality?
> 
> And why are we taking input data on an import and by the looks of it
an
> insert, and turning it into a single space, can't we do something
better
> with the data?
> 
> 4.0 worked for us with products in 20+ languages. It worked with no
great
> effort and no problems... Now we have the new enhanced version which
> provides "better" support for international character sets, and we
find
> ourselves with lost data from the moment we import, and user posts
> disappearing as they come in. What do we do to not have this problem?
> 
> Best Regards, Bruce
> 
> 
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:
> http://lists.mysql.com/[EMAIL PROTECTED]


**************************************************************************************
ALCHEMETRICS LIMITED (ALCHEMETRICS)
Mulberry Park, Fishponds Road, Wokingham, Berkshire, RG41 2GX
Tel:  +44 (0) 118 902 9000    Fax:  +44 (0) 118 902 9001
This e-mail is confidential and is intended for the use of the addressee only.
If you are not the intended recipient, you are hereby notified that you must 
not use, copy, disclose, otherwise disseminate or take any action based on this 
e-mail or any information herein.
If you receive this transmission in error, please notify the sender
immediately by reply e-mail or by using the contact details above and then
delete this e-mail.
Please note that e-mail may be susceptible to data corruption, interception and 
unauthorised amendment.  Alchemetrics does not accept any liability for 
any such corruption, interception, amendment or the consequences thereof.
**************************************************************************************


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to