Ok,

For the time being, I seem to have fixed the current
problems.

1. On the 5.0.38 system I was making a dumb mistake (when I modded
your program) - when fixed, I obtained essentially the same results as
you.

My conclusion from this:

a. set use_unicode=0 and charset='utf8' on the connection, AND

b. pass table_options={'mysql_charset': 'utf8'} when creating tables.

With both in place, all should be well.
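
For the record, here is a small sketch of why the table charset matters. It
needs no database: it only uses Python's own codecs to mimic what happens to
the test word when the storage side is latin1 versus utf8. Python's 'replace'
error handler is my stand-in for the server's lossy conversion (an assumption,
not a trace of what MySQL actually does internally).

```python
# The test word from the script output: Cyrillic "borscht".
word = u'\u0431\u043e\u0440\u0449'

# Encoded as UTF-8 it becomes the byte string seen in the test output.
utf8_bytes = word.encode('utf-8')

# A latin1 column cannot represent these characters. A lossy conversion
# (Python's 'replace' handler, used here as an analogy for the server)
# turns each one into a question mark.
lossy = word.encode('latin-1', 'replace')

# A strict conversion refuses instead, which is the UnicodeEncodeError
# seen when no charset is given on the connection.
try:
    word.encode('latin-1')
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

# With UTF-8 on both ends, the bytes decode back to the original text.
round_trip = utf8_bytes.decode('utf-8')

print(repr(utf8_bytes))
print(repr(lossy))
print(strict_error)
print(round_trip == word)
```

So question marks mean the data was already squeezed through latin1 somewhere
on the way in, and no connection setting will bring it back on the way out.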

2. On the 4.1.9 system, we have a different story. Installed there is
a standard
 mysql-standard-4.1.9-pc-linux-gnu-i686, without mods. Now, *why* it
has this problem, I don't know, but I found the fix for the error that
was preventing anything from working:

(LookupError) unknown encoding: latin1_swedish_ci ...

Apparently, there is a problem in the MySQLdb connection (either
MySQLdb itself, or the mysql libs it calls), so that when you call
connection.set_character_set(), and subsequently
connection.character_set_name(), you never get a different name -
always the original (guess what, 'latin1_swedish_ci'). Sooooo, someone
posted this incredible kludge, but at first glance, it appears to work
- redefine the character_set_name() method to return 'utf8' no matter
what. This is ugly, to be sure, but for the time being I'll see how it
goes.
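
Roughly, the kludge looks like the sketch below. FakeConnection is a
hypothetical stand-in I made up so the example runs without MySQLdb; it only
imitates the behaviour described above (set_character_set() has no visible
effect on character_set_name()). force_utf8_charset_name is likewise my own
name for the helper, not something from MySQLdb or SQLAlchemy.

```python
import codecs


class FakeConnection(object):
    """Hypothetical stand-in for the misbehaving connection object."""

    def set_character_set(self, name):
        # Imitates the reported bug: the new name is never reflected.
        pass

    def character_set_name(self):
        return 'latin1_swedish_ci'


def force_utf8_charset_name(conn_class):
    """Override character_set_name() to report 'utf8' no matter what."""
    def character_set_name(self):
        return 'utf8'
    conn_class.character_set_name = character_set_name
    return conn_class


conn = FakeConnection()
conn.set_character_set('utf8')
before = conn.character_set_name()

# The reported name is a MySQL collation, not a Python codec name, so
# looking it up as an encoding fails with LookupError.
try:
    codecs.lookup(before)
    lookup_failed = False
except LookupError:
    lookup_failed = True

# Apply the kludge; existing instances pick up the patched method too.
force_utf8_charset_name(FakeConnection)
after = conn.character_set_name()

# 'utf8' is a name Python's codec registry does know.
codec_name = codecs.lookup(after).name

print(before, lookup_failed, after, codec_name)
```

Note that this hides the real charset from any caller, so it is only safe
while the connection really is talking utf8.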

Thanks a *million* for your help - it's been invaluable.

NOW, for transferring the data from one database to another so that
the unicode works correctly. THAT sounds like fun...:)

David



On Nov 15, 9:06 pm, jason kirtland <[EMAIL PROTECTED]> wrote:
> david wrote:
> > Hi Jason -
>
> > Thanks a lot for the test. It is very helpful.
>
> > However, when I try running it on the mysql-5.0.38 machine as a
> > starting point for testing, (with appropriate mods for version, etc),
> > I get very mixed results. There are 6 cases in your test:
>
> > 1. use_unicode=1, charset='utf8', table_options={}
> > I get  "Warning: Incorrect string value:", but it does stick stuff in
> > the database. However, they are all '?' (question marks)
>
> > 2. use_unicode=1, charset='utf8', table_options={'mysql_charset':
> > 'utf8'}
> > No warnings, but all question marks.
>
> > 3. use_unicode=0, charset='utf8', table_options={}
> > No warnings, but all question marks.
>
> > 4. use_unicode=0, charset='utf8', table_options={'mysql_charset':
> > 'utf8'}
> > No warnings, but all question marks.
>
> > 5. -              -               table_options={}
> > UnicodeDecodeError - 'latin-1' codec,...etc.
>
> > 6. -              -               table_options={'mysql_charset':
> > 'utf8'}
> > UnicodeDecodeError - 'latin-1' codec,...etc
>
> > Hmmmm. Not sure I understand this at all.....
>
> My output from a 5.0.41 is attached.  The 4.1.9 output is the same
> except for some wording changes in the encoding warnings.  Both
> instances have stock configurations right from the MySQL binary tarball,
> so when 'table_options={}' runs here, the varchar columns are stored in
> the default 'latin1'.
>
> When you're seeing all question marks, is that in the script output or
> in the mysql client?  I don't think you'd see the cyrillic characters in
> the client unless you're using a capable terminal and possibly doing a
> 'set charset utf8'.
>
> [output-5.0.41.txt]
> rel_0_3_10/lib/sqlalchemy/databases/mysql.py:1038:
> Warning: Incorrect string value: '\xD0\xB1\xD0\xBE\xD1\x80...' for column 
> 'plain1' at row 1
>   cursor.execute(statement, parameters)
> rel_0_3_10/lib/sqlalchemy/databases/mysql.py:1038: Warning: Incorrect string 
> value: '\xD0\xB1\xD0\xBE\xD1\x80...' for column 'uni1' at row 1
>   cursor.execute(statement, parameters)
> rel_0_3_10/lib/sqlalchemy/databases/mysql.py:1038: Warning: Incorrect string 
> value: '\xD0\xB1\xD0\xBE\xD1\x80...' for column 'plain2' at row 1
>   cursor.execute(statement, parameters)
> rel_0_3_10/lib/sqlalchemy/databases/mysql.py:1038: Warning: Incorrect string 
> value: '\xD0\xB1\xD0\xBE\xD1\x80...' for column 'uni2' at row 1
>   cursor.execute(statement, parameters)
> mysql:///test?use_unicode=1&charset=utf8 table options {}
> [(u'????', u'????', u'????', u'????')]
> mysql:///test?use_unicode=1&charset=utf8 table options {'mysql_charset': 
> 'utf8'}
> [(u'\u0431\u043e\u0440\u0449', u'\u0431\u043e\u0440\u0449', 
> u'\u0431\u043e\u0440\u0449', u'\u0431\u043e\u0440\u0449')]
> mysql:///test?use_unicode=0&charset=utf8 table options {}
> [('????', u'????', '????', u'????')]
> mysql:///test?use_unicode=0&charset=utf8 table options {'mysql_charset': 
> 'utf8'}
> [('\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', u'\u0431\u043e\u0440\u0449', 
> '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', u'\u0431\u043e\u0440\u0449')]
> mysql:///test? table options {}
> (UnicodeEncodeError) 'latin-1' codec can't encode characters in position 0-3: 
> ordinal not in range(256) u'INSERT INTO ut (plain1, uni1, plain2, uni2) 
> VALUES (%s, %s, %s, %s)' [u'\u0431\u043e\u0440\u0449', 
> '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', 
> '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89']
> mysql:///test? table options {'mysql_charset': 'utf8'}
> (UnicodeEncodeError) 'latin-1' codec can't encode characters in position 0-3: 
> ordinal not in range(256) u'INSERT INTO ut (plain1, uni1, plain2, uni2) 
> VALUES (%s, %s, %s, %s)' [u'\u0431\u043e\u0440\u0449', 
> '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', 
> '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89']
> OK
