OK, for the time being, I seem to have fixed the current problems.
1. On the 5.0.38 system, I was making a dumb mistake (when I modified your program). Once that was fixed, I got essentially the same results as you. My conclusion from this:
   a. connect with use_unicode=0 and charset='utf8', AND
   b. make sure you pass the table options {'mysql_charset': 'utf8'} when creating tables,
   and all should be well.

2. On the 4.1.9 system, it's a different story. Installed there is a standard mysql-standard-4.1.9-pc-linux-gnu-i686, without modifications. I don't know *why* it has this problem, but I found the fix for the error that was preventing anything from working:

    (LookupError) unknown encoding: latin1_swedish_ci

Apparently there is a problem in the MySQLdb connection (either in MySQLdb itself or in the MySQL client libraries it calls): after you call connection.set_character_set(), a subsequent connection.character_set_name() never returns a different name, only the original one (guess what: 'latin1_swedish_ci').

So someone posted this incredible kludge which, at first glance, appears to work: redefine the character_set_name() method to return 'utf8' no matter what. It's ugly, to be sure, but for the time being I'll see how it goes.

Thanks a *million* for your help; it's been invaluable.

NOW, on to transferring the data from one database to another so that the Unicode works correctly. THAT sounds like fun... :)

David

On Nov 15, 9:06 pm, jason kirtland <[EMAIL PROTECTED]> wrote:
> david wrote:
> > Hi Jason -
> >
> > Thanks a lot for the test. It is very helpful.
> >
> > However, when I try running it on the mysql-5.0.38 machine as a
> > starting point for testing, (with appropriate mods for version, etc),
> > I get very mixed results. There are 6 cases in your test:
> >
> > 1. use_unicode=1, charset='utf8', table_options={}
> > I get "Warning: Incorrect string value:", but it does stick stuff in
> > the database. However, they are all '?' (question marks)
> >
> > 2. use_unicode=1, charset='utf8', table_options={'mysql_charset':
> > 'utf8'}
> > No warnings, but all question marks.
> >
> > 3. use_unicode=0, charset='utf8', table_options={}
> > No warnings, but all question marks.
> >
> > 4. use_unicode=0, charset='utf8', table_options={'mysql_charset':
> > 'utf8'}
> > No warnings, but all question marks.
> >
> > 5. - - table_options={}
> > UnicodeDecodeError - 'latin-1' codec,...etc.
> >
> > 6. - - table_options={'mysql_charset':
> > 'utf8'}
> > UnicodeDecodeError - 'latin-1' codec,...etc
> >
> > Hmmmm. Not sure I understand this at all.....
>
> My output from a 5.0.41 is attached. The 4.1.9 output is the same
> except for some wording changes in the encoding warnings. Both
> instances have stock configurations right from the MySQL binary tarball,
> so when 'table_options={}' runs here, the varchar columns are stored in
> the default 'latin1'.
>
> When you're seeing all question marks, is that in the script output or
> in the mysql client? I don't think you'd see the cyrillic characters in
> the client unless you're using a capable terminal and possibly doing a
> 'set charset utf8'.
>
> [output-5.0.41.txt]
> rel_0_3_10/lib/sqlalchemy/databases/mysql.py:1038: Warning: Incorrect string value: '\xD0\xB1\xD0\xBE\xD1\x80...' for column 'plain1' at row 1
>   cursor.execute(statement, parameters)
> rel_0_3_10/lib/sqlalchemy/databases/mysql.py:1038: Warning: Incorrect string value: '\xD0\xB1\xD0\xBE\xD1\x80...' for column 'uni1' at row 1
>   cursor.execute(statement, parameters)
> rel_0_3_10/lib/sqlalchemy/databases/mysql.py:1038: Warning: Incorrect string value: '\xD0\xB1\xD0\xBE\xD1\x80...' for column 'plain2' at row 1
>   cursor.execute(statement, parameters)
> rel_0_3_10/lib/sqlalchemy/databases/mysql.py:1038: Warning: Incorrect string value: '\xD0\xB1\xD0\xBE\xD1\x80...' for column 'uni2' at row 1
>   cursor.execute(statement, parameters)
> mysql:///test?use_unicode=1&charset=utf8 table options {}
> [(u'????', u'????', u'????', u'????')]
> mysql:///test?use_unicode=1&charset=utf8 table options {'mysql_charset': 'utf8'}
> [(u'\u0431\u043e\u0440\u0449', u'\u0431\u043e\u0440\u0449', u'\u0431\u043e\u0440\u0449', u'\u0431\u043e\u0440\u0449')]
> mysql:///test?use_unicode=0&charset=utf8 table options {}
> [('????', u'????', '????', u'????')]
> mysql:///test?use_unicode=0&charset=utf8 table options {'mysql_charset': 'utf8'}
> [('\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', u'\u0431\u043e\u0440\u0449', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', u'\u0431\u043e\u0440\u0449')]
> mysql:///test? table options {}
> (UnicodeEncodeError) 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256) u'INSERT INTO ut (plain1, uni1, plain2, uni2) VALUES (%s, %s, %s, %s)' [u'\u0431\u043e\u0440\u0449', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89']
> mysql:///test? table options {'mysql_charset': 'utf8'}
> (UnicodeEncodeError) 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256) u'INSERT INTO ut (plain1, uni1, plain2, uni2) VALUES (%s, %s, %s, %s)' [u'\u0431\u043e\u0440\u0449', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89', '\xd0\xb1\xd0\xbe\xd1\x80\xd1\x89']
> OK

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---
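The encoding errors discussed in this thread can be reproduced with nothing but Python's own codec machinery, which may help when reasoning about the fixes. The sketch below is written in Python 3 syntax (the thread itself is Python 2, where the Cyrillic literal would be a u'...' string). `StandInConnection` is a hypothetical class, not MySQLdb's connection type: it borrows only the two method names from the thread and mimics the 4.1.9 behaviour David describes. The sketch shows why a collation name such as 'latin1_swedish_ci' produces "unknown encoding", what overriding `character_set_name()` changes, and that the byte strings in the use_unicode=0 output are simply the UTF-8 encoding of the test word. Python's `errors='replace'` is used only as an analogy for the '????' values MySQL produced for the latin1 columns; the server performs its own conversion.

```python
import codecs


class StandInConnection:
    """Hypothetical stand-in for a MySQLdb connection (not MySQLdb's class).

    It only mimics the behaviour described for the 4.1.9 system: the
    requested charset is ignored and character_set_name() keeps reporting
    the server's default collation.
    """

    def set_character_set(self, charset):
        self.requested = charset  # accepted, but never reflected below

    def character_set_name(self):
        return 'latin1_swedish_ci'


def python_codec_for(conn):
    """Resolve the name the connection reports to a Python codec."""
    return codecs.lookup(conn.character_set_name())


conn = StandInConnection()
conn.set_character_set('utf8')

# A collation name is not a codec name, so the lookup fails.
try:
    python_codec_for(conn)
    before_patch = None
except LookupError as exc:
    before_patch = str(exc)

# The kludge from the thread: always report 'utf8', whatever the library says.
StandInConnection.character_set_name = lambda self: 'utf8'
after_patch = python_codec_for(conn).name  # 'utf8' resolves to the 'utf-8' codec

# The byte strings in the use_unicode=0 output are the UTF-8 encoding of this word.
word = '\u0431\u043e\u0440\u0449'
utf8_bytes = word.encode('utf8')

# Latin-1 has no code points for Cyrillic, hence the UnicodeEncodeError.
try:
    word.encode('latin-1')
    latin1_error = None
except UnicodeEncodeError as exc:
    latin1_error = str(exc)

# Substituting '?' for unmappable characters gives the familiar '????'.
replaced = word.encode('latin-1', errors='replace')

print(before_patch)
print(after_patch)
print(utf8_bytes)
print(latin1_error)
print(replaced)
```

Patching the class rather than one instance changes `character_set_name()` for every connection, which is what makes the kludge effective everywhere; the trade-off is that the method can no longer report a connection that really is latin1, so it is only safe when every connection is in fact set to utf8.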