You should also be on MySQLdb 1.2.2. Using the Unicode type in conjunction with charset=utf8&use_unicode=0, and always passing Python unicode (u'') objects, is the general recipe for unicode with MySQL. All this means is that SQLA sends utf-8-encoded strings to MySQLdb; MySQLdb does not try to encode them itself, and it makes MySQL aware that the data should be treated as utf-8. I'm not sure what version of MySQL you're on, or how older versions might get in the way.
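To make the '\xeb' versus '\xc3\xab' distinction concrete, here is a minimal sketch in plain Python with no database involved. The helper to_unicode is hypothetical (it is not part of SQLAlchemy or MySQLdb), and the last two lines only show how Python treats the string '0'; whether the driver tests that flag for truthiness is an assumption I have not verified against your versions.

```python
# -*- coding: utf-8 -*-
# Sketch only: plain Python, no database connection required.

# U+00EB (e with diaeresis) as a unicode object: a single code point.
e_uml = u'\xeb'

# Its UTF-8 encoding is the two bytes C3 AB, the form stored by MySQL.
utf8_bytes = e_uml.encode('utf-8')
assert utf8_bytes == b'\xc3\xab'
assert utf8_bytes.decode('utf-8') == e_uml

# u'\xc3\xab' is a different value: two code points (U+00C3, U+00AB).
# Encoding it produces four bytes, i.e. doubly encoded data.
mojibake = u'\xc3\xab'
assert len(mojibake) == 2
assert mojibake.encode('utf-8') == b'\xc3\x83\xc2\xab'


def to_unicode(value, encoding='utf-8'):
    """Hypothetical helper: return value as unicode, decoding bytes first."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Both spellings normalize to the same unicode object.
assert to_unicode(b'\xc3\xab') == e_uml
assert to_unicode(e_uml) == e_uml

# Any non-empty string is truthy in Python, including '0'. If a flag such
# as use_unicode is only tested for truthiness (an assumption), passing
# the string '0' would not switch it off, while the integer 0 would.
assert bool('0') is True
assert bool(0) is False
```

So '\xeb' is the unicode code point and '\xc3\xab' is its utf-8 byte encoding; they are two views of the same character rather than one being right and the other wrong. If the truthiness assumption holds, it may be worth trying the integer 0 in place of '0' in connect_args.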

On Dec 6, 2008, at 1:26 PM, n00b wrote:

>
> thanks for the quick reply. i kept trying with it and now have reached
> the utter state of confusion.
> the specification of Unicode versus String in the table def's coupled
> with actual str representation has me totally confused. here's a quick
> script, have a look at the mysql table itself to see character
> display:
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> import os, sys
> import unicodedata
>
> from sqlalchemy import *
> from sqlalchemy.orm import *
>
> #set db
> import MySQLdb
> db = MySQLdb.connect(host='localhost', user='root', passwd='',
>                      db='xxx', use_unicode=True, charset='utf8')
> cur = db.cursor()
> cur.execute('SET NAMES utf8')
> cur.execute('SET CHARACTER SET utf8')
> cur.execute('SET character_set_connection=utf8')
> cur.execute('SET character_set_server=utf8')
> cur.execute('''SHOW VARIABLES LIKE 'char%'; ''')
> print cur.fetchall()
>
> utf_repr = '\xc3\xab'
> hex_repr = '\xeb'
>
> mysql_url = 'mysql://root:@localhost/xxx'
> connect_args = {'charset':'utf8', 'use_unicode':'0'}
> engine = create_engine(mysql_url, connect_args=connect_args)
> metadata = MetaData()
>
>
> test_table = Table('encoding_test', metadata,
>     Column(u'id', Integer, primary_key=True),
>     Column(u'unicode', Integer),
>     Column(u'u_hex', Unicode(10)),
>     Column(u'u_utf', Unicode(10)),
>     Column(u'u_str', Unicode(10)),
>     Column(u's_hex', String(10)),
>     Column(u's_utf', String(10)),
>     Column(u's_str', String(10))
> )
>
> class EncodingTest(object): pass
>
> mapper(EncodingTest, test_table)
>
> metadata.create_all(engine)
> Session = sessionmaker(bind=engine)
>
> session = Session()
> et = EncodingTest()
> et.unicode = 1
> et.u_str = u'ë'
> et.u_hex = u'\xeb'
> et.u_utf = u'\xc3\xab'
> et.s_str = u'ë'
> et.s_hex = u'\xeb'
> et.s_utf = u'\xc3\xab'
> session.add(et)
> session.commit()
> et = EncodingTest()
> et.unicode = 0
> et.u_str = 'ë'
> et.u_hex = '\xeb'
> et.u_utf = '\xc3\xab'
> et.s_str = 'ë'
> et.s_hex = '\xeb'
> et.s_utf = '\xc3\xab'
> session.add(et)
> session.commit()
> session.close()
>
> session = Session()
> results = session.query(EncodingTest).all()
> for result in results:
>     print result.unicode
>     print repr(result.u_hex), repr(result.u_utf), repr(result.u_str)
>     print repr(result.s_hex), repr(result.s_utf), repr(result.s_str)
>     print
>
> in addition, i don't seem to be able to run the mysql settings (# set
> db) from SA.
> any insights are greatly appreciated. btw, the use_unicode, either in
> MySQLdb or SA, doesn't seem to have any effect on results.
>
> thx
>
> On Dec 5, 3:25 pm, Michael Bayer <[EMAIL PROTECTED]> wrote:
>> I'm not sure of the mechanics of what you're experiencing, but make
>> sure you use charset=utf8&use_unicode=0 with MySQL.
>>
>> On Dec 5, 2008, at 4:17 PM, n00b wrote:
>>
>>> greetings,
>>>
>>> SA (0.5.0rc1) keeps returning utf hex instead of utf-8 and in the
>>> process driving me batty. all the mysql setup is fine, the chars
>>> look good and are umlauting to goethe's delight. moreover, insert
>>> and select are working perfectly with the MySQLdb api on three
>>> different *nix systems, two servers, ... it works.
>>>
>>> where things fall apart is on the retrieval side of SA; inserts are
>>> fine (using the config_args = {'charset':'utf8'} dict in the
>>> create_engine call).
>>>
>>> for example, ë, the latin small letter e with diaeresis, is stored
>>> in mysql hex as C3 AB; using the MySQLdb client, this is exactly
>>> what i get back: '\xc3\xab' (in the # -*- coding: UTF-8 -*-
>>> environment) no further codecs work required. SA, on the other
>>> hand, hands me back the utf-hex representation, '\xeb'.
>>>
>>> there must be some setting that i'm missing that'll give the
>>> appropriate utf-8 representation at the SA (api) level. any ideas,
>>> suggestions?
>>>
>>> thx
>>>
>>> yes, i could do '\xeb'.encode('utf8') but it's not an option. we got
>>> too much data to deal with and MySQLdb is working perfectly well
>>> without the extra step. thx.