I'm not sure if that was sarcasm or not...if so, consider the time better spent analyzing the issue. The attached test illustrates a round trip of unicode data containing multibyte codepoints in both directions using both a raw cursor as well as a SQLAlchemy engine. Use this as a guide with regards to how to send and receive unicode data.
--~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~----------~----~----~----~------~----~------~--~---
# -*- coding: utf-8 -*- import os, sys from sqlalchemy import * from sqlalchemy.orm import * engine = create_engine('mysql://scott:[EMAIL PROTECTED]/test?use_unicode=0&charset=utf8', echo=True) m = MetaData() test_table = Table('encoding_test', m, Column('data', Unicode(255)) ) m.drop_all(engine) m.create_all(engine) import MySQLdb db = MySQLdb.connect(host='localhost', user='scott', passwd='tiger', db='test', use_unicode=True, charset='utf8') thedata = u"""Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix mâa réveillé. Elle disait: « Sâil vous plaît⦠dessine-moi un mouton! »""" # insert unicode data with MySQLdb cursor = db.cursor() cursor.execute("insert into encoding_test values(%s)", thedata) cursor.close() db.commit() # insert unicode data with SQLAlchemy engine.execute(test_table.insert(), data=thedata) # retrieve both rows with MySQLdb cursor = db.cursor() cursor.execute("select data from encoding_test") back_from_mysql = [x[0] for x in cursor.fetchall()] # retrieve both with SQLAlchemy back_from_sqla = [x[0] for x in engine.execute(test_table.select()).fetchall()] # put all the strings in a set - they are all identical and it therefore has length one assert len(set(back_from_mysql + back_from_sqla + [thedata])) == 1 # MySQL agrees that both rows are identical since COUNT DISTINCT returns one cursor = db.cursor() cursor.execute("select count(distinct data) from encoding_test") assert cursor.fetchone()[0] == 1 cursor.close() for x in [thedata] + back_from_mysql + back_from_sqla: print x.encode('utf-8')
On Dec 7, 2008, at 12:00 PM, n00b wrote: > > thanks!! > you just confirmed my empirical observations, which puts me very much > at ease :) > for versions, 1.2.2 mysqldb, and v 5.0.67 and 6.0.7 (alpha) mysql > (community ed.) > > thank again. > > On Dec 7, 8:52 am, Michael Bayer <[EMAIL PROTECTED]> wrote: >> you should also be on MySQLdb 1.2.2. Using the Unicode type in >> conjunction with charset=utf8&use_unicode=0 and always passing Python >> unicode (u'') objects is the general recipe for unicode with MySQL. >> All this means is that SQLA sends utf-8-encoded strings to MySQLdb, >> MySQLdb does not try to encode them itself and makes MySQL aware the >> data should be considered as utf-8. I'm not sure what version of >> MySQL you're on or how older versions of that might get in the way. >> >> On Dec 6, 2008, at 1:26 PM, n00b wrote: >> >> >> >>> thanks for the quick reply. i kept trying with it and no have >>> reached >>> the utter state of confusion. >>> the specification of Unicode versus String in the table def's >>> coupled >>> with actual str representation >>> has my totally confused. here's a quick script, have a look at the >>> mysql table itself to see character >>> display: >> >>> #!/usr/bin/env python >>> # -*- coding: utf-8 -*- >> >>> import os, sys >>> import unicodedata >> >>> from sqlalchemy import * >>> from sqlalchemy.orm import * >> >>> #set db >>> import MySQLdb >>> db = MySQLdb.connect(host='localhost', user='root', passwd='', >>> db='xxx', use_unicode=True, charset='utf8') >>> cur = db.cursor() >>> cur.execute('SET NAMES utf8') >>> cur.execute('SET CHARACTER SET utf8') >>> cur.execute('SET character_set_connection=utf8') >>> cur.execute('SET character_set_server=utf8') >>> cur.execute('''SHOW VARIABLES LIKE 'char%'; ''') >>> print cur.fetchall() >> >>> utf_repr = '\xc3\xab' >>> hex_repr = '\xeb' >> >>> mysql_url = 'mysql://root:@localhost/xxx' >>> connect_args = {'charset':'utf8', 'use_unicode':'0'} >>> engine = create_engine(mysql_url, connect_args=connect_args) >>> metadata = MetaData() >> >>> test_table = Table('encoding_test', metadata, >>> Column(u'id', Integer, primary_key=True), >>> Column(u'unicode', Integer), >>> Column(u'u_hex', Unicode(10)), >>> Column(u'u_utf', Unicode(10)), >>> Column(u'u_str', Unicode(10)), >>> Column(u's_hex', String(10)), >>> Column(u's_utf', String(10)), >>> Column(u's_str', String(10)) >>> ) >> >>> class EncodingTest(object): pass >> >>> mapper(EncodingTest, test_table) >> >>> metadata.create_all(engine) >>> Session = sessionmaker(bind=engine) >> >>> session = Session() >>> et = EncodingTest() >>> et.unicode = 1 >>> et.u_str = u'ë' >>> et.u_hex = u'\xeb' >>> et.u_utf = u'\xc3\xab' >>> et.s_str = u'ë' >>> et.s_hex = u'\xeb' >>> et.s_utf = u'\xc3\xab' >>> session.add(et) >>> session.commit() >>> et = EncodingTest() >>> et.unicode = 0 >>> et.u_str = 'ë' >>> et.u_hex = '\xeb' >>> et.u_utf = '\xc3\xab' >>> et.s_str = 'ë' >>> et.s_hex = '\xeb' >>> et.s_utf = '\xc3\xab' >>> session.add(et) >>> session.commit() >>> session.close() >> >>> session = Session() >>> results = session.query(EncodingTest).all() >>> for result in results: >>> print result.unicode >>> print repr(result.u_hex), repr(result.u_utf), repr(result.u_str) >>> print repr(result.s_hex), repr(result.s_utf), repr(result.s_str) >>> print >> >>> in addition, i don't seem to be able to run the mysql settings (# >>> set >>> db) from SA. >>> any insights are greatly appreciated. btw, the use_unciode, either >>> in >>> MySQLdb or SA, >>> doesn't seem to have any effect on results. >> >>> thx >> >>> On Dec 5, 3:25 pm, Michael Bayer <[EMAIL PROTECTED]> wrote: >>>> I'm not sure of the mechanics of what you're experiencing, but make >>>> sure you use charset=utf8&use_unicode=0 with MySQL. >> >>>> On Dec 5, 2008, at 4:17 PM, n00b wrote: >> >>>>> greetings, >> >>>>> SA (0.5.0rc1) keeps returning utf hex in stead of utf-8 and in the >>>>> process driving me batty. all the mysql setup is fine, the chars >>>>> look >>>>> good and are umlauting to goethe's delight. moreover, insert and >>>>> select are working perfectly with the MySQLdb api on three >>>>> different >>>>> *nix systems, two servers, ... it works. >> >>>>> where things fall apart is on the retrieval side of SA; inserts >>>>> are >>>>> fine (using the config_args = {'charset':'utf8'} dict in the >>>>> create_engine call). >> >>>>> for example, ë, the latin small letter e with diaeresis, is stored >>>>> in >>>>> mysql hex as C3 AB; using the MySQldb client, this is exactly >>>>> what i >>>>> get back: '\xc3\xab' (in the # -*- coding: UTF-8 -*- >>>>> environment) no >>>>> further codecs work required. SA, on the other hand, hands me back >>>>> the >>>>> utf-hex representation, '\xeb'. >> >>>>> there must be some setting that i'm missing that'll give the >>>>> appropriate utf-8 representation at the SA (api) level. any ideas, >>>>> suggestions? >> >>>>> thx >> >>>>> yes, i could do '\xeb'.encode('utf8) but it's not an option. we >>>>> got >>>>> too much data to deal with and MySQLdb is working perfectly well >>>>> without the extra step. thx. > --~--~---------~--~----~------------~-------~--~----~ > You received this message because you are subscribed to the Google > Groups "sqlalchemy" group. > To post to this group, send email to sqlalchemy@googlegroups.com > To unsubscribe from this group, send email to [EMAIL PROTECTED] > For more options, visit this group at > http://groups.google.com/group/sqlalchemy?hl=en > -~----------~----~----~----~------~----~------~--~--- >