[sqlalchemy] Re: utf hex instead of utf-8 return

Michael Bayer Sun, 07 Dec 2008 09:50:23 -0800

I'm not sure if that was sarcasm or not...if so, consider the time
better spent analyzing the issue. The attached test illustrates a
round trip of unicode data containing multibyte codepoints in both
directions, using both a raw MySQLdb cursor and a SQLAlchemy engine.
Use this as a guide to sending and receiving unicode data.
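
For readers without a MySQL server at hand, the same round-trip idea can be
sketched with the standard-library sqlite3 module. This is only an
illustration under stated assumptions: SQLite stands in for MySQL, the syntax
is Python 3 rather than the Python 2 of the attached test, and the sample
string is a shortened piece of the test sentence. It says nothing about
MySQLdb's own charset handling.

```python
import sqlite3

# Text containing multibyte code points, taken from the thread's sample sentence.
thedata = "« S’il vous plaît… dessine-moi un mouton! »"

# In-memory SQLite stands in for the MySQL server (an assumption of this sketch).
db = sqlite3.connect(":memory:")
db.execute("create table encoding_test (data text)")

# Insert the same value twice, always as a bound parameter.
db.execute("insert into encoding_test values (?)", (thedata,))
db.execute("insert into encoding_test values (?)", (thedata,))
db.commit()

# Read both rows back.
rows = [r[0] for r in db.execute("select data from encoding_test")]

# If the round trip is lossless, all strings collapse into a single set element.
assert len(set(rows + [thedata])) == 1

# The database also sees a single distinct value.
distinct = db.execute(
    "select count(distinct data) from encoding_test"
).fetchone()[0]
assert distinct == 1
db.close()
```

The two assertions mirror the checks in the attached test: equality on the
Python side, and COUNT DISTINCT on the database side.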



--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---

# -*- coding: utf-8 -*-

import os, sys

from sqlalchemy import *
from sqlalchemy.orm import *

engine = create_engine('mysql://scott:[EMAIL PROTECTED]/test?use_unicode=0&charset=utf8', echo=True)

m = MetaData()
test_table = Table('encoding_test', m,
   Column('data', Unicode(255))
)
m.drop_all(engine)
m.create_all(engine)

import MySQLdb
db = MySQLdb.connect(host='localhost', user='scott', passwd='tiger', db='test', use_unicode=True, charset='utf8')

thedata = u"""Alors vous imaginez ma surprise, au lever du jour, quand une drôle de petit voix m’a réveillé. Elle disait: « S’il vous plaît… dessine-moi un mouton! »"""

# insert unicode data with MySQLdb
cursor = db.cursor()
cursor.execute("insert into encoding_test values(%s)", (thedata,))
cursor.close()
db.commit()

# insert unicode data with SQLAlchemy
engine.execute(test_table.insert(), data=thedata)

# retrieve both rows with MySQLdb
cursor = db.cursor()
cursor.execute("select data from encoding_test")
back_from_mysql = [x[0] for x in cursor.fetchall()]

# retrieve both with SQLAlchemy
back_from_sqla = [x[0] for x in engine.execute(test_table.select()).fetchall()]

# put all the strings in a set - they are all identical and it therefore has length one
assert len(set(back_from_mysql + back_from_sqla + [thedata])) == 1

# MySQL agrees that both rows are identical since COUNT DISTINCT returns one
cursor = db.cursor()
cursor.execute("select count(distinct data) from encoding_test")
assert cursor.fetchone()[0] == 1
cursor.close()

for x in [thedata] + back_from_mysql + back_from_sqla:
    print x.encode('utf-8')

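The '\xeb' versus '\xc3\xab' question raised in the quoted messages below
comes down to the difference between a code point and its UTF-8 encoding. A
minimal sketch, assuming Python 3 syntax (the scripts in this thread are
Python 2, where u'' strings play the role of str and plain '' strings the
role of bytes):

```python
# One code point: U+00EB, LATIN SMALL LETTER E WITH DIAERESIS.
text = "\u00eb"

# Its UTF-8 encoding is the two bytes C3 AB, the hex stored in a utf8 column.
utf8_bytes = text.encode("utf-8")
assert len(text) == 1
assert utf8_bytes == b"\xc3\xab"

# Decoding those bytes as UTF-8 gives back the same single code point.
assert utf8_bytes.decode("utf-8") == text

# Reading the same bytes as Latin-1 yields two characters instead,
# which is what a text literal such as u'\xc3\xab' contains.
mojibake = utf8_bytes.decode("latin-1")
assert mojibake == "\u00c3\u00ab"
assert len(mojibake) == 2
```

Read this way, u'\xeb' coming back from a Unicode column is the decoded
character itself, not a different encoding of it. Calling .encode('utf-8') on
it, as the last loop of the test above does, produces the C3 AB bytes again.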

On Dec 7, 2008, at 12:00 PM, n00b wrote:

>
> thanks!!
> you just confirmed my empirical observations, which puts me very much
> at ease :)
> for versions, 1.2.2 mysqldb, and v 5.0.67 and 6.0.7 (alpha) mysql
> (community ed.)
>
> thanks again.
>
> On Dec 7, 8:52 am, Michael Bayer <[EMAIL PROTECTED]> wrote:
>> you should also be on MySQLdb 1.2.2.  Using the Unicode type in
>> conjunction with charset=utf8&use_unicode=0 and always passing Python
>> unicode (u'') objects is the general recipe for unicode with MySQL.
>> All this means is that SQLA sends utf-8-encoded strings to MySQLdb,
>> MySQLdb does not try to encode them itself and makes MySQL aware the
>> data should be considered as utf-8.   I'm not sure what version of
>> MySQL you're on or how older versions of that might get in the way.
>>
>> On Dec 6, 2008, at 1:26 PM, n00b wrote:
>>
>>
>>
>>> thanks for the quick reply. i kept trying with it and now have reached
>>> the utter state of confusion. the specification of Unicode versus
>>> String in the table defs, coupled with the actual str representation,
>>> has me totally confused. here's a quick script; have a look at the
>>> mysql table itself to see the character display:
>>
>>> #!/usr/bin/env python
>>> # -*- coding: utf-8 -*-
>>
>>> import os, sys
>>> import unicodedata
>>
>>> from sqlalchemy import *
>>> from sqlalchemy.orm import *
>>
>>> #set db
>>> import MySQLdb
>>> db = MySQLdb.connect(host='localhost', user='root', passwd='',
>>> db='xxx', use_unicode=True, charset='utf8')
>>> cur = db.cursor()
>>> cur.execute('SET NAMES utf8')
>>> cur.execute('SET CHARACTER SET utf8')
>>> cur.execute('SET character_set_connection=utf8')
>>> cur.execute('SET character_set_server=utf8')
>>> cur.execute('''SHOW VARIABLES LIKE 'char%'; ''')
>>> print cur.fetchall()
>>
>>> utf_repr = '\xc3\xab'
>>> hex_repr = '\xeb'
>>
>>> mysql_url = 'mysql://root:@localhost/xxx'
>>> connect_args = {'charset':'utf8', 'use_unicode':'0'}
>>> engine = create_engine(mysql_url, connect_args=connect_args)
>>> metadata = MetaData()
>>
>>> test_table = Table('encoding_test', metadata,
>>>    Column(u'id', Integer, primary_key=True),
>>>    Column(u'unicode', Integer),
>>>    Column(u'u_hex', Unicode(10)),
>>>    Column(u'u_utf', Unicode(10)),
>>>    Column(u'u_str', Unicode(10)),
>>>    Column(u's_hex', String(10)),
>>>    Column(u's_utf', String(10)),
>>>    Column(u's_str', String(10))
>>> )
>>
>>> class EncodingTest(object): pass
>>
>>> mapper(EncodingTest, test_table)
>>
>>> metadata.create_all(engine)
>>> Session = sessionmaker(bind=engine)
>>
>>> session = Session()
>>> et = EncodingTest()
>>> et.unicode = 1
>>> et.u_str = u'ë'
>>> et.u_hex = u'\xeb'
>>> et.u_utf = u'\xc3\xab'
>>> et.s_str = u'ë'
>>> et.s_hex = u'\xeb'
>>> et.s_utf = u'\xc3\xab'
>>> session.add(et)
>>> session.commit()
>>> et = EncodingTest()
>>> et.unicode = 0
>>> et.u_str = 'ë'
>>> et.u_hex = '\xeb'
>>> et.u_utf = '\xc3\xab'
>>> et.s_str = 'ë'
>>> et.s_hex = '\xeb'
>>> et.s_utf = '\xc3\xab'
>>> session.add(et)
>>> session.commit()
>>> session.close()
>>
>>> session = Session()
>>> results = session.query(EncodingTest).all()
>>> for result in results:
>>>    print result.unicode
>>>    print repr(result.u_hex), repr(result.u_utf), repr(result.u_str)
>>>    print repr(result.s_hex), repr(result.s_utf), repr(result.s_str)
>>>    print
>>
>>> in addition, i don't seem to be able to run the mysql settings (the
>>> "#set db" block above) from SA.
>>> any insights are greatly appreciated. btw, the use_unicode setting,
>>> either in MySQLdb or SA, doesn't seem to have any effect on results.
>>
>>> thx
>>
>>> On Dec 5, 3:25 pm, Michael Bayer <[EMAIL PROTECTED]> wrote:
>>>> I'm not sure of the mechanics of what you're experiencing, but make
>>>> sure you use charset=utf8&use_unicode=0 with MySQL.
>>
>>>> On Dec 5, 2008, at 4:17 PM, n00b wrote:
>>
>>>>> greetings,
>>
>>>>> SA (0.5.0rc1) keeps returning utf hex instead of utf-8 and in the
>>>>> process driving me batty. all the mysql setup is fine, the chars
>>>>> look good and are umlauting to goethe's delight. moreover, insert
>>>>> and select are working perfectly with the MySQLdb api on three
>>>>> different *nix systems, two servers, ... it works.
>>
>>>>> where things fall apart is on the retrieval side of SA; inserts are
>>>>> fine (using the config_args = {'charset':'utf8'} dict in the
>>>>> create_engine call).
>>
>>>>> for example, ë, the latin small letter e with diaeresis, is stored
>>>>> in mysql hex as C3 AB; using the MySQLdb client, this is exactly
>>>>> what i get back: '\xc3\xab' (in the # -*- coding: UTF-8 -*-
>>>>> environment), no further codecs work required. SA, on the other
>>>>> hand, hands me back the utf-hex representation, '\xeb'.
>>
>>>>> there must be some setting that i'm missing that'll give the
>>>>> appropriate utf-8 representation at the SA (api) level. any ideas,
>>>>> suggestions?
>>
>>>>> thx
>>
>>>>> yes, i could do '\xeb'.encode('utf8') but it's not an option. we got
>>>>> too much data to deal with and MySQLdb is working perfectly well
>>>>> without the extra step. thx.
