Shannon -jj Behrens wrote:
> I'm using convert_unicode=True. Everything is fine as long as I'm the
> one reading and writing the data. However, if I look at what's
> actually being stored in the database, it's like the data has been
> encoded twice. If I switch to use_unicode=True, which I believe is
> MySQL specific, things work just fine and what's being stored in the
> database looks correct.
Yes, if the MySQL client lib is encoding, and SA is also encoding, the data will get encoded twice. I'm not familiar with how I could look at the encoded data to tell if it was already encoded (and I'm not sure I should be... the unicode encoding option should only be enabled in one place, not two).

> I started looking through the SQLAlchemy code, and I came across this:
>
>     def convert_bind_param(self, value, dialect):
>         if not dialect.convert_unicode or value is None or not isinstance(value, unicode):
>             return value
>         else:
>             return value.encode(dialect.encoding)
>
>     def convert_result_value(self, value, dialect):
>         if not dialect.convert_unicode or value is None or isinstance(value, unicode):
>             return value
>         else:
>             return value.decode(dialect.encoding)
>
> The logic looks backwards. It says, "If it's not a unicode object,
> return it. Otherwise, encode it." Later, "If it is a unicode object,
> return it. Otherwise decode it."

Sending unicode values to databases whose client APIs don't handle unicode involves taking a Python unicode object from the application, encoding it into a series of bytes, and sending it to the database. Receiving a result value involves taking the encoded series of bytes and decoding it into a unicode object. So you have *non*-unicode instances going into the DB, and *non*-unicode coming out - the DBAPI is assumed to have no idea what a Python unicode object is (such as psycopg's).

We've been doing the unicode thing for a while now, and you should notice that we have unit tests for just about every function in SA, especially important ones like this. The unicode unit test runs unicode and raw encoded values in and out in numerous ways, and passes for at least MySQL, SQLite, Postgres, Oracle, and MS-SQL. We have had some people having issues with MySQL specifically, which seems to be because some folks have a MySQL config that is stuck in "convert unicode" mode, and they experience the double-encoding issue.
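To make the double-encoding failure concrete, here's a standalone sketch of the mechanism in modern Python 3 (where `str` plays the role of the old `unicode` type); the function names mirror the quoted SA code but are simplified stand-ins, and the latin-1 charset for the second conversion is just an assumption about a typical misconfigured client:

```python
# Sketch of the double-encoding problem: if both SA and the MySQL client
# library convert, the already-encoded bytes get decoded under the client's
# default charset (assumed latin-1 here) and re-encoded, producing mojibake.

ENCODING = "utf-8"  # stands in for dialect.encoding

def convert_bind_param(value, convert_unicode=True):
    # str -> bytes on the way into the DB (mirrors the quoted logic)
    if not convert_unicode or value is None or not isinstance(value, str):
        return value
    return value.encode(ENCODING)

def convert_result_value(value, convert_unicode=True):
    # bytes -> str on the way out of the DB
    if not convert_unicode or value is None or isinstance(value, str):
        return value
    return value.decode(ENCODING)

original = "café"
once = convert_bind_param(original)   # b'caf\xc3\xa9' - correctly encoded

# Round trip through SA alone is fine:
assert convert_result_value(once) == original

# But a client lib that is *also* converting effectively does this:
twice = once.decode("latin-1").encode(ENCODING)
print(twice)                  # b'caf\xc3\x83\xc2\xa9' - encoded twice
print(twice.decode(ENCODING)) # 'cafÃ©' - the garbage you see in the DB
```

Running it shows exactly the symptom described above: the data round-trips correctly through one converter, but what's physically stored is the double-encoded byte string.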
The one improvement that could be made here is for MySQL to provide a subclassed unicode type that disables conversion if the dialect is known to have convert_unicode=True already... then again, I sort of like that this forces people to understand their database config.

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en