Shannon -jj Behrens wrote:
> I'm using convert_unicode=True. Everything is fine as long as I'm the
> one reading and writing the data. However, if I look at what's
> actually being stored in the database, it's like the data has been
> encoded twice. If I switch to use_unicode=True, which I believe is
> MySQL specific, things work just fine and what's being stored in the
> database looks correct.
Yes, if the MySQL client lib is encoding, and SA is also encoding, the data will get encoded twice. I'm not familiar with how I could look at the encoded data to tell if it was already encoded (and I'm not sure I should be... the unicode encoding option should only be enabled in one place, not two).

> I started looking through the SQLAlchemy code, and I came across this:
>
>     def convert_bind_param(self, value, dialect):
>         if not dialect.convert_unicode or value is None or not isinstance(value, unicode):
>             return value
>         else:
>             return value.encode(dialect.encoding)
>
>     def convert_result_value(self, value, dialect):
>         if not dialect.convert_unicode or value is None or isinstance(value, unicode):
>             return value
>         else:
>             return value.decode(dialect.encoding)
>
> The logic looks backwards. It says, "If it's not a unicode object,
> return it. Otherwise, encode it." Later, "If it is a unicode object,
> return it. Otherwise decode it."

Sending unicode values to databases whose client APIs don't handle unicode involves taking a Python unicode object from the application, encoding it into a series of bytes, and sending it to the database. Receiving a result value involves taking the encoded series of bytes and decoding it into a unicode object. So you have *non*-unicode instances going into the DB, and *non*-unicode coming out - the DBAPI is assumed to have no idea what a Python unicode object is (such as psycopg's).

We've been doing the unicode thing for a while now, and you should notice that we have unit tests for just about every function in SA, especially important ones like this. The unicode unit test runs unicode and raw encoded values in and out in numerous ways, and passes for at least MySQL, SQLite, Postgres, Oracle, and MS-SQL. We have had some people having issues with MySQL specifically, which seems to be because some folks have a MySQL config that is stuck in "convert unicode" mode, and they experience the double-encoding issue.
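To make the double-encoding failure concrete, here's a standalone sketch of the mechanism in modern Python 3 (where `str` plays the role of the old `unicode` type); the function names mirror the quoted SA code but are simplified stand-ins, and the latin-1 charset for the second conversion is just an assumption about a typical misconfigured client:

```python
# Sketch of the double-encoding problem: if both SA and the MySQL client
# library convert, the already-encoded bytes get decoded under the client's
# default charset (assumed latin-1 here) and re-encoded, producing mojibake.

ENCODING = "utf-8"  # stands in for dialect.encoding

def convert_bind_param(value, convert_unicode=True):
    # str -> bytes on the way into the DB (mirrors the quoted logic)
    if not convert_unicode or value is None or not isinstance(value, str):
        return value
    return value.encode(ENCODING)

def convert_result_value(value, convert_unicode=True):
    # bytes -> str on the way out of the DB
    if not convert_unicode or value is None or isinstance(value, str):
        return value
    return value.decode(ENCODING)

original = "café"
once = convert_bind_param(original)   # b'caf\xc3\xa9' - correctly encoded

# Round trip through SA alone is fine:
assert convert_result_value(once) == original

# But a client lib that is *also* converting effectively does this:
twice = once.decode("latin-1").encode(ENCODING)
print(twice)                  # b'caf\xc3\x83\xc2\xa9' - encoded twice
print(twice.decode(ENCODING)) # 'cafÃ©' - the garbage you see in the DB
```

Running it shows exactly the symptom described above: the data round-trips correctly through one converter, but what's physically stored is the double-encoded byte string.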
The one improvement that could be made here is for MySQL to provide a subclassed unicode type that disables conversion if the dialect is known to have convert_unicode=True already... then again, I sort of like that this forces people to understand their database config.

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en