On Feb 6, 2014, at 6:59 PM, Erich Blume <blume.er...@gmail.com> wrote:
> Hmm, this one has me stumped. As best I can tell after poking at it using the > column_reflect event, a custom dialect, etc. - the issue here is that in > pysqlite.py we (in my Python 3.3 install) are selecting `sqlite3.dbapi2` as > the dbapi interface, but we aren't telling sqlite3 anything about how to > treat unicode errors. From what I am reading (but it seems inconsistent, > maybe?) sqlite3 automatically decodes all database retrieved values from > their bytes for text fields, returning unicode strings. Except... that > doesn't always seem to be true. I hex-edited a db file to change the utf-8 > string "hello" to "hell" + 0x92 and sqlite3 switched from returning "hello" > to b"hell\x92", or something like that - I've been poking at this for so long > I've lost track of that transcript. > > One can override sqlite3's text factory, apparently, with (for instance) > `sqlite3.text_factory = lambda x: x.decode('utf-8', errors='ignore')`. Maybe > the key is to try and find a way to trigger that from sqlalchemy? I tried and > failed, maybe someone else can point me back to the path? > > Just to re-summarize the problem: In python 3, I'm getting errors trying to > read a row from a sqlite database that has a TEXT column with an invalid > utf-8 sequence (specifically, the singleton bye '0x92'). I'd love to just > have sqlalchemy move along and ignore the byte, but I'm not clear how to do > that. Pysqlite (e.g. sqlite3 module) returns TEXT as Python unicode out of the gate. That exception message is being raised by sqlite3 itself, SQLAlchemy is just a pass-through, as the string type knows on sqlite that the value is already unicode. you might need to CAST() the value as BINARY perhaps, not sure. You’d first want to get a plain sqlite3 script to do what you want. Setting a “text_factory” at the module level of sqlite3 is certainly easy enough but that seems way too broad. Ideally you’d want to be able to get the value on a per-column basis.
signature.asc
Description: Message signed with OpenPGP using GPGMail