On Feb 6, 2014, at 6:59 PM, Erich Blume <blume.er...@gmail.com> wrote:

> Hmm, this one has me stumped. As best I can tell after poking at it using the 
> column_reflect event, a custom dialect, etc. - the issue here is that in 
> pysqlite.py we (in my Python 3.3 install) are selecting `sqlite3.dbapi2` as 
> the dbapi interface, but we aren't telling sqlite3 anything about how to 
> treat unicode errors. From what I am reading (but it seems inconsistent, 
> maybe?) sqlite3 automatically decodes all database retrieved values from 
> their bytes for text fields, returning unicode strings. Except... that 
> doesn't always seem to be true. I hex-edited a db file to change the utf-8 
> string "hello" to "hell" + 0x92 and sqlite3 switched from returning "hello" 
> to b"hell\x92", or something like that - I've been poking at this for so long 
> I've lost track of that transcript.
> 
> One can override sqlite3's text factory, apparently, with (for instance) 
> `sqlite3.text_factory = lambda x: x.decode('utf-8', errors='ignore')`. Maybe 
> the key is to try and find a way to trigger that from sqlalchemy? I tried and 
> failed, maybe someone else can point me back to the path?
> 
> Just to re-summarize the problem: In python 3, I'm getting errors trying to 
> read a row from a sqlite database that has a TEXT column with an invalid 
> utf-8 sequence (specifically, the singleton byte '0x92'). I'd love to just 
> have sqlalchemy move along and ignore the byte, but I'm not clear how to do 
> that.

Pysqlite (i.e. the sqlite3 module) returns TEXT as Python unicode out of the 
gate.  That exception message is being raised by sqlite3 itself; SQLAlchemy is 
just a pass-through, since on sqlite the string type knows that the value is 
already unicode.
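
To see that the error comes from sqlite3 with no SQLAlchemy involved, a 
minimal sketch along these lines should reproduce it.  The table name `t`, the 
column name `data`, and the trick of inserting the bad bytes through 
CAST(? AS TEXT) are assumptions made for illustration, not taken from the 
original database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (data TEXT)")

# CAST(blob AS TEXT) stores the raw bytes as TEXT without validating them,
# which stands in for the hex-edited file described above (assumption).
conn.execute("INSERT INTO t VALUES (CAST(? AS TEXT))", (b"hell\x92",))

raised = False
try:
    conn.execute("SELECT data FROM t").fetchall()
except sqlite3.OperationalError as exc:
    # The decode failure surfaces from the sqlite3 module itself.
    raised = True
    print("sqlite3 raised:", exc)
```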

You might need to CAST() the value as BINARY perhaps, not sure.   You’d first 
want to get a plain sqlite3 script to do what you want.   Setting a 
“text_factory” at the module level of sqlite3 is certainly easy enough but that 
seems way too broad.   Ideally you’d want to be able to get the value on a 
per-column basis.
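
As a rough sketch of both options in plain sqlite3 (not a tested SQLAlchemy 
recipe): the per-column route casts to BLOB, which is the SQLite type name that 
returns raw bytes, and decodes in Python; the broader route sets text_factory, 
which in the standard library is an attribute of the connection object rather 
than of the module.  The table and column names are again made up for the 
example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (data TEXT)")
# Store an invalid UTF-8 sequence as TEXT (illustrative setup, an assumption).
conn.execute("INSERT INTO t VALUES (CAST(? AS TEXT))", (b"hell\x92",))

# Per-column: fetch the raw bytes and decode them leniently ourselves.
raw = conn.execute("SELECT CAST(data AS BLOB) FROM t").fetchone()[0]
per_column = raw.decode("utf-8", errors="ignore")

# Per-connection: every TEXT value on this connection goes through the factory.
conn.text_factory = lambda b: b.decode("utf-8", errors="ignore")
per_connection = conn.execute("SELECT data FROM t").fetchone()[0]

print(repr(raw), repr(per_column), repr(per_connection))
```

The CAST approach only touches the column that is known to be dirty, at the 
cost of decoding by hand; the text_factory approach silently drops bad bytes 
for every TEXT column read on that connection.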

