Thanks Simon. Do you know how I might use that with reflection? There are several hundred of these columns, and I'd hate to have to override each one individually; that would rather defeat the purpose of reflection.
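For reference, one connection-level route that would avoid touching individual columns is the `text_factory` attribute of the standard-library `sqlite3` connection. Below is a minimal sketch using only `sqlite3`, not SQLAlchemy. The `items` table and the names `lenient_text` and `demo` are made up for illustration. Whether this can be wired into a SQLAlchemy engine (for example from a "connect" event listener on the raw DBAPI connection) without side effects is an assumption I have not verified.

```python
import sqlite3


def lenient_text(raw: bytes) -> str:
    """Decode TEXT bytes as UTF-8, substituting U+FFFD for invalid bytes."""
    return raw.decode("utf-8", errors="replace")


def demo() -> str:
    # Hypothetical in-memory table standing in for the real database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (description TEXT)")
    # CAST(blob AS TEXT) stores the raw bytes as TEXT without validating
    # them, which reproduces a stray 0x92 inside otherwise-ASCII text.
    conn.execute(
        "INSERT INTO items VALUES (CAST(? AS TEXT))",
        (b"It\x92s here",),
    )
    # text_factory is applied to every TEXT value read on this connection,
    # so no per-column override is needed.
    conn.text_factory = lenient_text
    (description,) = conn.execute("SELECT description FROM items").fetchone()
    conn.close()
    return description


if __name__ == "__main__":
    print(repr(demo()))
```

Swapping `errors="replace"` for `errors="ignore"` would drop the bad byte instead of marking it. The trade-off is that the setting applies to the whole connection, so valid and invalid columns alike go through the same lenient decoder.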
One thought I just had: perhaps I could subclass the Text type and then override the ischema_names entry for the SQLite TEXT type. That'd do the trick, I suspect!

On Tue, Feb 4, 2014 at 3:26 AM, Simon King <si...@simonking.org.uk> wrote:
> On Tue, Feb 4, 2014 at 10:15 AM, Erich Blume <blume.er...@gmail.com> wrote:
> > I am working on a binding to a SQLite database that I do not control the
> > creation of, with the aid of reflection. I'm running in to what I believe
> > are very basic UTF-8 decoding errors. For instance, a TEXT cell has the byte
> > '0x92' in it and is causing an OperationalError. Presumably, this is because
> > 0x92 (by itself) is not a valid encoding for any Unicode code point. I would
> > prefer that the decoding from UTF-8 to be forced, perhaps by dropping the
> > bad byte. How can I do this?
> >
> > The database has a table with a column called 'description', which is of
> > type TEXT. The "PRAGMA encoding" is left at 'UTF-8', thank goodness. One of
> > the rows, however, contains within its otherwise ascii byte contents the
> > singleton byte '0x92'. Based on the context of the sentence, it seems that
> > this was intended to be encoded as a single quotation mark, some googling
> > suggests 'RIGHT SINGLE QUOTATION MARK' in unicode, which is '0xE2 0x80
> > 0x99'. I gather that MSSQL (which was the original source of the data in
> > this database) uses Microsofts' infernal web encodings sometimes and that is
> > probably the source of this byte.
> >
> > The issue is this: I really need to read this data! It would be *ideal* to
> > have the aid of something like python's 'replace' decoding handler but
> > failing that just eliding the byte would do fine in a pinch.
> >
> > When fetching this row in Python 3.3 with SQLAlchemy 0.9.1 my session looks
> > vaguely like this (with the text and stack trace truncated out for brevity).
> >
> >   File "/usr/local/Cellar/python3/3.3.3/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages/sqlalchemy/engine/result.py", line 760, in <listcomp>
> >     return [process_row(metadata, row, processors, keymap)
> > sqlalchemy.exc.OperationalError: (OperationalError) Could not decode to UTF-8 column 'description' with text <...>
> >
> > Is there some way to accomplish this?
> >
>
> The String-related column types have a "unicode_error" parameter which
> sounds like it might be what you want:
>
> http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#sqlalchemy.types.String.params.unicode_error
>
> Note the various warnings around it though...
>
> Hope that helps,
>
> Simon
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "sqlalchemy" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/sqlalchemy/T--Ftk5EVZg/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/groups/opt_out.