On 10/21/15 7:27 PM, Uri Okrent wrote:
> Interesting...
> 
> On Wednesday, October 21, 2015 at 5:43:22 PM UTC-4, Michael Bayer wrote:
> 
>     class Customer(Base):
>         __tablename__ = "customer"
>         id = Column(Integer, primary_key=True)
>         name = Column(Unicode(255))
>         description = Column(Unicode(255))
> 
> 
> My declarative classes use Text() for all string columns.  This is
> because I *know* my backend is postgres and that's sort of what they
> recommend (the "tip":
> http://www.postgresql.org/docs/9.3/static/datatype-character.html).
>  However, I neglected to consider the consequences of that decision on
> the ORM.

Use UnicodeText() then.  On PG, VARCHAR / CHAR / TEXT are all identical;
it doesn't matter.  Also, the only difference between Unicode and
UnicodeText on that platform is what DDL is emitted in CREATE TABLE.  At
the level of DML and queries, the data is handled identically across all
of these types.
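
For instance, a quick sketch (assuming the stock postgresql dialect that
ships with SQLAlchemy) showing that the Unicode / UnicodeText distinction
only shows up in the rendered DDL:

    from sqlalchemy import Column, Integer, MetaData, Table, Unicode, UnicodeText
    from sqlalchemy.dialects import postgresql
    from sqlalchemy.schema import CreateTable

    metadata = MetaData()
    customer = Table(
        "customer",
        metadata,
        Column("id", Integer, primary_key=True),
        Column("name", Unicode(255)),        # renders as VARCHAR(255) on PG
        Column("description", UnicodeText),  # renders as TEXT on PG
    )

    # the only difference between the two types is this DDL string;
    # result rows are processed the same way for both
    print(CreateTable(customer).compile(dialect=postgresql.dialect()))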


> 
> It sounds like your recommendation is to disable unicode decoding at the
> engine level with native_unicode=False, and instead explicitly call out
> only those columns that contain unicode for sqlalchemy to handle the
> decoding of only those columns, using a mixture of (in the postgres
> case) Text() and UnicodeText() columns.

If you're really looking to save 15 ms per 100K rows, it seems to have
that effect for now.
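
Roughly like this, if you go that route (a sketch only; the URL and
credentials are placeholders, and use_native_unicode=False is the
psycopg2-dialect flag that turns off the DBAPI's own unicode extension):

    from sqlalchemy import Column, Integer, Text, UnicodeText, create_engine
    from sqlalchemy.ext.declarative import declarative_base

    # with the DBAPI's native unicode turned off, SQLAlchemy's own
    # (C-accelerated) decoding applies only to Unicode* typed columns
    engine = create_engine(
        "postgresql+psycopg2://scott:tiger@localhost/test",
        use_native_unicode=False,
    )

    Base = declarative_base()

    class Customer(Base):
        __tablename__ = "customer"
        id = Column(Integer, primary_key=True)
        name = Column(UnicodeText)    # decoded to unicode by SQLAlchemy
        description = Column(Text)    # returned as-is by the driver on Py2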


> 
> Although, reading the docs, I got the feeling that you were discouraging
> people from using sqlalchemy's built-in decoding facilities in favor of
> the native facilities provided by the dbapi.

Well, originally we did it all ourselves, back when we didn't have C
extensions and the DBAPIs barely did it.  Then the DBAPIs started
supplying it natively, and especially with the coming of Python 3 they
all had to; compared to SQLAlchemy doing it all in pure Python, there
was no contest.  But then our C extensions came along and sped things
up, and then we started doing things like caching the codec object
(which is probably what the DBAPIs aren't doing yet) and gained even
more speed.  So it seems we've surpassed the DBAPIs in just this one
area, which is ridiculous, because the pure C DBAPIs are so much faster
in every other way; it's quite annoying that this weird incongruity is
present.

Under Python 3 we generally don't have an option here: the DBAPIs now
all handle unicode automatically, so the architectures have been pushed
to assume that's the default.  But in the case of cx_Oracle, and now
apparently psycopg2, we're seeing that under Python 2 their unicode
facilities still seem to perform worse than SQLAlchemy's.  There's no
clear answer.  I'd prefer that the DBAPIs which are written in pure C
anyway, like psycopg2, just be allowed to be faster here; I'd maybe
report it to them.


