The only unusual thing about the app is the usage of a binary field as a primary key. I'm not even sure postgresql supports bytea as a PK, but that is a very likely candidate for where things are going wrong. If the bytea actually works as a PK, the ORM may be tripping on it. Try building a test that uses plain psycopg2 directly, then another that uses the SQL expression language. Also definitely take out "try: ... except: pass", that's a guaranteed way to add confusion and uncertainty to a test case.
On Dec 29, 2010, at 2:32 PM, ChuckFoo wrote: > Howdy all, > > I'm relatively new to SQLAlchemy, so I hope this isn't too silly of a > question/issue, but I'm having some issues with a TypeDecorator type > I've created and am seeking advice. > > I have a situation where I'd like to automatically convert hex strings > (in this case they are actually MD5 values) into binary equivalents > and store them in a BLOB, so I have this TypeDecorator (shown in the > test code at the bottom of this message) that sort of works. It seems > to do the right thing in sqlite (stores as binary but retrieves as the > hex string) but in postgres (via psychopg2, v2.3.2 and v2.2.2) it > seems to do the right thing on store but does like a double decode on > retrieve. To illustrate, here's the output of my test case: > >> 2010-12-29 14:13:27,246 [DEBUG] Getting session(s) >> 2010-12-29 14:13:27,430 [DEBUG] Populating session sqlite:///sqla_test.db >> 2010-12-29 14:13:27,521 [DEBUG] Populating session postgresql > +psycopg2://postgres:@localhost/my_test >> 2010-12-29 14:13:27,524 [DEBUG] Dumping session(s) >> 2010-12-29 14:13:27,524 [DEBUG] Querying session sqlite:///sqla_test.db >> 2010-12-29 14:13:27,525 [DEBUG] (2 objects) >> 2010-12-29 14:13:27,526 [DEBUG] MyFile md5sum = > 0123456789abcdef0123456789abcdef, filename = foo.exe >> 2010-12-29 14:13:27,526 [DEBUG] MyFile md5sum = > 00000000000000000000000000ff00ff, filename = bar.txt >> 2010-12-29 14:13:27,526 [DEBUG] Querying session postgresql > +psycopg2://postgres:@localhost/my_test >> 2010-12-29 14:13:27,527 [DEBUG] (2 objects) >> 2010-12-29 14:13:27,528 [DEBUG] MyFile md5sum = > 783031323334353637383961626364656630313233343536373839616263646566, > filename = foo.exe >> 2010-12-29 14:13:27,528 [DEBUG] MyFile md5sum = > 783030303030303030303030303030303030303030303030303030666630306666, > filename = bar.txt > > 78 is ASCII for 'x', the other values are the ASCII reps for 0-9a-f. > Selecting directly from the tables I get: > > my_test=# select * from my_files; > md5sum | filename > ------------------------------------+---------- > \x0123456789abcdef0123456789abcdef | foo.exe > \x00000000000000000000000000ff00ff | bar.txt > (2 rows) > > So it is storing it in binary form, AFAICT, and the psql/postgres > bytea handling is converting it back for display, which is probably > happening through psycopg2 I'm guessing, so then I'm getting a double > conversion. Of course, I tried switching to just a LargeBinary type > to see what would happen, and here's the output of that: > >> 2010-12-29 14:21:58,265 [DEBUG] Getting session(s) >> 2010-12-29 14:21:58,500 [DEBUG] Populating session sqlite:///sqla_test.db >> 2010-12-29 14:21:58,589 [DEBUG] Populating session postgresql > +psycopg2://postgres:@localhost/my_test >> 2010-12-29 14:21:58,592 [DEBUG] Dumping session(s) >> 2010-12-29 14:21:58,593 [DEBUG] Querying session sqlite:///sqla_test.db >> 2010-12-29 14:21:58,593 [DEBUG] (2 objects) >> 2010-12-29 14:21:58,594 [DEBUG] MyFile md5sum = > 0123456789abcdef0123456789abcdef, filename = foo.exe >> 2010-12-29 14:21:58,594 [DEBUG] MyFile md5sum = > 00000000000000000000000000ff00ff, filename = bar.txt >> 2010-12-29 14:21:58,594 [DEBUG] Querying session postgresql > +psycopg2://postgres:@localhost/my_test >> 2010-12-29 14:21:58,595 [DEBUG] (2 objects) >> 2010-12-29 14:21:58,596 [DEBUG] MyFile md5sum = > x3031323334353637383961626364656630313233343536373839616263646566, > filename = foo.exe >> 2010-12-29 14:21:58,597 [DEBUG] MyFile md5sum = > x3030303030303030303030303030303030303030303030303030666630306666, > filename = bar.txt > > Notice that sqlite looks identical, but the postgres output is still > encoded (except the 'x' this time). And selecting directly shows that > it's stored in ASCII, not the binary values: > > my_test=# select * from my_files; > md5sum > | filename > -------------------------------------------------------------------- > +---------- > \x3031323334353637383961626364656630313233343536373839616263646566 > | foo.exe > \x3030303030303030303030303030303030303030303030303030666630306666 > | bar.txt > (2 rows) > > It is also stored as ASCII in the sqlite db this way too (so I don't > get a [theoretical] storage savings that I'm looking for, so I'd > prefer to have it stored as binary instead of as an ASCII or Unicode > string). > > So, any advice on how to get this type of storage to behave > identically under sqlite & postgres (and eventually MySQL, probably) > would be greatly appreciated! > > Thanks, > Chuck > > ----->8 test code 8<----- > # > # test case for odd binascii conversion issue > # > > import binascii > > import sqlalchemy as sa > import sqlalchemy.types as sa_types > import sqlalchemy.orm as sa_orm > import sqlalchemy.orm.collections as sa_col > import sqlalchemy.ext.declarative as sa_decl > import sqlalchemy.ext.associationproxy as sa_asprox > > Base = sa_decl.declarative_base() > SessionMap = {} > EngineMap = {} > > # convert MD5 and binary name binascii to/from binary automagically: > class BinAsciiType(sa_types.TypeDecorator): > """Converts between ascii hex char string and BLOB storage > invisibly""" > #impl = types.LargeBinary(8) # might be nice to limit to a > specific size, but doesn't seem to work > impl = sa_types.LargeBinary > def process_bind_param(self, value, dialect): > rc = None > try: > if value is not None: > rc = binascii.a2b_hex(value) > except: > # fail silently if can't convert? > pass > return rc > def process_result_value(self, value, dialect): > rc = None > try: > if value is not None: > rc = binascii.b2a_hex(value) > except: > # fail silently if can't convert? > pass > return rc > def copy(self): > return BinAsciiType(self.impl.length) > > class MyFile(Base): > __tablename__ = 'my_files' > md5sum = sa.Column(BinAsciiType, primary_key=True) > #md5sum = sa.Column(sa_types.LargeBinary, primary_key=True) > filename = sa.Column(sa.String, primary_key=True) > > def __init__(self, md5sum = None, filename = None): > self.md5sum = md5sum > self.filename = filename > > def __str__(self): > return "MyFile md5sum = %s, filename = %s"% > (self.md5sum,self.filename) > > def GetDBSession(db_url, full_init=False): > """Initialize or retrieve a previously initilized > SQLAlchemySessionWrapper object for the given db_url parameter""" > global Base > global SessionMap > global EngineMap > > session = None > engine = None > > if db_url in SessionMap: > session = SessionMap[db_url] > engine = EngineMap[db_url] > else: > engine = sa.create_engine(db_url, echo=False) > session = sa_orm.sessionmaker(bind=engine)() > EngineMap[db_url] = engine > SessionMap[db_url] = session > > if full_init == True: > Base.metadata.drop_all(engine) > # always do the create_all, in case new types were added since > last time... > Base.metadata.create_all(engine) > > return session > > > if __name__ == '__main__': > try: > import os > import sys > import logging > logging.basicConfig(level=logging.DEBUG, > format='%(asctime)s [%(levelname)s] % > (message)s', > stream=sys.stderr) > logger = logging.getLogger('TEST') > > logger.debug("Getting session(s)") > session_list = list() > session_list.append(GetDBSession("sqlite:/// > sqla_test.db",full_init=True)) > session_list.append(GetDBSession("postgresql+psycopg2:// > postgres:@localhost/my_test",full_init=True)) > # populate the session(s): > for session in session_list: > logger.debug("Populating session %s"%session.bind.url) > myfile1 = > MyFile("0123456789abcdef0123456789abcdef","foo.exe") > myfile2 = > MyFile("00000000000000000000000000ff00ff","bar.txt") > session.add(myfile1) > session.add(myfile2) > session.commit() > # dump some data from the session(s): > logger.debug("Dumping session(s)") > for session in session_list: > logger.debug("Querying session %s"%(session.bind.url)) > logger.debug(" (%d objects)"% > (session.query(MyFile).count())) > for obj in session.query(MyFile): > logger.debug(" %s"%(str(obj))) > > except Exception, e: > logger.error("EXCEPTION: %s"%str(e)) > > > > -- > You received this message because you are subscribed to the Google Groups > "sqlalchemy" group. > To post to this group, send email to sqlalch...@googlegroups.com. > To unsubscribe from this group, send email to > sqlalchemy+unsubscr...@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/sqlalchemy?hl=en. > -- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To post to this group, send email to sqlalch...@googlegroups.com. To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.