Hi all,

This is a bug in a product with multiple versions in support. The issue can be reproduced with:

sqlalchemy: 0.6.1, 0.6.6, 0.7.8
cx_Oracle: 5.0.4, 5.1, 5.1.2
Oracle: 10g XE and 10g Enterprise
We have tables created with sqlalchemy.String primary keys and other indexes. When we query through sqlalchemy and pass in a unicode value for the key, Oracle does not use the index and instead performs a full table scan. When executing the exact same query directly through cx_Oracle, we don't see this issue. The attached script is a small test case which demonstrates the problem. Sample output:

Running select
Unicode params took 3.455 seconds
Encoded params took 0 seconds
Running cx_Oracle select
Unicode params took 0.0619998 seconds
Encoded params took 0 seconds

And running the cx_Oracle select first, to prevent pollution by caching:

Running cx_Oracle select
Unicode params took 0.0469999 seconds
Encoded params took 0 seconds
Running select
Unicode params took 3.267 seconds
Encoded params took 0.0159998 seconds

As can be seen, there's a slight slowdown with the unicode parameter on cx_Oracle, but nothing compared to the slowdown on sqlalchemy. The sqlalchemy unicode-parameter select is the only one which performs a full table scan, so that makes sense.

This issue has been mentioned on this list before - 15 Feb with the title "Full Table scan with Oracle due to Charset conversion" and 4 April with the title "Problem with Oracle requests" - but in those cases the issue was reproducible with the cx_Oracle driver as well, which it is not here.

Using the convert_unicode='force' argument on the String types does solve the issue, but there are a few reasons I don't want to do that. The product also supports PostgreSQL, SQL Server and MySQL, and I do not want to make changes which will affect running under those databases. This is also happening in a number of places in an otherwise very stable product under long-term support, so whatever change is made needs to be minimal.

Adding an additional function-based index on SYS_OP_C2C(column) for each of the affected columns also solves the issue, with a slight performance cost for the extra indexes which need to be kept up to date.
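As a rough illustration of that stopgap, the function-based index DDL can be generated per affected column along these lines (a minimal sketch; the index-name convention and helper are mine, not from the product, and the statement would still need to be executed against the database, e.g. via engine.execute):

```python
# Sketch of the SYS_OP_C2C workaround: when the bind value arrives in the
# national character set, Oracle applies SYS_OP_C2C() to the VARCHAR2
# column before comparing, so a plain index on the column no longer
# matches; a function-based index on SYS_OP_C2C(column) can.
def c2c_index_ddl(table_name, column_name):
    # Index name is illustrative; shorten it if it would exceed
    # Oracle's 30-character identifier limit on 10g.
    index_name = "ix_%s_%s_c2c" % (table_name, column_name)
    return "CREATE INDEX %s ON %s (SYS_OP_C2C(%s))" % (
        index_name, table_name, column_name)

print(c2c_index_ddl("test_table", "colid"))
# CREATE INDEX ix_test_table_colid_c2c ON test_table (SYS_OP_C2C(colid))
```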
This is the stopgap solution we've gone for in the meantime on the production systems. It's not ideal, though, and a bit difficult to automate and then verify. We'd prefer to know why this is happening, and maybe fix a sqlalchemy issue in the process. So, my questions are:

* What is sqlalchemy doing differently from straight cx_Oracle that triggers this behaviour? How can I debug this further to find out?
* Is there a way, on sqlalchemy 0.6.x, to set up a hook which will use convert_unicode='force' only on an Oracle database and only for String columns, automatically across an application? There's no create_engine argument for convert_unicode='force', which is a pity.
* And is there a hook for sqlalchemy 0.7.x to achieve the same result?

Thanks in advance for any help,
regards,

--
David Moore
Senior Software Engineer
St. James Software
Email: dav...@sjsoft.com

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
# -*- coding: utf-8 -*-
import sqlalchemy
import sys
import time
import random

import cx_Oracle

user = "j5"
dsn = "XE"


def create_id():
    return unicode(int(time.time())) + unicode(random.randint(100000, 999999))


def setup_database(metadata):
    # Create table
    table = sqlalchemy.Table(
        "test_table", metadata,
        sqlalchemy.Column("colid", sqlalchemy.String(255), primary_key=True),
        sqlalchemy.Column("message", sqlalchemy.String(4000)))
    if not table.exists():
        table.create(checkfirst=True)
        # Add rows
        for i in range(100000):
            latest_id = create_id()
            try:
                table.insert({"colid": latest_id,
                              "message": "This shouldn't matter " * 180}).execute()
            except Exception as e:
                print "duplicate error", e
    else:
        latest_id = table.select().limit(1).execute().fetchone()['colid']
    if isinstance(latest_id, str):
        latest_id = latest_id.decode('utf-8')
    return table, latest_id


def run_oracle_unicode_params(table, id_):
    assert table.select().where(table.c.colid == id_).execute().fetchone()[0] == id_


def run_oracle_encoded_params(table, id_):
    assert table.select().where(
        table.c.colid == id_.encode("utf-8")).execute().fetchone()[0] == id_


def main():
    sa_engine = sqlalchemy.create_engine(oracle_connect_string)
    metadata = sqlalchemy.MetaData(sa_engine)
    print "Setting up table"
    table, id_ = setup_database(metadata)

    cx_connection = cx_Oracle.connect(user, password, dsn)
    print "Running cx_Oracle select"
    start = time.time()
    cx_cursor = cx_connection.cursor()
    assert cx_cursor.execute(
        "SELECT colid FROM test_table WHERE colid = :colid",
        {"colid": id_}).fetchone()[0] == id_
    cx_cursor.close()
    print "Unicode params took %g seconds" % (time.time() - start)

    start = time.time()
    cx_cursor = cx_connection.cursor()
    assert cx_cursor.execute(
        "SELECT colid FROM test_table WHERE colid = :colid",
        {"colid": id_.encode("utf-8")}).fetchone()[0] == id_
    cx_cursor.close()
    print "Encoded params took %g seconds" % (time.time() - start)
    cx_connection.close()

    print "Running select"
    start = time.time()
    run_oracle_unicode_params(table, id_)
    print "Unicode params took %g seconds" % (time.time() - start)
    start = time.time()
    run_oracle_encoded_params(table, id_)
    print "Encoded params took %g seconds" % (time.time() - start)


if __name__ == '__main__':
    password = sys.argv[1]
    oracle_connect_string = "oracle+cx_oracle://%s:%s@%s" % (user, password, dsn)
    main()