Hi all,

This is a bug in a product with multiple versions in support. The issue can be reproduced with:

sqlalchemy: 0.6.1, 0.6.6, 0.7.8
cx_Oracle: 5.0.4, 5.1, 5.1.2
Oracle: 10g XE and 10g Enterprise
We have tables created with sqlalchemy.String primary keys and other indexes. When we query through sqlalchemy and pass in a unicode value for the key, Oracle does not use the index and instead performs a full table scan. When executing the exact same query directly through cx_Oracle, we don't see this issue. The attached script is a small test case which demonstrates the problem. Sample output:

Running select
Unicode params took 3.455 seconds
Encoded params took 0 seconds
Running cx_Oracle select
Unicode params took 0.0619998 seconds
Encoded params took 0 seconds

And running the cx_Oracle select first, to prevent pollution by caching:

Running cx_Oracle select
Unicode params took 0.0469999 seconds
Encoded params took 0 seconds
Running select
Unicode params took 3.267 seconds
Encoded params took 0.0159998 seconds

As can be seen, there's a slight slowdown with the unicode parameter on cx_Oracle, but nothing compared to the slowdown on sqlalchemy. The sqlalchemy unicode-parameter select is the only one which performs a full table scan, so that makes sense.

This issue has been mentioned on this list before - 15 Feb with the title "Full Table scan with Oracle due to Charset conversion" and 4 April with the title "Problem with Oracle requests" - but in those cases the issue was reproducible with the cx_Oracle driver as well, which it is not here.

Using the convert_unicode='force' argument on the String types does solve the issue, but there are a few reasons I don't want to do that. The product also supports PostgreSQL, SQL Server and MySQL, and I do not want to make changes which will affect running under those databases. This is also happening in a number of places in an otherwise very stable product under long-term support, so whatever change is made needs to be minimal.

Adding an additional function-based index on SYS_OP_C2C(column) for each of the affected columns also solves the issue, with a slight performance cost for the extra indexes which need to be kept up to date.
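As a rough illustration of that stopgap, the function-based index DDL can be generated per affected column along these lines (a minimal sketch; the index-name convention and helper are mine, not from the product, and the statement would still need to be executed against the database, e.g. via engine.execute):

```python
# Sketch of the SYS_OP_C2C workaround: when the bind value arrives in the
# national character set, Oracle applies SYS_OP_C2C() to the VARCHAR2
# column before comparing, so a plain index on the column no longer
# matches; a function-based index on SYS_OP_C2C(column) can.
def c2c_index_ddl(table_name, column_name):
    # Index name is illustrative; shorten it if it would exceed
    # Oracle's 30-character identifier limit on 10g.
    index_name = "ix_%s_%s_c2c" % (table_name, column_name)
    return "CREATE INDEX %s ON %s (SYS_OP_C2C(%s))" % (
        index_name, table_name, column_name)

print(c2c_index_ddl("test_table", "colid"))
# CREATE INDEX ix_test_table_colid_c2c ON test_table (SYS_OP_C2C(colid))
```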
This is the stopgap solution we've gone for in the meantime on the production systems. It's not ideal, though, and a bit difficult to automate and then verify. We'd prefer to know why this is happening, and maybe fix a sqlalchemy issue in the process. So, my questions are:

* What is sqlalchemy doing differently from straight cx_Oracle that triggers this behaviour? How can I debug this further to find out?
* Is there a way, on sqlalchemy 0.6.x, to set up a hook which will use convert_unicode='force' only on an Oracle database and only for String columns, automatically across an application? There's no create_engine argument for convert_unicode='force', which is a pity.
* And is there a hook for sqlalchemy 0.7.x to achieve the same result?

Thanks in advance for any help,
regards,

--
David Moore
Senior Software Engineer
St. James Software
Email: dav...@sjsoft.com

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
# -*- coding: utf-8 -*-
import sqlalchemy
import sys
import time
import random

import cx_Oracle

user = "j5"
dsn = "XE"


def create_id():
    return unicode(int(time.time())) + unicode(random.randint(100000, 999999))


def setup_database(metadata):
    # Create table
    table = sqlalchemy.Table(
        "test_table", metadata,
        sqlalchemy.Column("colid", sqlalchemy.String(255), primary_key=True),
        sqlalchemy.Column("message", sqlalchemy.String(4000)))
    if not table.exists():
        table.create(checkfirst=True)
        # Add rows
        for i in range(100000):
            latest_id = create_id()
            try:
                table.insert({"colid": latest_id,
                              "message": "This shouldn't matter " * 180}).execute()
            except Exception as e:
                print "duplicate error", e
    else:
        latest_id = table.select().limit(1).execute().fetchone()['colid']
    if isinstance(latest_id, str):
        latest_id = latest_id.decode('utf-8')
    return table, latest_id


def run_oracle_unicode_params(table, id_):
    assert table.select().where(table.c.colid == id_).execute().fetchone()[0] == id_


def run_oracle_encoded_params(table, id_):
    assert table.select().where(
        table.c.colid == id_.encode("utf-8")).execute().fetchone()[0] == id_


def main():
    sa_engine = sqlalchemy.create_engine(oracle_connect_string)
    metadata = sqlalchemy.MetaData(sa_engine)
    print "Setting up table"
    table, id_ = setup_database(metadata)

    cx_connection = cx_Oracle.connect(user, password, dsn)
    print "Running cx_Oracle select"
    start = time.time()
    cx_cursor = cx_connection.cursor()
    assert cx_cursor.execute(
        "SELECT colid FROM test_table WHERE colid = :colid",
        {"colid": id_}).fetchone()[0] == id_
    cx_cursor.close()
    print "Unicode params took %g seconds" % (time.time() - start)

    start = time.time()
    cx_cursor = cx_connection.cursor()
    assert cx_cursor.execute(
        "SELECT colid FROM test_table WHERE colid = :colid",
        {"colid": id_.encode("utf-8")}).fetchone()[0] == id_
    cx_cursor.close()
    print "Encoded params took %g seconds" % (time.time() - start)
    cx_connection.close()

    print "Running select"
    start = time.time()
    run_oracle_unicode_params(table, id_)
    print "Unicode params took %g seconds" % (time.time() - start)
    start = time.time()
    run_oracle_encoded_params(table, id_)
    print "Encoded params took %g seconds" % (time.time() - start)


if __name__ == '__main__':
    password = sys.argv[1]
    oracle_connect_string = "oracle+cx_oracle://%s:%s@%s" % (user, password, dsn)
    main()