Hi,

Further searching suggests this is an Oracle issue, not a cx_Oracle issue:
http://www.digipedia.pl/usenet/thread/15912/2814/ - same problem with cx_Oracle
http://support.unify.com/supportforum/viewtopic.php?f=40&t=3823 - happening 
with Perl drivers.
http://tao.qshine.com/note/note_ora.htm - happening with ADO.NET drivers.

There's also evidence that this problem occurs with MS SQL Server:
http://www.microsoftfaqs.com/Articles/1148439/Index_not_used_when_UNICODE_%3D_true - SQL Server 2005
http://blog.brianhartsock.com/2008/12/14/nvarchar-vs-varchar-in-sql-server-beware/

In SQL Server, at least, it seems the index is still used, just more slowly, 
which is probably why we haven't noticed this issue there yet.

It seems to me that the place to fix this is where the column type is known, 
which is when sqlalchemy binds the parameter.  There's some discussion in one 
of those links that the column datatype isn't known at the SQL statement 
preparation level, so it can't be fixed there.

Would it be possible for String datatypes to detect unicode values and encode 
them with the database charset?
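
To make the idea concrete, here's a rough sketch of what I'm imagining, as a 
TypeDecorator.  The class name is made up, and the getattr() fallback for the 
dialect's encoding is a guess on my part, not an official SQLAlchemy API:

```python
# Rough sketch only: a String wrapper that encodes unicode bind values,
# but only when running against Oracle.  Class name is illustrative.
from sqlalchemy.types import String, TypeDecorator

text_type = type(u'')  # 'unicode' on Python 2, where this issue arises

class OracleEncodedString(TypeDecorator):
    """Encode unicode bind parameters with the database charset, so
    Oracle binds VARCHAR2 instead of NVARCHAR2 and keeps using the
    index.  Other backends see the value unchanged."""
    impl = String

    def process_bind_param(self, value, dialect):
        if value is not None and dialect.name == 'oracle' \
                and isinstance(value, text_type):
            # 'encoding' is assumed to be the dialect's client encoding;
            # fall back to utf-8 if the attribute isn't present.
            return value.encode(getattr(dialect, 'encoding', 'utf-8'))
        return value
```

A column could then be declared as 
Column('colid', OracleEncodedString(100), primary_key=True), and the other 
databases we support would be untouched.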

As a lower-level temporary workaround, you can get cx_Oracle to encode all 
unicode parameters to byte strings, but that's obviously the wrong thing to do 
if you have any genuinely Unicode (NVARCHAR2) columns.  This snippet:

def InputTypeHandler(cursor, value, numElements):
    # Bind unicode values as plain strings, encoded with the connection's
    # national character set, so Oracle receives a VARCHAR2 bind rather
    # than an NVARCHAR2 one and can still use the index.
    if isinstance(value, unicode):
        return cursor.var(str, arraysize=numElements,
                inconverter=lambda x: x.encode(cursor.connection.nencoding))

And then when creating the connection:

connection.inputtypehandler = InputTypeHandler
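
For completeness, here's a hedged sketch of wiring that handler onto every 
connection SQLAlchemy pools, using the 0.7-style event API (on 0.6 a 
PoolListener would be the rough equivalent).  The sqlite:// URL is only a 
placeholder so the sketch is self-contained; a real deployment would use an 
oracle:// URL:

```python
from sqlalchemy import create_engine, event

def input_type_handler(cursor, value, num_elements):
    # Same idea as InputTypeHandler above: hand unicode binds to Oracle
    # as plain encoded strings so it binds VARCHAR2, not NVARCHAR2.
    if isinstance(value, type(u'')):  # 'unicode' on Python 2
        return cursor.var(str, arraysize=num_elements,
                inconverter=lambda v: v.encode(cursor.connection.nencoding))

# Placeholder URL so the sketch runs without an Oracle client installed.
engine = create_engine('sqlite://')

@event.listens_for(engine, 'connect')
def set_input_type_handler(dbapi_connection, connection_record):
    # cx_Oracle connections expose .inputtypehandler; the guard lets the
    # sketch also run against other DBAPIs, which lack the attribute.
    if hasattr(dbapi_connection, 'inputtypehandler'):
        dbapi_connection.inputtypehandler = input_type_handler
```

That way the handler is applied once per pooled connection rather than having 
to remember it at every connect call.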

thanks for the help,

-- 
David Moore
Senior Software Engineer
St. James Software
Email: dav...@sjsoft.com


----- Original Message -----
> And for anyone else experiencing this issue, there was a subtle
> difference in the execution plans that's now apparent.  The
> statement which only selects colid runs a FAST FULL SCAN and
> successfully converts the unicode parameter using SYS_OP_C2C, and
> uses the index. When you select both colid and message, it runs a
> FULL SCAN, and even though it seems to detect the conversion can be
> done, it does not use the index at all.  Perhaps this is actually an
> Oracle issue?
> 
> --
> David Moore
> Senior Software Engineer
> St. James Software
> Email: dav...@sjsoft.com
> 
> 
> 
> ----- Original Message -----
> > Hi Michael,
> > 
> > Indeed, you are correct - adding the message column to the
> > cx_oracle
> > query shows the same behaviour as the sqlalchemy query.  Sorry I
> > missed that.
> > 
> > I will take these results to the cx_oracle list on this basis.  On
> > my
> > second question, though, is there a way to enforce
> > convert_unicode='force' only on Oracle databases across an
> > application?  I know the hooking syntax changed from sqlalchemy 0.6
> > to 0.7, so I'd imagine if there is a way, it would be different for
> > those two versions.
> > 
> > thanks,
> > 
> > --
> > David Moore
> > Senior Software Engineer
> > St. James Software
> > Email: dav...@sjsoft.com
> > 
> > 
> > ----- Original Message -----
> > > Hi David -
> > > 
> > > I've done some experiments with the script you gave me.  Correct
> > > me
> > > if I'm wrong (which is very possible), but the attached revised
> > > version appears to demonstrate the difference between the
> > > cx_oracle
> > > and SQLAlchemy versions is really just that the SQLAlchemy
> > > version
> > > is running a different SQL string:
> > > 
> > >   SELECT test_table.colid, test_table.message
> > >   FROM test_table
> > >   WHERE test_table.colid = :colid_1
> > > 
> > > whereas the cx_oracle version, critically, does not ask for the
> > > "message" column:
> > > 
> > >   SELECT test_table.colid FROM test_table WHERE test_table.colid =
> > >   :colid
> > > 
> > > it's not clear to me if the placement of the "message" column
> > > impacts
> > > the Oracle planner, or otherwise if the latency is on the
> > > cx_oracle
> > > side regarding buffering of columns or something similar.  But I
> > > am able to get the SQLAlchemy version to be as fast as the
> > > cx_oracle
> > > version, and the cx_oracle version to be as slow as the
> > > sqlalchemy
> > > version, by just adding/removing the "message" column from the
> > > query.
> > > 
> > > The attached script also tries to eliminate the overhead of
> > > SQLAlchemy compiling the SQL construct within the timed portion,
> > > as
> > > I was testing this against only 40K rows and wanted to try to get
> > > the results as close as possible.  It's only dealing with the
> > > unicode parameter, and it's true if you encode the parameter
> > > first,
> > > both the SQLA and cx_oracle versions get faster, even to the
> > > point
> > > that the two-column version minus unicode is faster than the one
> > > column version with unicode.   My guess is that cx_oracle is
> > > detecting a unicode value in the input and doing something
> > > different
> > > with the statement overall as a result of it being present.
> > > 
> > > A typical set of results is:
> > > 
> > > Setting up table
> > > sa_both_cols 1.20209717751
> > > sa_one_col 0.0182020664215
> > > sa_literal_sql 0.019690990448
> > > cx_oracle_one_col 0.0839619636536
> > > cx_oracle_both_cols 1.18438816071
> > > 
> > > Setting up table
> > > sa_both_cols 1.17753505707
> > > sa_one_col 0.0785720348358
> > > sa_literal_sql 0.016618013382
> > > cx_oracle_one_col 0.0194280147552
> > > cx_oracle_both_cols 1.15302705765
> > > 
> > > The cx_oracle/SQLA versions trade places as the "fast" one, since
> > > we're only running the statement once and I'm on a low-performing
> > > Amazon small instance, but it's clear the "two column" version is
> > > where the vast majority of the latency occurs for both systems.
> > > 
> > > 
> > > let me know if you can confirm similar results on your end.  I
> > > tested
> > > only with SQLAlchemy 0.8 in trunk but I was able to reproduce
> > > your
> > > initial results so I assume the experience with 0.7, 0.6 would be
> > > similar.
> > > 
> > > 
> > > On Aug 1, 2012, at 6:40 AM, David Moore wrote:
> > > 
> > > > Hi all,
> > > > 
> > > > This is a bug in a product with multiple versions in support.
> > > >  The
> > > > issue can be reproduced with
> > > > sqlalchemy: 0.6.1, 0.6.6, 0.7.8
> > > > cx_Oracle: 5.0.4, 5.1, 5.1.2
> > > > Oracle: 10g XE and 10g Enterprise
> > > > 
> > > > We have tables created with sqlalchemy.String primary keys and
> > > > other indexes.  When querying through sqlalchemy, when we pass
> > > > in
> > > > a unicode value for the key, Oracle does not use the index, and
> > > > instead uses a full table scan.  When executing the exact same
> > > > query through cx_Oracle, we don't see this issue.  The script
> > > > attached is a small test case which demonstrates the issue.
> > > >  Sample output:
> > > > 
> > > >  Running select
> > > >  Unicode params took 3.455 seconds
> > > >  Encoded params took 0 seconds
> > > >  Running cx_Oracle select
> > > >  Unicode params took 0.0619998 seconds
> > > >  Encoded params took 0 seconds
> > > > 
> > > > And running the cx_Oracle select first to prevent pollution by
> > > > caching:
> > > > 
> > > >  Running cx_Oracle select
> > > >  Unicode params took 0.0469999 seconds
> > > >  Encoded params took 0 seconds
> > > >  Running select
> > > >  Unicode params took 3.267 seconds
> > > >  Encoded params took 0.0159998 seconds
> > > > 
> > > > As can be seen, there's a slight slowdown with the unicode
> > > > parameter on cx_Oracle, but nothing compared to the slowdown on
> > > > sqlalchemy.  The sqlalchemy unicode parameter select is the
> > > > only
> > > > one which performs a full table scan, so that makes sense.
> > > > 
> > > > This issue has been mentioned on this list before - 15 Feb with
> > > > title "Full Table scan with Oracle due to Charset conversion"
> > > > and
> > > > 4 April with title "Problem with Oracle requests" - but in
> > > > those
> > > > cases, the issue was reproducible with the cx_Oracle driver as
> > > > well, which it is not here.
> > > > 
> > > > Using the convert_unicode='force' argument to the String types
> > > > does
> > > > solve the issue, but there are a few reasons I don't want to do
> > > > that.  This product also supports PostgreSQL, SQL Server and
> > > > MySQL, and I do not want to make changes which will affect
> > > > running
> > > > under those databases.  This is also happening in an otherwise
> > > > very stable product under long-term support in a number of
> > > > places,
> > > > so whatever change is made needs to be minimal.
> > > > 
> > > > Adding an additional function-based index on SYS_OP_C2C(column)
> > > > for
> > > > each of the affected columns also solves this issue with a
> > > > slight
> > > > performance change for the extra indexes which need to be kept
> > > > up
> > > > to date.  This is the stopgap solution we've gone for in the
> > > > meantime for those systems on production.  It's not ideal,
> > > > though,
> > > > and a bit difficult to automate and then verify the results.
> > > >  We'd
> > > > prefer to know why this was happening, and maybe fix a
> > > > sqlalchemy
> > > > issue in the process.
> > > > 
> > > > So, my questions are:
> > > >  * What is sqlalchemy doing differently to straight cx_Oracle
> > > >    that triggers this behaviour?  How can I debug this further
> > > >    to find out?
> > > >  * Is there a way, on sqlalchemy 0.6.x, to set up a hook which
> > > >    will use convert_unicode='force' only on an Oracle database
> > > >    and only for String columns, automatically across an
> > > >    application?  There's no create_engine argument for
> > > >    convert_unicode='force', which is a pity.
> > > >  * And a hook for sqlalchemy 0.7.x to achieve the same result?
> > > > 
> > > > Thanks in advance for any help,
> > > > regards,
> > > > --
> > > > David Moore
> > > > Senior Software Engineer
> > > > St. James Software
> > > > Email: dav...@sjsoft.com
> > > > 
> > > > --
> > > > You received this message because you are subscribed to the
> > > > Google
> > > > Groups "sqlalchemy" group.
> > > > To post to this group, send email to
> > > > sqlalchemy@googlegroups.com.
> > > > To unsubscribe from this group, send email to
> > > > sqlalchemy+unsubscr...@googlegroups.com.
> > > > For more options, visit this group at
> > > > http://groups.google.com/group/sqlalchemy?hl=en.
> > > > 
> > > > <oracle_sqlalchemy_encoding_error.py>
> > > 
> > 
> 
