so the "convert_unicode='force'" flag is not ideal here, as it will spend most 
of its time checking whether decodes are necessary on the result set side, 
which is a lot of wasted effort.  You can customize how strings are handled 
on the bind side, including per-dialect behavior, using a custom type:

class MyStringType(TypeDecorator):
    impl = String

    def process_bind_param(self, value, dialect):
        if value is not None and dialect.name == "oracle":
            value = value.encode('utf-8')
        return value

then replace usage of the String(length=XYZ) type with MyStringType(length=XYZ).

docs:  http://docs.sqlalchemy.org/en/rel_0_7/core/types.html#types-custom
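
For illustration, a self-contained sketch of the suggestion above (the table 
and column names are made up for the example; the real application would use 
its own):

```python
from sqlalchemy import Column, MetaData, Table
from sqlalchemy.types import String, TypeDecorator

class MyStringType(TypeDecorator):
    """Encode unicode bind parameters to byte strings, but only on Oracle,
    leaving behavior on other backends unchanged."""
    impl = String

    def process_bind_param(self, value, dialect):
        if value is not None and dialect.name == "oracle":
            value = value.encode("utf-8")
        return value

metadata = MetaData()
# before: Column("colid", String(50), primary_key=True)
test_table = Table(
    "test_table", metadata,
    Column("colid", MyStringType(50), primary_key=True),
    Column("message", MyStringType(2000)),
)
```

Since the encoding only fires when dialect.name == "oracle", PostgreSQL, SQL 
Server and MySQL binds pass through untouched.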


On Aug 2, 2012, at 5:41 AM, David Moore wrote:

> Hi,
> 
> Further searching seems to indicate this is an Oracle issue, not a cx_Oracle 
> issue.
> http://www.digipedia.pl/usenet/thread/15912/2814/ - same problem with 
> cx_Oracle
> http://support.unify.com/supportforum/viewtopic.php?f=40&t=3823 - happening 
> with perl drivers.
> http://tao.qshine.com/note/note_ora.htm - happening with ADO.NET drivers.
> 
> There's also evidence this problem occurs with MS SQL Server as well:
> http://www.microsoftfaqs.com/Articles/1148439/Index_not_used_when_UNICODE_%3D_true
>  - SQL Server 2005
> http://blog.brianhartsock.com/2008/12/14/nvarchar-vs-varchar-in-sql-server-beware/
> 
> Although it seems that SQL Server, at least, still uses the index, just 
> more slowly, which is probably why we haven't noticed this issue yet.
> 
> It seems to me that the place to fix this is when the type of the column is 
> known, which is when sqlalchemy binds the parameter.  There's some discussion 
> in one of those links that, at the sql statement preparation level, you don't 
> know what the column datatype is, so you can't fix it there.
> 
> Would it be possible for String datatypes to detect unicode values and encode 
> them with the database charset?
> 
> As a lower-level temporary solution, you can get cx_Oracle to encode all 
> unicode parameters to strings, but that's obviously the wrong thing to do if 
> you have any Unicode columns.  This snippet:
> 
> def InputTypeHandler(cursor, value, numElements):
>     if isinstance(value, unicode):
>         return cursor.var(str, arraysize=numElements,
>                 inconverter=lambda x: x.encode(cursor.connection.nencoding))
> 
> And then when creating the connection:
> 
> connection.inputtypehandler = InputTypeHandler
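
[For what it's worth, the encoding step in the quoted snippet can be 
demonstrated without a live Oracle connection.  This sketch mimics only the 
inconverter; the hard-coded "utf-8" stands in for cursor.connection.nencoding, 
which exists only on a real cx_Oracle connection:

```python
# Driver-free sketch of the inconverter used in the input type handler:
# each unicode bind is encoded with the connection's national charset
# before cx_Oracle sees it, so Oracle receives a plain byte string
# (VARCHAR2) bind rather than an NVARCHAR2 one.
def make_inconverter(nencoding):
    return lambda value: value.encode(nencoding)

inconverter = make_inconverter("utf-8")  # stands in for connection.nencoding
encoded = inconverter(u"caf\xe9")
```
]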
> 
> thanks for the help,
> 
> -- 
> David Moore
> Senior Software Engineer
> St. James Software
> Email: dav...@sjsoft.com
> 
> 
> ----- Original Message -----
>> And for anyone else experiencing this issue, there was a subtle
>> difference in the execution plans that's now apparent.  The
>> statement which only selects colid runs a FAST FULL SCAN and
>> successfully converts the unicode parameter using SYS_OP_C2C, and
>> uses the index. When you select both colid and message, it runs a
>> FULL SCAN, and even though it seems to detect the conversion can be
>> done, it does not use the index at all.  Perhaps this is actually an
>> Oracle issue?
>> 
>> --
>> David Moore
>> Senior Software Engineer
>> St. James Software
>> Email: dav...@sjsoft.com
>> 
>> 
>> 
>> ----- Original Message -----
>>> Hi Michael,
>>> 
>>> Indeed, you are correct - adding the message column to the
>>> cx_oracle
>>> query shows the same behaviour as the sqlalchemy query.  Sorry I
>>> missed that.
>>> 
>>> I will take these results to the cx_oracle list on this basis.  On
>>> my
>>> second question, though, is there a way to enforce
>>> convert_unicode='force' only on Oracle databases across an
>>> application?  I know the hooking syntax changed from sqlalchemy 0.6
>>> to 0.7, so I'd imagine if there is a way, it would be different for
>>> those two versions.
>>> 
>>> thanks,
>>> 
>>> --
>>> David Moore
>>> Senior Software Engineer
>>> St. James Software
>>> Email: dav...@sjsoft.com
>>> 
>>> 
>>> ----- Original Message -----
>>>> Hi David -
>>>> 
>>>> I've done some experiments with the script you gave me.  Correct
>>>> me
>>>> if I'm wrong (which is very possible), but the attached revised
>>>> version appears to demonstrate the difference between the
>>>> cx_oracle
>>>> and SQLAlchemy versions is really just that the SQLAlchemy
>>>> version
>>>> is running a different SQL string:
>>>> 
>>>>    SELECT test_table.colid, test_table.message
>>>>    FROM test_table
>>>>    WHERE test_table.colid = :colid_1
>>>> 
>>>> whereas the cx_oracle version, critically, does not ask for the
>>>> "message" column:
>>>> 
>>>>    SELECT test_table.colid FROM test_table WHERE test_table.colid =
>>>>    :colid
>>>> 
>>>> it's not clear to me if the placement of the "message" column
>>>> impacts
>>>> the Oracle planner, or otherwise if the latency is on the
>>>> cx_oracle
>>>> side regarding buffering of columns or something similar.     But
>>>> I
>>>> am able to get the SQLAlchemy version to be as fast as the
>>>> cx_oracle
>>>> version, and the cx_oracle version to be as slow as the
>>>> sqlalchemy
>>>> version, by just adding/removing the "message" column from the
>>>> query.
>>>> 
>>>> The attached script also tries to eliminate the overhead of
>>>> SQLAlchemy compiling the SQL construct within the timed portion,
>>>> as
>>>> I was testing this against only 40K rows and wanted to try to get
>>>> the results as close as possible.  It's only dealing with the
>>>> unicode parameter, and it's true if you encode the parameter
>>>> first,
>>>> both the SQLA and cx_oracle versions get faster, even to the
>>>> point
>>>> that the two-column version minus unicode is faster than the one
>>>> column version with unicode.   My guess is that cx_oracle is
>>>> detecting a unicode value in the input and doing something
>>>> different
>>>> with the statement overall as a result of it being present.
>>>> 
>>>> Some typical set of results are:
>>>> 
>>>> Setting up table
>>>> sa_both_cols 1.20209717751
>>>> sa_one_col 0.0182020664215
>>>> sa_literal_sql 0.019690990448
>>>> cx_oracle_one_col 0.0839619636536
>>>> cx_oracle_both_cols 1.18438816071
>>>> 
>>>> Setting up table
>>>> sa_both_cols 1.17753505707
>>>> sa_one_col 0.0785720348358
>>>> sa_literal_sql 0.016618013382
>>>> cx_oracle_one_col 0.0194280147552
>>>> cx_oracle_both_cols 1.15302705765
>>>> 
>>>> While the cx_oracle/SQLA versions seem to trade places for the
>>>> "fast" spot (since we're only running the statement once and
>>>> I'm running on a low-performing Amazon small instance), it's
>>>> clear the "two column" version is where the vast amount of
>>>> latency occurs for both systems.
>>>> 
>>>> 
>>>> let me know if you can confirm similar results on your end.  I
>>>> tested
>>>> only with SQLAlchemy 0.8 in trunk but I was able to reproduce
>>>> your
>>>> initial results so I assume the experience with 0.7, 0.6 would be
>>>> similar.
>>>> 
>>>> On Aug 1, 2012, at 6:40 AM, David Moore wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> This is a bug in a product with multiple versions in support.
>>>>> The
>>>>> issue can be reproduced with
>>>>> sqlalchemy: 0.6.1, 0.6.6, 0.7.8
>>>>> cx_Oracle: 5.0.4, 5.1, 5.1.2
>>>>> Oracle: 10g XE and 10g Enterprise
>>>>> 
>>>>> We have tables created with sqlalchemy.String primary keys and
>>>>> other indexes.  When querying through sqlalchemy, when we pass
>>>>> in
>>>>> a unicode value for the key, Oracle does not use the index, and
>>>>> instead uses a full table scan.  When executing the exact same
>>>>> query through cx_Oracle, we don't see this issue.  The script
>>>>> attached is a small test case which demonstrates the issue.
>>>>> Sample output:
>>>>> 
>>>>> Running select
>>>>> Unicode params took 3.455 seconds
>>>>> Encoded params took 0 seconds
>>>>> Running cx_Oracle select
>>>>> Unicode params took 0.0619998 seconds
>>>>> Encoded params took 0 seconds
>>>>> 
>>>>> And running the cx_Oracle select first to prevent pollution by
>>>>> caching:
>>>>> 
>>>>> Running cx_Oracle select
>>>>> Unicode params took 0.0469999 seconds
>>>>> Encoded params took 0 seconds
>>>>> Running select
>>>>> Unicode params took 3.267 seconds
>>>>> Encoded params took 0.0159998 seconds
>>>>> 
>>>>> As can be seen, there's a slight slowdown with the unicode
>>>>> parameter on cx_Oracle, but nothing compared to the slowdown on
>>>>> sqlalchemy.  The sqlalchemy unicode parameter select is the
>>>>> only
>>>>> one which performs a full table scan, so that makes sense.
>>>>> 
>>>>> This issue has been mentioned on this list before - 15 Feb with
>>>>> title "Full Table scan with Oracle due to Charset conversion"
>>>>> and
>>>>> 4 April with title "Problem with Oracle requests" - but in
>>>>> those
>>>>> cases, the issue was reproducible with the cx_Oracle driver as
>>>>> well, which it is not here.
>>>>> 
>>>>> Using the convert_unicode='force' argument to the String types
>>>>> does
>>>>> solve the issue, but there are a few reasons I don't want to do
>>>>> that.  This product also supports PostgreSQL, SQL Server and
>>>>> MySQL, and I do not want to make changes which will affect
>>>>> running
>>>>> under those databases.  This is also happening in an otherwise
>>>>> very stable product under long-term support in a number of
>>>>> places,
>>>>> so whatever change is made needs to be minimal.
>>>>> 
>>>>> Adding an additional function-based index on SYS_OP_C2C(column)
>>>>> for
>>>>> each of the affected columns also solves this issue with a
>>>>> slight
>>>>> performance change for the extra indexes which need to be kept
>>>>> up
>>>>> to date.  This is the stopgap solution we've gone for in the
>>>>> meantime for those systems on production.  It's not ideal,
>>>>> though,
>>>>> and a bit difficult to automate and then verify the results.
>>>>> We'd
>>>>> prefer to know why this was happening, and maybe fix a
>>>>> sqlalchemy
>>>>> issue in the process.
>>>>> 
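>>>>> 
>>>>> [The function-based-index stopgap described above can be scripted;
>>>>> a hypothetical helper for generating the DDL, with illustrative
>>>>> table, column, and index names, might look like:
>>>>> 
>>>>> ```python
>>>>> # Hypothetical helper: build the SYS_OP_C2C function-based index
>>>>> # DDL for an affected VARCHAR2 column.  SYS_OP_C2C is the implicit
>>>>> # VARCHAR2 -> NVARCHAR2 conversion Oracle applies to the column
>>>>> # when it is compared to a unicode (NVARCHAR2) bind parameter.
>>>>> def c2c_index_ddl(table, column):
>>>>>     return ("CREATE INDEX ix_{0}_{1}_c2c "
>>>>>             "ON {0} (SYS_OP_C2C({1}))").format(table, column)
>>>>> 
>>>>> ddl = c2c_index_ddl("test_table", "colid")
>>>>> ```
>>>>> ]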
>>>>> So, my questions are:
>>>>> * What is sqlalchemy doing differently to straight cx_Oracle
>>>>> that
>>>>> triggers this behaviour?  How can I debug this further to find
>>>>> out?
>>>>> * Is there a way, on sqlalchemy 0.6.x, to set up a hook which
>>>>> will
>>>>> use convert_unicode='force' only on an Oracle database and
>>>>> only
>>>>> for String columns, automatically across an application?
>>>>> There's no create_engine argument for convert_unicode='force',
>>>>> which is a pity.
>>>>> * And a hook for sqlalchemy 0.7.x to achieve the same result?
>>>>> 
>>>>> Thanks in advance for any help,
>>>>> regards,
>>>>> --
>>>>> David Moore
>>>>> Senior Software Engineer
>>>>> St. James Software
>>>>> Email: dav...@sjsoft.com
>>>>> 
>>>>> --
>>>>> You received this message because you are subscribed to the
>>>>> Google
>>>>> Groups "sqlalchemy" group.
>>>>> To post to this group, send email to
>>>>> sqlalchemy@googlegroups.com.
>>>>> To unsubscribe from this group, send email to
>>>>> sqlalchemy+unsubscr...@googlegroups.com.
>>>>> For more options, visit this group at
>>>>> http://groups.google.com/group/sqlalchemy?hl=en.
>>>>> 
>>>>> <oracle_sqlalchemy_encoding_error.py>
>>>> 
>>>> 
>>> 
>> 
> 
