Re: [sqlalchemy] Full table scan using Oracle String indexes

Yann Gravrand Thu, 25 Oct 2012 07:09:24 -0700

Hi,

Having had the same problem as above, I would like to use the 
"convert_unicode='force'" flag at engine configuration level.
It seems that String tests the value of "convert_unicode" at its own level 
as well as engine level, but it isn't the case for the 'force' value check :


    def bind_processor(self, dialect):
        if self.convert_unicode or dialect.convert_unicode:
            if dialect.supports_unicode_binds and \
                *self.convert_unicode != 'force'*:
                ...

    def result_processor(self, dialect, coltype):
        wants_unicode = self.convert_unicode or dialect.convert_unicode
        needs_convert = wants_unicode and \
                        (dialect.returns_unicode_strings is not True or
                        *self.convert_unicode == 'force'*)

Thus I have to create a type decorator for all my Strings even if I have 
set the right flag at engine configuration level...

Yann


Le jeudi 2 août 2012 23:22:22 UTC+2, Michael Bayer a écrit :
>
> so the "convert_unicode='force'" flag is not ideal here as that will spend 
> most of it's time checking for decodes necessary on the result set side, 
> which is a lot of wasted effort.  You can customize how strings are handled 
> on the bind side, including per-dialect behavior, using a custom type: 
>
> class MyStringType(TypeDecorator): 
>     impl = String 
>
>     def process_bind_param(self, value, dialect): 
>         if value is not None and dialect.name == "oracle": 
>             value = value.encode('utf-8') 
>
> then replace usage of the String(length=XYZ) type with 
> MyStringType(length=XYZ). 
>
> docs:  http://docs.sqlalchemy.org/en/rel_0_7/core/types.html#types-custom 
>
>
> On Aug 2, 2012, at 5:41 AM, David Moore wrote: 
>
> > Hi, 
> > 
> > Further searching seems to indicate this is an Oracle issue, not a 
> cx_Oracle issue. 
> > http://www.digipedia.pl/usenet/thread/15912/2814/ - same problem with 
> cx_Oracle 
> > http://support.unify.com/supportforum/viewtopic.php?f=40&t=3823 - 
> happening with perl drivers. 
> > http://tao.qshine.com/note/note_ora.htm - happening with ADO.NETdrivers. 
> > 
> > There's also evidence this problem occurs with MS SQL Server as well: 
> > 
> http://www.microsoftfaqs.com/Articles/1148439/Index_not_used_when_UNICODE_%3D_true-
>  SQL Server 2005 
> > 
> http://blog.brianhartsock.com/2008/12/14/nvarchar-vs-varchar-in-sql-server-beware/
>  
> > 
> > Although, it seems in SQL Server at least, it still uses the index, just 
> in a slower manner, which is probably why we haven't noticed this issue 
> yet. 
> > 
> > It seems to me that the place to fix this is when the type of the column 
> is known, which is when sqlalchemy binds the parameter.  There's some 
> discussion in one of those links that, at the sql statement preparation 
> level, you don't know what the column datatype is, so you can't fix it 
> there. 
> > 
> > Would it be possible for String datatypes to detect unicode values and 
> encode them with the database charset? 
> > 
> > As a lower-level temporary solution, you can get cx_Oracle to encode all 
> unicode parameters to strings, but that's obviously the wrong thing to do 
> if you have any Unicode columns.  This snippet: 
> > 
> > def InputTypeHandler(cursor, value, numElements): 
> >    if isinstance(value, unicode): 
> >        return cursor.var(str, arraysize = numElements, 
> >                inconverter = lambda x: 
> x.encode(cursor.connection.nencoding)) 
> > 
> > And then when creating the connection: 
> > 
> > connection.inputtypehandler = InputTypeHandler 
> > 
> > thanks for the help, 
> > 
> > -- 
> > David Moore 
> > Senior Software Engineer 
> > St. James Software 
> > Email: dav...@sjsoft.com <javascript:> 
> > 
> > 
> > ----- Original Message ----- 
> >> And for anyone else experiencing this issue, there was a subtle 
> >> difference in the execution plans that's now apparent.  The 
> >> statement which only selects colid runs a FAST FULL SCAN and 
> >> successfully converts the unicode parameter using SYS_OP_C2C, and 
> >> uses the index. When you select both colid and message, it runs a 
> >> FULL SCAN, and even though it seems to detect the conversion can be 
> >> done, it does not use the index at all.  Perhaps this is actually an 
> >> Oracle issue? 
> >> 
> >> -- 
> >> David Moore 
> >> Senior Software Engineer 
> >> St. James Software 
> >> Email: dav...@sjsoft.com <javascript:> 
> >> 
> >> 
> >> 
> >> ----- Original Message ----- 
> >>> Hi Michael, 
> >>> 
> >>> Indeed, you are correct - adding the message column to the 
> >>> cx_oracle 
> >>> query shows the same behaviour as the sqlalchemy query.  Sorry I 
> >>> missed that. 
> >>> 
> >>> I will take these results to the cx_oracle list on this basis.  On 
> >>> my 
> >>> second question, though, is there a way to enforce 
> >>> convert_unicode='force' only on Oracle databases across an 
> >>> application?  I know the hooking syntax changed from sqlalchemy 0.6 
> >>> to 0.7, so I'd imagine if there is a way, it would be different for 
> >>> those two versions. 
> >>> 
> >>> thanks, 
> >>> 
> >>> -- 
> >>> David Moore 
> >>> Senior Software Engineer 
> >>> St. James Software 
> >>> Email: dav...@sjsoft.com <javascript:> 
> >>> 
> >>> 
> >>> ----- Original Message ----- 
> >>>> Hi David - 
> >>>> 
> >>>> I've done some experiments with the script you gave me.  Correct 
> >>>> me 
> >>>> if I'm wrong (which is very possible), but the attached revised 
> >>>> version appears to demonstrate the difference between the 
> >>>> cx_oracle 
> >>>> and SQLAlchemy versions is really just that the SQLAlchemy 
> >>>> version 
> >>>> is running a different SQL string: 
> >>>> 
> >>>>         SELECT test_table.colid, test_table.message 
> >>>>         FROM test_table 
> >>>>         WHERE test_table.colid = :colid_1 
> >>>> 
> >>>> whereas the cx_oracle version, critically, does not ask for the 
> >>>> "message" column: 
> >>>> 
> >>>>         SELECT test_table.colid FROM test_table WHERE 
> test_table.colid = 
> >>>>         :colid 
> >>>> 
> >>>> it's not clear to me if the placement of the "message" column 
> >>>> impacts 
> >>>> the Oracle planner, or otherwise if the latency is on the 
> >>>> cx_oracle 
> >>>> side regarding buffering of columns or something similar.     But 
> >>>> i 
> >>>> am able to get the SQLAlchemy version to be as fast as the 
> >>>> cx_oracle 
> >>>> version, and the cx_oracle version to be as slow as the 
> >>>> sqlalchemy 
> >>>> version, by just adding/removing the "message" column from the 
> >>>> query. 
> >>>> 
> >>>> The attached script also tries to eliminate the overhead of 
> >>>> SQLAlchemy compiling the SQL construct within the timed portion, 
> >>>> as 
> >>>> I was testing this against only 40K rows and wanted to try to get 
> >>>> the results as close as possible.  It's only dealing with the 
> >>>> unicode parameter, and it's true if you encode the parameter 
> >>>> first, 
> >>>> both the SQLA and cx_oracle versions get faster, even to the 
> >>>> point 
> >>>> that the two-column version minus unicode is faster than the one 
> >>>> column version with unicode.   My guess is that cx_oracle is 
> >>>> detecting a unicode value in the input and doing something 
> >>>> different 
> >>>> with the statement overall as a result of it being present. 
> >>>> 
> >>>> Some typical set of results are: 
> >>>> 
> >>>> Setting up table 
> >>>> sa_both_cols 1.20209717751 
> >>>> sa_one_col 0.0182020664215 
> >>>> sa_literal_sql 0.019690990448 
> >>>> cx_oracle_one_col 0.0839619636536 
> >>>> cx_oracle_both_cols 1.18438816071 
> >>>> 
> >>>> Setting up table 
> >>>> sa_both_cols 1.17753505707 
> >>>> sa_one_col 0.0785720348358 
> >>>> sa_literal_sql 0.016618013382 
> >>>> cx_oracle_one_col 0.0194280147552 
> >>>> cx_oracle_both_cols 1.15302705765 
> >>>> 
> >>>> while the cx_oracle/SQLA versions seem to compete for the "fast" 
> >>>> version, which is because we're only running the statement once 
> >>>> and 
> >>>> I'm running on a low performing amazon small instance, it's clear 
> >>>> the "two column" version is where the vast amount of latency 
> >>>> occurs 
> >>>> for both systems. 
> >>>> 
> >>>> 
> >>>> let me know if you can confirm similar results on your end.  I 
> >>>> tested 
> >>>> only with SQLAlchemy 0.8 in trunk but I was able to reproduce 
> >>>> your 
> >>>> initial results so I assume the experience with 0.7, 0.6 would be 
> >>>> similar. 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> On Aug 1, 2012, at 6:40 AM, David Moore wrote: 
> >>>> 
> >>>>> Hi all, 
> >>>>> 
> >>>>> This is a bug in a product with multiple versions in support. 
> >>>>> The 
> >>>>> issue can be reproduced with 
> >>>>> sqlalchemy: 0.6.1, 0.6.6, 0.7.8 
> >>>>> cx_Oracle: 5.0.4, 5.1, 5.1.2 
> >>>>> Oracle: 10g XE and 10g Enterprise 
> >>>>> 
> >>>>> We have tables created with sqlalchemy.String primary keys and 
> >>>>> other indexes.  When querying through sqlalchemy, when we pass 
> >>>>> in 
> >>>>> a unicode value for the key, Oracle does not use the index, and 
> >>>>> instead uses a full table scan.  When executing the exact same 
> >>>>> query through cx_Oracle, we don't see this issue.  The script 
> >>>>> attached is a small test case which demonstrates the issue. 
> >>>>> Sample output: 
> >>>>> 
> >>>>> Running select 
> >>>>> Unicode params took 3.455 seconds 
> >>>>> Encoded params took 0 seconds 
> >>>>> Running cx_Oracle select 
> >>>>> Unicode params took 0.0619998 seconds 
> >>>>> Encoded params took 0 seconds 
> >>>>> 
> >>>>> And running the cx_Oracle select first to prevent pollution by 
> >>>>> caching: 
> >>>>> 
> >>>>> Running cx_Oracle select 
> >>>>> Unicode params took 0.0469999 seconds 
> >>>>> Encoded params took 0 seconds 
> >>>>> Running select 
> >>>>> Unicode params took 3.267 seconds 
> >>>>> Encoded params took 0.0159998 seconds 
> >>>>> 
> >>>>> As can be seen, there's a slight slowdown with the unicode 
> >>>>> parameter on cx_Oracle, but nothing compared to the slowdown on 
> >>>>> sqlalchemy.  The sqlalchemy unicode parameter select is the 
> >>>>> only 
> >>>>> one which performs a full table scan, so that makes sense. 
> >>>>> 
> >>>>> This issue has been mentioned on this list before - 15 Feb with 
> >>>>> title "Full Table scan with Oracle due to Charset conversion" 
> >>>>> and 
> >>>>> 4 April with title "Problem with Oracle requests" - but in 
> >>>>> those 
> >>>>> cases, the issue was reproducible with the cx_Oracle driver as 
> >>>>> well, which it is not here. 
> >>>>> 
> >>>>> Using the convert_unicode='force' argument to the String types 
> >>>>> does 
> >>>>> solve the issue, but there are a few reasons I don't want to do 
> >>>>> that.  This product also supports PostgreSQL, SQL Server and 
> >>>>> MySQL, and I do not want to make changes which will affect 
> >>>>> running 
> >>>>> under those databases.  This is also happening in an otherwise 
> >>>>> very stable product under long-term support in a number of 
> >>>>> places, 
> >>>>> so whatever change is made needs to be minimal. 
> >>>>> 
> >>>>> Adding an additional function-based index on SYS_OP_C2C(column) 
> >>>>> for 
> >>>>> each of the affected columns also solves this issue with a 
> >>>>> slight 
> >>>>> performance change for the extra indexes which need to be kept 
> >>>>> up 
> >>>>> to date.  This is the stopgap solution we've gone for in the 
> >>>>> meantime for those systems on production.  It's not ideal, 
> >>>>> though, 
> >>>>> and a bit difficult to automate and then verify the results. 
> >>>>> We'd 
> >>>>> prefer to know why this was happening, and maybe fix a 
> >>>>> sqlalchemy 
> >>>>> issue in the process. 
> >>>>> 
> >>>>> So, my questions are: 
> >>>>> * What is sqlalchemy doing differently to straight cx_Oracle 
> >>>>> that 
> >>>>> triggers this behaviour?  How can I debug this further to find 
> >>>>> out? 
> >>>>> * Is there a way, on sqlalchemy 0.6.x, to set up a hook which 
> >>>>> will 
> >>>>> use convert_unicode='force' only on an Oracle database and 
> >>>>> only 
> >>>>> for String columns, automatically across an application? 
> >>>>>  There's 
> >>>>> no create_engine argument for convert_unicode='force', which 
> >>>>> is 
> >>>>> a 
> >>>>> pity. 
> >>>>> * And a hook for sqlalchemy 0.7.x to achieve the same result? 
> >>>>> 
> >>>>> Thanks in advance for any help, 
> >>>>> regards, 
> >>>>> -- 
> >>>>> David Moore 
> >>>>> Senior Software Engineer 
> >>>>> St. James Software 
> >>>>> Email: dav...@sjsoft.com <javascript:> 
> >>>>> 
> >>>>> -- 
> >>>>> You received this message because you are subscribed to the 
> >>>>> Google 
> >>>>> Groups "sqlalchemy" group. 
> >>>>> To post to this group, send email to 
> >>>>> sqlal...@googlegroups.com <javascript:>. 
> >>>>> To unsubscribe from this group, send email to 
> >>>>> sqlalchemy+...@googlegroups.com <javascript:>. 
> >>>>> For more options, visit this group at 
> >>>>> http://groups.google.com/group/sqlalchemy?hl=en. 
> >>>>> 
> >>>>> <oracle_sqlalchemy_encoding_error.py> 
> >>>> 
> >>>> 
> >>>> -- 
> >>>> You received this message because you are subscribed to the 
> >>>> Google 
> >>>> Groups "sqlalchemy" group. 
> >>>> To post to this group, send email to 
> >>>> sqlal...@googlegroups.com<javascript:>. 
>
> >>>> To unsubscribe from this group, send email to 
> >>>> sqlalchemy+...@googlegroups.com <javascript:>. 
> >>>> For more options, visit this group at 
> >>>> http://groups.google.com/group/sqlalchemy?hl=en. 
> >>>> 
> >>>> 
> >>> 
> >>> -- 
> >>> You received this message because you are subscribed to the Google 
> >>> Groups "sqlalchemy" group. 
> >>> To post to this group, send email to 
> >>> sqlal...@googlegroups.com<javascript:>. 
>
> >>> To unsubscribe from this group, send email to 
> >>> sqlalchemy+...@googlegroups.com <javascript:>. 
> >>> For more options, visit this group at 
> >>> http://groups.google.com/group/sqlalchemy?hl=en. 
> >>> 
> >>> 
> >> 
> >> -- 
> >> You received this message because you are subscribed to the Google 
> >> Groups "sqlalchemy" group. 
> >> To post to this group, send email to 
> >> sqlal...@googlegroups.com<javascript:>. 
>
> >> To unsubscribe from this group, send email to 
> >> sqlalchemy+...@googlegroups.com <javascript:>. 
> >> For more options, visit this group at 
> >> http://groups.google.com/group/sqlalchemy?hl=en. 
> >> 
> >> 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> Groups "sqlalchemy" group. 
> > To post to this group, send email to 
> > sqlal...@googlegroups.com<javascript:>. 
>
> > To unsubscribe from this group, send email to 
> sqlalchemy+...@googlegroups.com <javascript:>. 
> > For more options, visit this group at 
> http://groups.google.com/group/sqlalchemy?hl=en. 
> > 
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/sqlalchemy/-/xQHoRrrfOugJ.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

Re: [sqlalchemy] Full table scan using Oracle String indexes

Reply via email to