On Thursday, October 25, 2012 at 17:28:37 UTC+2, Michael Bayer wrote:
>
> the "convert_unicode='force'" flag does this.  documentation at
> http://docs.sqlalchemy.org/en/rel_0_7/core/types.html?highlight=unicode#sqlalchemy.types.String.__init__.
>
Hi Michael,

Did you read my original message?  It was about the inability to use this
flag at engine configuration level.

> On Oct 25, 2012, at 10:48 AM, David Moore wrote:
>
> Hi Yann,
>
> Sorry for the confusion - my email was musing about what I thought
> sqlalchemy should do, not what I thought you should do.
>
> The way we have solved this is using the cx_Oracle inputtypehandler hook.
> So we've overridden where the connection is created by sqlalchemy, and
> then used the following snippet:
>
>     def OracleInputTypeHandler(cursor, value, numElements):
>         if isinstance(value, unicode):
>             return cursor.var(str, arraysize=numElements,
>                               inconverter=lambda x:
>                                   x.encode(cursor.connection.nencoding))
>
>     connection.inputtypehandler = OracleInputTypeHandler
>
> This converts all unicode bind parameters passed into cx_Oracle into the
> client encoding.
>
> HTH,
> Dave Moore
>
> ------------------------------
>
> On Thursday, October 25, 2012 at 16:21:49 UTC+2, David Moore wrote:
>>
>> Hello,
>
> Hi,
>
>> In line with what is required for Python 3, would it not make sense to
>> insist across the board that bind values to sqlalchemy.String should be
>> bytestrings and that bind values to sqlalchemy.Unicode should be unicode
>> strings, converting if necessary?
>
> I am using the Unicode class and not String directly.  But the code I was
> referring to is in the String class.
>
>> I don't think I understand why you would not want that ever.
>
> I am using unicode strings, but I'd like sqlalchemy to convert them to the
> dialect encoding in order to avoid a problem with oracle indexes not being
> used when querying with unicode strings.
>
>> regards,
>> Dave Moore
>>
>> ------------------------------
>>
>> Hi,
>>
>> Having had the same problem as above, I would like to use the
>> "convert_unicode='force'" flag at engine configuration level.
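Stripped of the cx_Oracle specifics, the conversion that `inconverter` performs can be sketched and exercised standalone (a minimal sketch: `"utf-8"` stands in for `cursor.connection.nencoding`, and on Python 3 `str` plays the role of the old `unicode` type; `make_inconverter` is a hypothetical helper name, not part of cx_Oracle):

```python
def make_inconverter(encoding):
    """Return a converter that encodes unicode text to the client encoding,
    passing already-encoded values through untouched."""
    def inconverter(value):
        if isinstance(value, str):  # Python 3: str is the old unicode type
            return value.encode(encoding)
        return value
    return inconverter

conv = make_inconverter("utf-8")
print(conv(u"caf\u00e9"))      # unicode text is encoded to bytes
print(conv(b"already-bytes"))  # bytes pass through unchanged
```

This mirrors the point of the hook: by the time the bind value reaches Oracle it is in the client encoding, so it matches the VARCHAR2 column and the index remains usable.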
>> It seems that String tests the value of "convert_unicode" at its own
>> level as well as engine level, but that isn't the case for the 'force'
>> value check:
>>
>>     def bind_processor(self, dialect):
>>         if self.convert_unicode or dialect.convert_unicode:
>>             if dialect.supports_unicode_binds and \
>>                     self.convert_unicode != 'force':
>>                 ...
>>
>>     def result_processor(self, dialect, coltype):
>>         wants_unicode = self.convert_unicode or dialect.convert_unicode
>>         needs_convert = wants_unicode and \
>>             (dialect.returns_unicode_strings is not True or
>>              self.convert_unicode == 'force')
>>
>> Thus I have to create a type decorator for all my Strings even though I
>> have set the right flag at engine configuration level...
>>
>> Yann
>>
>> On Thursday, August 2, 2012 at 23:22:22 UTC+2, Michael Bayer wrote:
>>>
>>> so the "convert_unicode='force'" flag is not ideal here, as that will
>>> spend most of its time checking for decodes necessary on the result set
>>> side, which is a lot of wasted effort.  You can customize how strings
>>> are handled on the bind side, including per-dialect behavior, using a
>>> custom type:
>>>
>>>     class MyStringType(TypeDecorator):
>>>         impl = String
>>>
>>>         def process_bind_param(self, value, dialect):
>>>             if value is not None and dialect.name == "oracle":
>>>                 value = value.encode('utf-8')
>>>             return value
>>>
>>> then replace usage of the String(length=XYZ) type with
>>> MyStringType(length=XYZ).
>>>
>>> docs:
>>> http://docs.sqlalchemy.org/en/rel_0_7/core/types.html#types-custom
>>>
>>> On Aug 2, 2012, at 5:41 AM, David Moore wrote:
>>>
>>> > Hi,
>>> >
>>> > Further searching seems to indicate this is an Oracle issue, not a
>>> > cx_Oracle issue.
>>> > http://www.digipedia.pl/usenet/thread/15912/2814/ - same problem
>>> > with cx_Oracle
>>> > http://support.unify.com/supportforum/viewtopic.php?f=40&t=3823 -
>>> > happening with perl drivers.
>>> > http://tao.qshine.com/note/note_ora.htm - happening with ADO.NET
>>> > drivers.
>>> >
>>> > There's also evidence this problem occurs with MS SQL Server as well:
>>> > http://www.microsoftfaqs.com/Articles/1148439/Index_not_used_when_UNICODE_%3D_true
>>> > - SQL Server 2005
>>> > http://blog.brianhartsock.com/2008/12/14/nvarchar-vs-varchar-in-sql-server-beware/
>>> >
>>> > Although it seems that in SQL Server at least, it still uses the
>>> > index, just in a slower manner, which is probably why we haven't
>>> > noticed this issue yet.
>>> >
>>> > It seems to me that the place to fix this is when the type of the
>>> > column is known, which is when sqlalchemy binds the parameter.
>>> > There's some discussion in one of those links that, at the sql
>>> > statement preparation level, you don't know what the column datatype
>>> > is, so you can't fix it there.
>>> >
>>> > Would it be possible for String datatypes to detect unicode values
>>> > and encode them with the database charset?
>>> >
>>> > As a lower-level temporary solution, you can get cx_Oracle to encode
>>> > all unicode parameters to strings, but that's obviously the wrong
>>> > thing to do if you have any Unicode columns.  This snippet:
>>> >
>>> >     def InputTypeHandler(cursor, value, numElements):
>>> >         if isinstance(value, unicode):
>>> >             return cursor.var(str, arraysize=numElements,
>>> >                               inconverter=lambda x:
>>> >                                   x.encode(cursor.connection.nencoding))
>>> >
>>> > And then when creating the connection:
>>> >
>>> >     connection.inputtypehandler = InputTypeHandler
>>> >
>>> > thanks for the help,
>>> >
>>> > --
>>> > David Moore
>>> > Senior Software Engineer
>>> > St. James Software
>>> > Email: dav...@sjsoft.com
>>> >
>>> > ----- Original Message -----
>>> >> And for anyone else experiencing this issue, there was a subtle
>>> >> difference in the execution plans that's now apparent.  The
>>> >> statement which only selects colid runs a FAST FULL SCAN and
>>> >> successfully converts the unicode parameter using SYS_OP_C2C, and
>>> >> uses the index.
>>> >> When you select both colid and message, it runs a
>>> >> FULL SCAN, and even though it seems to detect that the conversion
>>> >> can be done, it does not use the index at all.  Perhaps this is
>>> >> actually an Oracle issue?
>>> >>
>>> >> --
>>> >> David Moore
>>> >> Senior Software Engineer
>>> >> St. James Software
>>> >> Email: dav...@sjsoft.com
>>> >>
>>> >> ----- Original Message -----
>>> >>> Hi Michael,
>>> >>>
>>> >>> Indeed, you are correct - adding the message column to the
>>> >>> cx_oracle query shows the same behaviour as the sqlalchemy query.
>>> >>> Sorry I missed that.
>>> >>>
>>> >>> I will take these results to the cx_oracle list on this basis.  On
>>> >>> my second question, though, is there a way to enforce
>>> >>> convert_unicode='force' only on Oracle databases across an
>>> >>> application?  I know the hooking syntax changed from sqlalchemy 0.6
>>> >>> to 0.7, so I'd imagine if there is a way, it would be different for
>>> >>> those two versions.
>>> >>>
>>> >>> thanks,
>>> >>>
>>> >>> --
>>> >>> David Moore
>>> >>> Senior Software Engineer
>>> >>> St. James Software
>>> >>> Email: dav...@sjsoft.com
>>> >>>
>>> >>> ----- Original Message -----
>>> >>>> Hi David -
>>> >>>>
>>> >>>> I've done some experiments with the script you gave me.
>>> >>>> Correct me
>>> >>>> if I'm wrong (which is very possible), but the attached revised
>>> >>>> version appears to demonstrate that the difference between the
>>> >>>> cx_oracle and SQLAlchemy versions is really just that the
>>> >>>> SQLAlchemy version is running a different SQL string:
>>> >>>>
>>> >>>>     SELECT test_table.colid, test_table.message
>>> >>>>     FROM test_table
>>> >>>>     WHERE test_table.colid = :colid_1
>>> >>>>
>>> >>>> whereas the cx_oracle version, critically, does not ask for the
>>> >>>> "message" column:
>>> >>>>
>>> >>>>     SELECT test_table.colid FROM test_table
>>> >>>>     WHERE test_table.colid = :colid
>>> >>>>
>>> >>>> it's not clear to me if the placement of the "message" column
>>> >>>> impacts the Oracle planner, or otherwise if the latency is on the
>>> >>>> cx_oracle side regarding buffering of columns or something
>>> >>>> similar.  But I am able to get the SQLAlchemy version to be as
>>> >>>> fast as the cx_oracle version, and the cx_oracle version to be as
>>> >>>> slow as the sqlalchemy version, by just adding/removing the
>>> >>>> "message" column from the query.
>>> >>>>
>>> >>>> The attached script also tries to eliminate the overhead of
>>> >>>> SQLAlchemy compiling the SQL construct within the timed portion,
>>> >>>> as I was testing this against only 40K rows and wanted to get the
>>> >>>> results as close as possible.  It's only dealing with the unicode
>>> >>>> parameter, and it's true that if you encode the parameter first,
>>> >>>> both the SQLA and cx_oracle versions get faster, even to the
>>> >>>> point that the two-column version minus unicode is faster than
>>> >>>> the one-column version with unicode.  My guess is that cx_oracle
>>> >>>> is detecting a unicode value in the input and doing something
>>> >>>> different with the statement overall as a result of it being
>>> >>>> present.
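The two statements being compared above can be reproduced without any database by compiling the constructs (a sketch: the table definition mirrors the thread's `test_table`, but column lengths are assumptions, and the positional `select()` call style assumes SQLAlchemy 1.4+ rather than the 0.7/0.8 the thread used):

```python
from sqlalchemy import Column, MetaData, Table, Unicode, select

# Stand-in for the test table used in the benchmark script.
metadata = MetaData()
test_table = Table(
    "test_table", metadata,
    Column("colid", Unicode(50), primary_key=True),
    Column("message", Unicode(250)),
)

# The "slow" statement: asks for both columns.
both_cols = select(test_table.c.colid, test_table.c.message).where(
    test_table.c.colid == u"some-key")

# The "fast" statement: asks for colid only.
one_col = select(test_table.c.colid).where(test_table.c.colid == u"some-key")

print(both_cols)
print(one_col)
```

Compiling like this makes it easy to confirm, before timing anything, that the only difference between the two cases is the presence of the "message" column in the select list.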
>>> >>>>
>>> >>>> Some typical results are:
>>> >>>>
>>> >>>>     Setting up table
>>> >>>>     sa_both_cols         1.20209717751
>>> >>>>     sa_one_col           0.0182020664215
>>> >>>>     sa_literal_sql       0.019690990448
>>> >>>>     cx_oracle_one_col    0.0839619636536
>>> >>>>     cx_oracle_both_cols  1.18438816071
>>> >>>>
>>> >>>>     Setting up table
>>> >>>>     sa_both_cols         1.17753505707
>>> >>>>     sa_one_col           0.0785720348358
>>> >>>>     sa_literal_sql       0.016618013382
>>> >>>>     cx_oracle_one_col    0.0194280147552
>>> >>>>     cx_oracle_both_cols  1.15302705765
>>> >>>>
>>> >>>> While the cx_oracle/SQLA versions seem to compete for the "fast"
>>> >>>> spot - which is because we're only running the statement once and
>>> >>>> I'm running on a low-performing amazon small instance - it's
>>> >>>> clear the "two column" version is where the vast amount of
>>> >>>> latency occurs for both systems.
>>> >>>>
>>> >>>> let me know if you can confirm similar results on your end.  I
>>> >>>> tested only with SQLAlchemy 0.8 in trunk, but I was able to
>>> >>>> reproduce your initial results so I assume the experience with
>>> >>>> 0.7 and 0.6 would be similar.
>>> >>>>
>>> >>>> On Aug 1, 2012, at 6:40 AM, David Moore wrote:
>>> >>>>
>>> >>>>> Hi all,
>>> >>>>>
>>> >>>>> This is a bug in a product with multiple versions in support.
>>> >>>>> The issue can be reproduced with:
>>> >>>>>     sqlalchemy: 0.6.1, 0.6.6, 0.7.8
>>> >>>>>     cx_Oracle: 5.0.4, 5.1, 5.1.2
>>> >>>>>     Oracle: 10g XE and 10g Enterprise
>>> >>>>>
>>> >>>>> We have tables created with sqlalchemy.String primary keys and
>>> >>>>> other indexes.  When querying through sqlalchemy, when we pass
>>> >>>>> in a unicode value for the key, Oracle does not use the index,
>>> >>>>> and instead uses a full table scan.  When executing the exact
>>> >>>>> same query through cx_Oracle, we don't see this issue.
>>> >>>>> The script
>>> >>>>> attached is a small test case which demonstrates the issue.
>>> >>>>> Sample output:
>>> >>>>>
>>> >>>>>     Running select
>>> >>>>>     Unicode params took 3.455 seconds
>>> >>>>>     Encoded params took 0 seconds
>>> >>>>>     Running cx_Oracle select
>>> >>>>>     Unicode params took 0.0619998 seconds
>>> >>>>>     Encoded params took 0 seconds
>>> >>>>>
>>> >>>>> And running the cx_Oracle select first to prevent pollution by
>>> >>>>> caching:
>>> >>>>>
>>> >>>>>     Running cx_Oracle select
>>> >>>>>     Unicode params took 0.0469999 seconds
>>> >>>>>     Encoded params took 0 seconds
>>> >>>>>     Running select
>>> >>>>>     Unicode params took 3.267 seconds
>>> >>>>>     Encoded params took 0.0159998 seconds
>>> >>>>>
>>> >>>>> As can be seen, there's a slight slowdown with the unicode
>>> >>>>> parameter on cx_Oracle, but nothing compared to the slowdown on
>>> >>>>> sqlalchemy.  The sqlalchemy unicode parameter select is the only
>>> >>>>> one which performs a full table scan, so that makes sense.
>>> >>>>>
>>> >>>>> This issue has been mentioned on this list before - 15 Feb with
>>> >>>>> title "Full Table scan with Oracle due to Charset conversion"
>>> >>>>> and 4 April with title "Problem with Oracle requests" - but in
>>> >>>>> those cases, the issue was reproducible with the cx_Oracle
>>> >>>>> driver as well, which it is not here.
>>> >>>>>
>>> >>>>> Using the convert_unicode='force' argument to the String types
>>> >>>>> does solve the issue, but there are a few reasons I don't want
>>> >>>>> to do that.  This product also supports PostgreSQL, SQL Server
>>> >>>>> and MySQL, and I do not want to make changes which will affect
>>> >>>>> running under those databases.  This is also happening in an
>>> >>>>> otherwise very stable product under long-term support in a
>>> >>>>> number of places, so whatever change is made needs to be
>>> >>>>> minimal.
>>> >>>>>
>>> >>>>> Adding an additional function-based index on SYS_OP_C2C(column)
>>> >>>>> for each of the affected columns also solves this issue, with a
>>> >>>>> slight performance cost for the extra indexes which need to be
>>> >>>>> kept up to date.  This is the stopgap solution we've gone for in
>>> >>>>> the meantime for those systems in production.  It's not ideal,
>>> >>>>> though, and a bit difficult to automate and then verify the
>>> >>>>> results.  We'd prefer to know why this was happening, and maybe
>>> >>>>> fix a sqlalchemy issue in the process.
>>> >>>>>
>>> >>>>> So, my questions are:
>>> >>>>> * What is sqlalchemy doing differently to straight cx_Oracle
>>> >>>>>   that triggers this behaviour?  How can I debug this further to
>>> >>>>>   find out?
>>> >>>>> * Is there a way, on sqlalchemy 0.6.x, to set up a hook which
>>> >>>>>   will use convert_unicode='force' only on an Oracle database
>>> >>>>>   and only for String columns, automatically across an
>>> >>>>>   application?  There's no create_engine argument for
>>> >>>>>   convert_unicode='force', which is a pity.
>>> >>>>> * And a hook for sqlalchemy 0.7.x to achieve the same result?
>>> >>>>>
>>> >>>>> Thanks in advance for any help,
>>> >>>>> regards,
>>> >>>>> --
>>> >>>>> David Moore
>>> >>>>> Senior Software Engineer
>>> >>>>> St. James Software
>>> >>>>> Email: dav...@sjsoft.com
>>> >>>>>
>>> >>>>> <oracle_sqlalchemy_encoding_error.py>
>>
>> --
>> David Moore
>> Senior Software Engineer
>> St. James Software
>> Email: dav...@sjsoft.com
>
> --
> David Moore
> Senior Software Engineer
> St. James Software
> Email: dav...@sjsoft.com

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/bAGLNKapvSMJ.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.