On Thursday, 25 October 2012 at 16:21:49 UTC+2, David Moore wrote:
>
> Hello,
>
Hi,

> In line with what is required for Python 3, would it not make sense to
> insist across the board that bind values to sqlalchemy.String should be
> bytestrings and that bind values to sqlalchemy.Unicode should be unicode
> strings, converting if necessary?

I am using the Unicode class and not String directly, but the code I was
referring to is in the String class.

> I don't think I understand why you would not want that ever.

I am using unicode strings, but I'd like SQLAlchemy to convert them to the
dialect encoding in order to avoid a problem with Oracle indexes not being
used when querying with unicode strings.

> regards,
> Dave Moore
>
> ------------------------------
>
> Hi,
>
> Having had the same problem as above, I would like to use the
> "convert_unicode='force'" flag at engine-configuration level.
> It seems that String tests the value of "convert_unicode" at its own level
> as well as at engine level, but that isn't the case for the 'force' value
> check:
>
>     def bind_processor(self, dialect):
>         if self.convert_unicode or dialect.convert_unicode:
>             if dialect.supports_unicode_binds and \
>                     self.convert_unicode != 'force':
>                 ...
>
>     def result_processor(self, dialect, coltype):
>         wants_unicode = self.convert_unicode or dialect.convert_unicode
>         needs_convert = wants_unicode and \
>             (dialect.returns_unicode_strings is not True or
>              self.convert_unicode == 'force')
>
> Thus I have to create a type decorator for all my Strings even though I
> have set the right flag at engine-configuration level...
>
> Yann

> On Thursday, 2 August 2012 at 23:22:22 UTC+2, Michael Bayer wrote:
>>
>> so the "convert_unicode='force'" flag is not ideal here, as it will
>> spend most of its time checking for decodes necessary on the result-set
>> side, which is a lot of wasted effort.
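A minimal, self-contained sketch of the check Yann quotes (the class names and flag wiring are illustrative stand-ins, not SQLAlchemy's real classes), showing that the `!= 'force'` comparison consults only the type-level flag, so a 'force' value set at the engine/dialect level alone never triggers the bind-side encode path:

```python
# FakeDialect / FakeString are stand-ins mirroring the quoted
# bind_processor logic; they are not SQLAlchemy classes.

class FakeDialect:
    convert_unicode = 'force'      # engine-level setting
    supports_unicode_binds = True

class FakeString:
    convert_unicode = False        # type-level flag left at its default

    def bind_side_encodes(self, dialect):
        # mirrors: "if dialect.supports_unicode_binds and
        #           self.convert_unicode != 'force'" -- note only the
        # type-level flag is consulted for the 'force' comparison
        if self.convert_unicode or dialect.convert_unicode:
            if dialect.supports_unicode_binds and \
                    self.convert_unicode != 'force':
                return False   # dialect binds unicode natively; no encode
            return True
        return False

# engine-level 'force' is ignored, so no encode happens on the bind side
print(FakeString().bind_side_encodes(FakeDialect()))  # False
```

Setting `convert_unicode = 'force'` on the type itself (rather than the dialect) is the only way the sketch returns True with a unicode-capable dialect, which is exactly why a per-type decorator is still needed.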
>> You can customize how strings are handled on the bind side, including
>> per-dialect behavior, using a custom type:
>>
>>     class MyStringType(TypeDecorator):
>>         impl = String
>>
>>         def process_bind_param(self, value, dialect):
>>             if value is not None and dialect.name == "oracle":
>>                 value = value.encode('utf-8')
>>             return value
>>
>> then replace usage of the String(length=XYZ) type with
>> MyStringType(length=XYZ).
>>
>> docs: http://docs.sqlalchemy.org/en/rel_0_7/core/types.html#types-custom
>>
>>
>> On Aug 2, 2012, at 5:41 AM, David Moore wrote:
>>
>> > Hi,
>> >
>> > Further searching seems to indicate this is an Oracle issue, not a
>> > cx_Oracle issue.
>> > http://www.digipedia.pl/usenet/thread/15912/2814/ - same problem with
>> > cx_Oracle
>> > http://support.unify.com/supportforum/viewtopic.php?f=40&t=3823 -
>> > happening with Perl drivers.
>> > http://tao.qshine.com/note/note_ora.htm - happening with ADO.NET
>> > drivers.
>> >
>> > There's also evidence this problem occurs with MS SQL Server as well:
>> > http://www.microsoftfaqs.com/Articles/1148439/Index_not_used_when_UNICODE_%3D_true
>> > - SQL Server 2005
>> > http://blog.brianhartsock.com/2008/12/14/nvarchar-vs-varchar-in-sql-server-beware/
>> >
>> > Although it seems that, in SQL Server at least, it still uses the
>> > index, just in a slower manner, which is probably why we haven't
>> > noticed this issue yet.
>> >
>> > It seems to me that the place to fix this is when the type of the
>> > column is known, which is when SQLAlchemy binds the parameter. There's
>> > some discussion in one of those links that, at the SQL statement
>> > preparation level, you don't know what the column datatype is, so you
>> > can't fix it there.
>> >
>> > Would it be possible for String datatypes to detect unicode values and
>> > encode them with the database charset?
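A quick self-contained sanity check of the bind-side behavior of a custom type like the MyStringType suggested above, using a stand-in dialect object so neither a database nor a SQLAlchemy install is needed; the function below simply reproduces the body of process_bind_param for illustration:

```python
from types import SimpleNamespace

def process_bind_param(value, dialect):
    # encode to the database charset only for Oracle; leave other
    # backends (and None values) untouched
    if value is not None and dialect.name == "oracle":
        value = value.encode('utf-8')
    return value

# stand-in dialect objects; only the .name attribute is consulted
oracle = SimpleNamespace(name="oracle")
postgres = SimpleNamespace(name="postgresql")

print(process_bind_param(u"h\xe9llo", oracle))    # b'h\xc3\xa9llo'
print(process_bind_param(u"h\xe9llo", postgres))  # unchanged
```

Because the conversion is keyed on dialect.name, the same application code keeps sending unicode to PostgreSQL, SQL Server and MySQL unchanged, which addresses the cross-database concern raised later in this thread.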
>> >
>> > As a lower-level temporary solution, you can get cx_Oracle to encode
>> > all unicode parameters to strings, but that's obviously the wrong
>> > thing to do if you have any Unicode columns. This snippet:
>> >
>> >     def InputTypeHandler(cursor, value, numElements):
>> >         if isinstance(value, unicode):
>> >             return cursor.var(str, arraysize=numElements,
>> >                               inconverter=lambda x:
>> >                                   x.encode(cursor.connection.nencoding))
>> >
>> > And then when creating the connection:
>> >
>> >     connection.inputtypehandler = InputTypeHandler
>> >
>> > thanks for the help,
>> >
>> > --
>> > David Moore
>> > Senior Software Engineer
>> > St. James Software
>> > Email: dav...@sjsoft.com
>> >
>> > ----- Original Message -----
>> >> And for anyone else experiencing this issue, there was a subtle
>> >> difference in the execution plans that's now apparent. The statement
>> >> which only selects colid runs a FAST FULL SCAN, successfully converts
>> >> the unicode parameter using SYS_OP_C2C, and uses the index. When you
>> >> select both colid and message, it runs a FULL SCAN, and even though
>> >> it seems to detect that the conversion can be done, it does not use
>> >> the index at all. Perhaps this is actually an Oracle issue?
>> >>
>> >> --
>> >> David Moore
>> >> Senior Software Engineer
>> >> St. James Software
>> >> Email: dav...@sjsoft.com
>> >>
>> >> ----- Original Message -----
>> >>> Hi Michael,
>> >>>
>> >>> Indeed, you are correct - adding the message column to the cx_Oracle
>> >>> query shows the same behaviour as the SQLAlchemy query. Sorry I
>> >>> missed that.
>> >>>
>> >>> I will take these results to the cx_Oracle list on this basis. On my
>> >>> second question, though, is there a way to enforce
>> >>> convert_unicode='force' only on Oracle databases across an
>> >>> application?
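The inconverter logic in the snippet above can be exercised without a live cx_Oracle connection by faking the cursor and connection objects; FakeCursor and the "UTF-8" nencoding value are stand-ins, and, since this sketch is Python 3, `unicode` is spelled `str`:

```python
from types import SimpleNamespace

class FakeCursor:
    # stand-in for a cx_Oracle cursor; only the pieces the handler
    # touches are provided
    connection = SimpleNamespace(nencoding="UTF-8")

    def var(self, typ, arraysize, inconverter):
        # real cx_Oracle returns a bind-variable object; here we just
        # capture the converter so it can be called directly
        return SimpleNamespace(inconverter=inconverter)

def InputTypeHandler(cursor, value, numElements):
    if isinstance(value, str):  # `unicode` in the original Python 2 code
        return cursor.var(str, arraysize=numElements,
                          inconverter=lambda x:
                              x.encode(cursor.connection.nencoding))

var = InputTypeHandler(FakeCursor(), u"h\xe9llo", 1)
print(var.inconverter(u"h\xe9llo"))  # b'h\xc3\xa9llo'
```

Returning None from the handler (as happens for non-string values here) tells cx_Oracle to fall back to its default bind handling.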
>> >>> I know the hooking syntax changed from SQLAlchemy 0.6 to 0.7, so
>> >>> I'd imagine that if there is a way, it would be different for those
>> >>> two versions.
>> >>>
>> >>> thanks,
>> >>>
>> >>> --
>> >>> David Moore
>> >>> Senior Software Engineer
>> >>> St. James Software
>> >>> Email: dav...@sjsoft.com
>> >>>
>> >>> ----- Original Message -----
>> >>>> Hi David -
>> >>>>
>> >>>> I've done some experiments with the script you gave me. Correct me
>> >>>> if I'm wrong (which is very possible), but the attached revised
>> >>>> version appears to demonstrate that the difference between the
>> >>>> cx_oracle and SQLAlchemy versions is really just that the
>> >>>> SQLAlchemy version is running a different SQL string:
>> >>>>
>> >>>>     SELECT test_table.colid, test_table.message
>> >>>>     FROM test_table
>> >>>>     WHERE test_table.colid = :colid_1
>> >>>>
>> >>>> whereas the cx_oracle version, critically, does not ask for the
>> >>>> "message" column:
>> >>>>
>> >>>>     SELECT test_table.colid FROM test_table
>> >>>>     WHERE test_table.colid = :colid
>> >>>>
>> >>>> it's not clear to me if the placement of the "message" column
>> >>>> impacts the Oracle planner, or otherwise if the latency is on the
>> >>>> cx_oracle side regarding buffering of columns or something similar.
>> >>>> But I am able to get the SQLAlchemy version to be as fast as the
>> >>>> cx_oracle version, and the cx_oracle version to be as slow as the
>> >>>> SQLAlchemy version, by just adding/removing the "message" column
>> >>>> from the query.
>> >>>>
>> >>>> The attached script also tries to eliminate the overhead of
>> >>>> SQLAlchemy compiling the SQL construct within the timed portion, as
>> >>>> I was testing this against only 40K rows and wanted to try to get
>> >>>> the results as close as possible.
>> >>>> It's only dealing with the unicode parameter, and it's true that
>> >>>> if you encode the parameter first, both the SQLA and cx_oracle
>> >>>> versions get faster, even to the point that the two-column version
>> >>>> minus unicode is faster than the one-column version with unicode.
>> >>>> My guess is that cx_oracle is detecting a unicode value in the
>> >>>> input and doing something different with the statement overall as a
>> >>>> result of it being present.
>> >>>>
>> >>>> Some typical sets of results are:
>> >>>>
>> >>>>     Setting up table
>> >>>>     sa_both_cols        1.20209717751
>> >>>>     sa_one_col          0.0182020664215
>> >>>>     sa_literal_sql      0.019690990448
>> >>>>     cx_oracle_one_col   0.0839619636536
>> >>>>     cx_oracle_both_cols 1.18438816071
>> >>>>
>> >>>>     Setting up table
>> >>>>     sa_both_cols        1.17753505707
>> >>>>     sa_one_col          0.0785720348358
>> >>>>     sa_literal_sql      0.016618013382
>> >>>>     cx_oracle_one_col   0.0194280147552
>> >>>>     cx_oracle_both_cols 1.15302705765
>> >>>>
>> >>>> While the cx_oracle/SQLA versions seem to compete for the "fast"
>> >>>> slot (which is because we're only running the statement once, and
>> >>>> I'm running on a low-performing Amazon small instance), it's clear
>> >>>> that the "two column" version is where the vast amount of latency
>> >>>> occurs for both systems.
>> >>>>
>> >>>> let me know if you can confirm similar results on your end. I
>> >>>> tested only with SQLAlchemy 0.8 in trunk, but I was able to
>> >>>> reproduce your initial results, so I assume the experience with 0.7
>> >>>> and 0.6 would be similar.
>> >>>>
>> >>>> On Aug 1, 2012, at 6:40 AM, David Moore wrote:
>> >>>>
>> >>>>> Hi all,
>> >>>>>
>> >>>>> This is a bug in a product with multiple versions in support.
>> >>>>> The issue can be reproduced with:
>> >>>>>     sqlalchemy: 0.6.1, 0.6.6, 0.7.8
>> >>>>>     cx_Oracle: 5.0.4, 5.1, 5.1.2
>> >>>>>     Oracle: 10g XE and 10g Enterprise
>> >>>>>
>> >>>>> We have tables created with sqlalchemy.String primary keys and
>> >>>>> other indexes. When querying through SQLAlchemy, when we pass in a
>> >>>>> unicode value for the key, Oracle does not use the index, and
>> >>>>> instead performs a full table scan. When executing the exact same
>> >>>>> query through cx_Oracle, we don't see this issue. The attached
>> >>>>> script is a small test case which demonstrates the issue. Sample
>> >>>>> output:
>> >>>>>
>> >>>>>     Running select
>> >>>>>     Unicode params took 3.455 seconds
>> >>>>>     Encoded params took 0 seconds
>> >>>>>     Running cx_Oracle select
>> >>>>>     Unicode params took 0.0619998 seconds
>> >>>>>     Encoded params took 0 seconds
>> >>>>>
>> >>>>> And running the cx_Oracle select first to prevent pollution by
>> >>>>> caching:
>> >>>>>
>> >>>>>     Running cx_Oracle select
>> >>>>>     Unicode params took 0.0469999 seconds
>> >>>>>     Encoded params took 0 seconds
>> >>>>>     Running select
>> >>>>>     Unicode params took 3.267 seconds
>> >>>>>     Encoded params took 0.0159998 seconds
>> >>>>>
>> >>>>> As can be seen, there's a slight slowdown with the unicode
>> >>>>> parameter on cx_Oracle, but nothing compared to the slowdown on
>> >>>>> SQLAlchemy. The SQLAlchemy unicode-parameter select is the only
>> >>>>> one which performs a full table scan, so that makes sense.
>> >>>>>
>> >>>>> This issue has been mentioned on this list before - 15 Feb with
>> >>>>> title "Full Table scan with Oracle due to Charset conversion" and
>> >>>>> 4 April with title "Problem with Oracle requests" - but in those
>> >>>>> cases, the issue was reproducible with the cx_Oracle driver as
>> >>>>> well, which it is not here.
>> >>>>>
>> >>>>> Using the convert_unicode='force' argument to the String types
>> >>>>> does solve the issue, but there are a few reasons I don't want to
>> >>>>> do that. This product also supports PostgreSQL, SQL Server and
>> >>>>> MySQL, and I do not want to make changes which will affect running
>> >>>>> under those databases. This is also happening in an otherwise very
>> >>>>> stable product under long-term support in a number of places, so
>> >>>>> whatever change is made needs to be minimal.
>> >>>>>
>> >>>>> Adding an additional function-based index on SYS_OP_C2C(column)
>> >>>>> for each of the affected columns also solves this issue, with a
>> >>>>> slight performance cost for the extra indexes which need to be
>> >>>>> kept up to date. This is the stopgap solution we've gone for in
>> >>>>> the meantime for those systems in production. It's not ideal,
>> >>>>> though, and a bit difficult to automate and then verify the
>> >>>>> results. We'd prefer to know why this was happening, and maybe fix
>> >>>>> a SQLAlchemy issue in the process.
>> >>>>>
>> >>>>> So, my questions are:
>> >>>>> * What is SQLAlchemy doing differently from straight cx_Oracle
>> >>>>>   that triggers this behaviour? How can I debug this further to
>> >>>>>   find out?
>> >>>>> * Is there a way, on SQLAlchemy 0.6.x, to set up a hook which will
>> >>>>>   use convert_unicode='force' only on an Oracle database and only
>> >>>>>   for String columns, automatically across an application? There's
>> >>>>>   no create_engine argument for convert_unicode='force', which is
>> >>>>>   a pity.
>> >>>>> * And a hook for SQLAlchemy 0.7.x to achieve the same result?
>> >>>>>
>> >>>>> Thanks in advance for any help,
>> >>>>> regards,
>> >>>>> --
>> >>>>> David Moore
>> >>>>> Senior Software Engineer
>> >>>>> St.
>> >>>>> James Software
>> >>>>> Email: dav...@sjsoft.com
>> >>>>>
>> >>>>> --
>> >>>>> You received this message because you are subscribed to the Google
>> >>>>> Groups "sqlalchemy" group.
>> >>>>> To post to this group, send email to sqlal...@googlegroups.com.
>> >>>>> To unsubscribe from this group, send email to
>> >>>>> sqlalchemy+...@googlegroups.com.
>> >>>>> For more options, visit this group at
>> >>>>> http://groups.google.com/group/sqlalchemy?hl=en.
>> >>>>>
>> >>>>> <oracle_sqlalchemy_encoding_error.py>
>
> --
> David Moore
> Senior Software Engineer
> St. James Software
> Email: dav...@sjsoft.com

--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To view this discussion on the web visit
https://groups.google.com/d/msg/sqlalchemy/-/ZeC7Vq9TvwQJ.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/sqlalchemy?hl=en.