so the "convert_unicode='force'" flag is not ideal here, as it will spend most of its time checking whether decodes are necessary on the result-set side, which is a lot of wasted effort. You can customize how strings are handled on the bind side, including per-dialect behavior, using a custom type:
class MyStringType(TypeDecorator):
    impl = String

    def process_bind_param(self, value, dialect):
        if value is not None and dialect.name == "oracle":
            value = value.encode('utf-8')
        return value

then replace usage of the String(length=XYZ) type with MyStringType(length=XYZ).

docs: http://docs.sqlalchemy.org/en/rel_0_7/core/types.html#types-custom

On Aug 2, 2012, at 5:41 AM, David Moore wrote:

> Hi,
>
> Further searching seems to indicate this is an Oracle issue, not a cx_Oracle
> issue.
> http://www.digipedia.pl/usenet/thread/15912/2814/ - same problem with
> cx_Oracle
> http://support.unify.com/supportforum/viewtopic.php?f=40&t=3823 - happening
> with Perl drivers.
> http://tao.qshine.com/note/note_ora.htm - happening with ADO.NET drivers.
>
> There's also evidence this problem occurs with MS SQL Server as well:
> http://www.microsoftfaqs.com/Articles/1148439/Index_not_used_when_UNICODE_%3D_true
> - SQL Server 2005
> http://blog.brianhartsock.com/2008/12/14/nvarchar-vs-varchar-in-sql-server-beware/
>
> It seems that SQL Server, at least, still uses the index, just in a slower
> manner, which is probably why we haven't noticed this issue there yet.
>
> It seems to me that the place to fix this is where the type of the column is
> known, which is when sqlalchemy binds the parameter. There's some discussion
> in one of those links that, at the SQL statement preparation level, you don't
> know what the column datatype is, so you can't fix it there.
>
> Would it be possible for String datatypes to detect unicode values and encode
> them with the database charset?
>
> As a lower-level temporary solution, you can get cx_Oracle to encode all
> unicode parameters to strings, but that's obviously the wrong thing to do if
> you have any Unicode columns.
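The bind-side conversion the MyStringType example above performs can be sketched without a database. This is a minimal, illustrative sketch: dialect_name stands in for SQLAlchemy's dialect.name attribute, and no engine or Oracle connection is involved.

```python
# Database-free sketch of MyStringType's bind-side behaviour.
# "dialect_name" is a stand-in for SQLAlchemy's dialect.name;
# the real type receives a full dialect object from SQLAlchemy.

def process_bind_param(value, dialect_name):
    """Encode unicode values to UTF-8 byte strings, but only for Oracle."""
    if value is not None and dialect_name == "oracle":
        value = value.encode("utf-8")
    return value

print(process_bind_param(u"caf\xe9", "oracle"))      # encoded byte string
print(process_bind_param(u"caf\xe9", "postgresql"))  # value passes through
```

Because values for every other dialect pass through unchanged, behaviour under PostgreSQL, MySQL and SQL Server is unaffected, which addresses the cross-database concern raised below.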
>
> This snippet:
>
> def InputTypeHandler(cursor, value, numElements):
>     if isinstance(value, unicode):
>         return cursor.var(str, arraysize=numElements,
>                           inconverter=lambda x:
>                               x.encode(cursor.connection.nencoding))
>
> And then when creating the connection:
>
> connection.inputtypehandler = InputTypeHandler
>
> thanks for the help,
>
> --
> David Moore
> Senior Software Engineer
> St. James Software
> Email: dav...@sjsoft.com
>
> ----- Original Message -----
>> And for anyone else experiencing this issue, there was a subtle
>> difference in the execution plans that's now apparent. The statement
>> which only selects colid runs a FAST FULL SCAN, successfully converts
>> the unicode parameter using SYS_OP_C2C, and uses the index. When you
>> select both colid and message, it runs a FULL SCAN, and even though it
>> seems to detect that the conversion can be done, it does not use the
>> index at all. Perhaps this is actually an Oracle issue?
>>
>> --
>> David Moore
>> Senior Software Engineer
>> St. James Software
>> Email: dav...@sjsoft.com
>>
>> ----- Original Message -----
>>> Hi Michael,
>>>
>>> Indeed, you are correct - adding the message column to the cx_oracle
>>> query shows the same behaviour as the sqlalchemy query. Sorry I
>>> missed that.
>>>
>>> I will take these results to the cx_oracle list on this basis. On my
>>> second question, though, is there a way to enforce
>>> convert_unicode='force' only on Oracle databases, across an
>>> application? I know the hooking syntax changed from sqlalchemy 0.6
>>> to 0.7, so I'd imagine if there is a way, it would be different for
>>> those two versions.
>>>
>>> thanks,
>>>
>>> --
>>> David Moore
>>> Senior Software Engineer
>>> St. James Software
>>> Email: dav...@sjsoft.com
>>>
>>> ----- Original Message -----
>>>> Hi David -
>>>>
>>>> I've done some experiments with the script you gave me.
>>>> Correct me if I'm wrong (which is very possible), but the attached
>>>> revised version appears to demonstrate that the difference between
>>>> the cx_oracle and SQLAlchemy versions is really just that the
>>>> SQLAlchemy version is running a different SQL string:
>>>>
>>>> SELECT test_table.colid, test_table.message
>>>> FROM test_table
>>>> WHERE test_table.colid = :colid_1
>>>>
>>>> whereas the cx_oracle version, critically, does not ask for the
>>>> "message" column:
>>>>
>>>> SELECT test_table.colid
>>>> FROM test_table
>>>> WHERE test_table.colid = :colid
>>>>
>>>> It's not clear to me whether the presence of the "message" column
>>>> impacts the Oracle planner, or whether the latency is on the
>>>> cx_oracle side, in buffering of columns or something similar. But I
>>>> am able to get the SQLAlchemy version to be as fast as the
>>>> cx_oracle version, and the cx_oracle version to be as slow as the
>>>> SQLAlchemy version, just by adding/removing the "message" column
>>>> from the query.
>>>>
>>>> The attached script also tries to eliminate the overhead of
>>>> SQLAlchemy compiling the SQL construct within the timed portion, as
>>>> I was testing this against only 40K rows and wanted to get the
>>>> results as close as possible. The unicode parameter is the key
>>>> factor: if you encode the parameter first, both the SQLA and
>>>> cx_oracle versions get faster, even to the point that the
>>>> two-column version minus unicode is faster than the one-column
>>>> version with unicode. My guess is that cx_oracle is detecting a
>>>> unicode value in the input and doing something different with the
>>>> statement overall as a result of it being present.
>>>>
>>>> Some typical results are:
>>>>
>>>> Setting up table
>>>> sa_both_cols         1.20209717751
>>>> sa_one_col           0.0182020664215
>>>> sa_literal_sql       0.019690990448
>>>> cx_oracle_one_col    0.0839619636536
>>>> cx_oracle_both_cols  1.18438816071
>>>>
>>>> Setting up table
>>>> sa_both_cols         1.17753505707
>>>> sa_one_col           0.0785720348358
>>>> sa_literal_sql       0.016618013382
>>>> cx_oracle_one_col    0.0194280147552
>>>> cx_oracle_both_cols  1.15302705765
>>>>
>>>> While the cx_oracle and SQLA versions trade places for the "fast"
>>>> spot (we're only running the statement once, and I'm running on a
>>>> low-performing Amazon small instance), it's clear that the
>>>> "two column" version is where the vast majority of the latency
>>>> occurs for both systems.
>>>>
>>>> Let me know if you can confirm similar results on your end. I
>>>> tested only with SQLAlchemy 0.8 in trunk, but I was able to
>>>> reproduce your initial results, so I assume the experience with 0.7
>>>> and 0.6 would be similar.
>>>>
>>>> On Aug 1, 2012, at 6:40 AM, David Moore wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> This is a bug in a product with multiple versions in support. The
>>>>> issue can be reproduced with:
>>>>> sqlalchemy: 0.6.1, 0.6.6, 0.7.8
>>>>> cx_Oracle: 5.0.4, 5.1, 5.1.2
>>>>> Oracle: 10g XE and 10g Enterprise
>>>>>
>>>>> We have tables created with sqlalchemy.String primary keys and
>>>>> other indexes. When querying through sqlalchemy and passing in a
>>>>> unicode value for the key, Oracle does not use the index, and
>>>>> instead performs a full table scan. When executing the exact same
>>>>> query through cx_Oracle directly, we don't see this issue. The
>>>>> attached script is a small test case which demonstrates the issue.
>>>>>
>>>>> Sample output:
>>>>>
>>>>> Running select
>>>>> Unicode params took 3.455 seconds
>>>>> Encoded params took 0 seconds
>>>>> Running cx_Oracle select
>>>>> Unicode params took 0.0619998 seconds
>>>>> Encoded params took 0 seconds
>>>>>
>>>>> And running the cx_Oracle select first, to prevent pollution by
>>>>> caching:
>>>>>
>>>>> Running cx_Oracle select
>>>>> Unicode params took 0.0469999 seconds
>>>>> Encoded params took 0 seconds
>>>>> Running select
>>>>> Unicode params took 3.267 seconds
>>>>> Encoded params took 0.0159998 seconds
>>>>>
>>>>> As can be seen, there's a slight slowdown with the unicode
>>>>> parameter on cx_Oracle, but nothing compared to the slowdown with
>>>>> sqlalchemy. The sqlalchemy unicode-parameter select is the only
>>>>> one which performs a full table scan, so that makes sense.
>>>>>
>>>>> This issue has been mentioned on this list before - 15 Feb, with
>>>>> the title "Full Table scan with Oracle due to Charset conversion",
>>>>> and 4 April, with the title "Problem with Oracle requests" - but
>>>>> in those cases the issue was reproducible with the cx_Oracle
>>>>> driver as well, which it is not here.
>>>>>
>>>>> Using the convert_unicode='force' argument to the String types
>>>>> does solve the issue, but there are a few reasons I don't want to
>>>>> do that. This product also supports PostgreSQL, SQL Server and
>>>>> MySQL, and I do not want to make changes which will affect running
>>>>> under those databases. This is also happening in an otherwise very
>>>>> stable product under long-term support, in a number of places, so
>>>>> whatever change is made needs to be minimal.
>>>>>
>>>>> Adding an additional function-based index on SYS_OP_C2C(column)
>>>>> for each of the affected columns also solves the issue, with a
>>>>> slight performance cost for the extra indexes which need to be
>>>>> kept up to date.
>>>>> This is the stopgap solution we've gone for in the meantime on
>>>>> production systems. It's not ideal, though, and a bit difficult to
>>>>> automate and then verify. We'd prefer to know why this is
>>>>> happening, and maybe fix a sqlalchemy issue in the process.
>>>>>
>>>>> So, my questions are:
>>>>> * What is sqlalchemy doing differently from straight cx_Oracle
>>>>>   that triggers this behaviour? How can I debug this further to
>>>>>   find out?
>>>>> * Is there a way, on sqlalchemy 0.6.x, to set up a hook which will
>>>>>   use convert_unicode='force' only on an Oracle database, and only
>>>>>   for String columns, automatically across an application? There's
>>>>>   no create_engine argument for convert_unicode='force', which is
>>>>>   a pity.
>>>>> * And a hook for sqlalchemy 0.7.x to achieve the same result?
>>>>>
>>>>> Thanks in advance for any help,
>>>>> regards,
>>>>> --
>>>>> David Moore
>>>>> Senior Software Engineer
>>>>> St. James Software
>>>>> Email: dav...@sjsoft.com
>>>>>
>>>>> <oracle_sqlalchemy_encoding_error.py>
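The SYS_OP_C2C stopgap described above can be sketched as a small DDL helper. This is a hypothetical helper, not part of the product: the index, table and column names are illustrative, and the generated statement would be issued against Oracle (for example via engine.execute) once per affected column.

```python
# Hypothetical helper building the function-based-index DDL used as a
# stopgap: wrapping the VARCHAR2 column in SYS_OP_C2C lets Oracle match
# the implicitly converted (national-charset) unicode bind parameter
# against an index instead of falling back to a full table scan.

def c2c_index_ddl(index_name, table_name, column_name):
    """Return CREATE INDEX DDL for a SYS_OP_C2C function-based index."""
    return "CREATE INDEX %s ON %s (SYS_OP_C2C(%s))" % (
        index_name, table_name, column_name)

print(c2c_index_ddl("ix_test_table_colid_c2c", "test_table", "colid"))
```

The extra index must be maintained on every write to the column, which is the slight performance cost mentioned above.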
--
You received this message because you are subscribed to the Google Groups
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/sqlalchemy?hl=en.
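The driver-level workaround quoted earlier in the thread (connection.inputtypehandler) boils down to the following dispatch, sketched here without cx_Oracle. FakeVar is a stand-in for the object cursor.var() returns, and "utf-8" is an illustrative encoding; the real handler uses cursor.connection.nencoding.

```python
# Database-free sketch of the input type handler's dispatch: unicode
# bind values get routed through an encoder before reaching Oracle,
# while all other values fall back to cx_Oracle's default handling.

class FakeVar(object):
    """Stand-in for the variable object cursor.var() would return."""
    def __init__(self, inconverter):
        self.inconverter = inconverter

def input_type_handler(nencoding, value):
    """Return an encoding variable for unicode values, None otherwise."""
    # type(u"") is unicode on Python 2 and str on Python 3
    if isinstance(value, type(u"")):
        return FakeVar(inconverter=lambda x: x.encode(nencoding))
    return None  # non-unicode values: default handling

var = input_type_handler("utf-8", u"caf\xe9")
print(var.inconverter(u"caf\xe9"))       # encoded byte string
print(input_type_handler("utf-8", 42))   # None: integers untouched
```

As noted in the thread, this encodes every unicode bind parameter for the whole connection, so it is the wrong tool if any columns are genuinely Unicode (NVARCHAR2); the per-type approach at the top of the thread is more targeted.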