Hi Michael,

Indeed, you are correct - adding the message column to the cx_Oracle query shows the same behaviour as the SQLAlchemy query. Sorry I missed that.
I will take these results to the cx_Oracle list on this basis.

On my second question, though: is there a way to enforce convert_unicode='force' only on Oracle databases, across an application? I know the hooking syntax changed from SQLAlchemy 0.6 to 0.7, so I'd imagine that if there is a way, it would be different for those two versions.

thanks,
--
David Moore
Senior Software Engineer
St. James Software
Email: dav...@sjsoft.com

----- Original Message -----
> Hi David -
>
> I've done some experiments with the script you gave me. Correct me
> if I'm wrong (which is very possible), but the attached revised
> version appears to demonstrate that the difference between the
> cx_Oracle and SQLAlchemy versions is really just that the SQLAlchemy
> version is running a different SQL string:
>
>     SELECT test_table.colid, test_table.message
>     FROM test_table
>     WHERE test_table.colid = :colid_1
>
> whereas the cx_Oracle version, critically, does not ask for the
> "message" column:
>
>     SELECT test_table.colid
>     FROM test_table
>     WHERE test_table.colid = :colid
>
> It's not clear to me whether the presence of the "message" column
> affects the Oracle planner, or whether the latency is on the
> cx_Oracle side, in the buffering of columns or something similar.
> But I am able to make the SQLAlchemy version as fast as the
> cx_Oracle version, and the cx_Oracle version as slow as the
> SQLAlchemy version, just by adding or removing the "message" column
> from the query.
>
> The attached script also tries to eliminate the overhead of
> SQLAlchemy compiling the SQL construct from the timed portion, as I
> was testing against only 40K rows and wanted to get the results as
> close as possible. The slowdown is entirely about the unicode
> parameter: if you encode the parameter first, both the SQLAlchemy
> and cx_Oracle versions get faster, even to the point that the
> two-column version minus unicode is faster than the one-column
> version with unicode.
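The exact SQL string SQLAlchemy emits can be checked without a live database by compiling the statement against the Oracle dialect. A minimal sketch, written against a recent SQLAlchemy (the 0.x releases discussed in this thread spell the select as `select([...])` instead), with table and column names taken from the test script:

```python
from sqlalchemy import column, select, table
from sqlalchemy.dialects import oracle

# Lightweight stand-in for the test script's table definition.
test_table = table("test_table", column("colid"), column("message"))

# The "both columns" query, compiled the way SQLAlchemy would run it.
stmt = select(test_table.c.colid, test_table.c.message).where(
    test_table.c.colid == "somekey"
)
print(stmt.compile(dialect=oracle.dialect()))
```

This prints the two-column `SELECT ... WHERE test_table.colid = :colid_1` form quoted above; dropping `test_table.c.message` from the select reproduces the one-column variant.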
> My guess is that cx_Oracle is detecting a unicode value in the input
> and doing something different with the statement overall as a result
> of its presence.
>
> A typical set of results:
>
>     Setting up table
>     sa_both_cols         1.20209717751
>     sa_one_col           0.0182020664215
>     sa_literal_sql       0.019690990448
>     cx_oracle_one_col    0.0839619636536
>     cx_oracle_both_cols  1.18438816071
>
>     Setting up table
>     sa_both_cols         1.17753505707
>     sa_one_col           0.0785720348358
>     sa_literal_sql       0.016618013382
>     cx_oracle_one_col    0.0194280147552
>     cx_oracle_both_cols  1.15302705765
>
> The cx_Oracle and SQLAlchemy versions trade places for the "fast"
> slot (we're only running the statement once, and I'm on a
> low-performing Amazon small instance), but it's clear that the
> "two column" version is where the vast majority of the latency
> occurs in both systems.
>
> Let me know if you can confirm similar results on your end. I tested
> only with SQLAlchemy 0.8 in trunk, but I was able to reproduce your
> initial results, so I assume the experience with 0.7 and 0.6 would
> be similar.
>
> On Aug 1, 2012, at 6:40 AM, David Moore wrote:
>
> > Hi all,
> >
> > This is a bug in a product with multiple versions in support. The
> > issue can be reproduced with:
> >
> >     sqlalchemy: 0.6.1, 0.6.6, 0.7.8
> >     cx_Oracle:  5.0.4, 5.1, 5.1.2
> >     Oracle:     10g XE and 10g Enterprise
> >
> > We have tables created with sqlalchemy.String primary keys and
> > other indexes. When we pass in a unicode value for the key in a
> > query through SQLAlchemy, Oracle does not use the index and
> > instead performs a full table scan. When executing the exact same
> > query through cx_Oracle, we don't see this issue. The attached
> > script is a small test case which demonstrates the issue.
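The "encode the parameter first" trick discussed above can be wrapped in a small helper so the fast path is applied uniformly. A sketch of the idea only: the helper name is mine, and on the Python 2 used in this thread the type check would be `isinstance(value, unicode)`:

```python
def encode_params(params, encoding="utf-8"):
    """Encode text bind values to byte strings before execution.

    The point is that cx_Oracle then binds plain VARCHAR2 values, so
    Oracle compares against the String column without the
    character-set conversion that defeats the index.
    """
    return {
        key: value.encode(encoding) if isinstance(value, str) else value
        for key, value in params.items()
    }

# Usage against a cx_Oracle cursor (connection details elided):
# cursor.execute(sql, encode_params({"colid": "somekey"}))
```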
> > Sample output:
> >
> >     Running select
> >     Unicode params took 3.455 seconds
> >     Encoded params took 0 seconds
> >     Running cx_Oracle select
> >     Unicode params took 0.0619998 seconds
> >     Encoded params took 0 seconds
> >
> > And running the cx_Oracle select first, to prevent pollution by
> > caching:
> >
> >     Running cx_Oracle select
> >     Unicode params took 0.0469999 seconds
> >     Encoded params took 0 seconds
> >     Running select
> >     Unicode params took 3.267 seconds
> >     Encoded params took 0.0159998 seconds
> >
> > As can be seen, there's a slight slowdown with the unicode
> > parameter on cx_Oracle, but nothing compared to the slowdown with
> > SQLAlchemy. The SQLAlchemy unicode-parameter select is the only
> > one which performs a full table scan, so that makes sense.
> >
> > This issue has been mentioned on this list before - on 15 Feb
> > under the title "Full Table scan with Oracle due to Charset
> > conversion", and on 4 April under the title "Problem with Oracle
> > requests" - but in those cases the issue was reproducible with
> > the cx_Oracle driver as well, which it is not here.
> >
> > Using the convert_unicode='force' argument to the String types
> > does solve the issue, but there are a few reasons I don't want to
> > do that. This product also supports PostgreSQL, SQL Server and
> > MySQL, and I do not want to make changes which will affect
> > running under those databases. It is also happening in an
> > otherwise very stable product under long-term support, in a
> > number of places, so whatever change is made needs to be minimal.
> >
> > Adding a function-based index on SYS_OP_C2C(column) for each of
> > the affected columns also solves the issue, at a slight
> > performance cost for the extra indexes which need to be kept up
> > to date. This is the stopgap solution we've gone for in the
> > meantime on production systems. It's not ideal, though, and it is
> > a bit difficult to automate and then verify the results.
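The stopgap index described above, spelled out; the table, column and index names are illustrative, not taken from the product. SYS_OP_C2C is the internal function Oracle applies to convert a VARCHAR2 value to the national character set when comparing it against an NVARCHAR2 bind, so indexing that expression gives the converted comparison an index to use:

```python
# Illustrative DDL for the SYS_OP_C2C stopgap; names are made up for
# the example, and in practice this is issued once per affected column.
CREATE_C2C_INDEX = """
CREATE INDEX test_table_colid_c2c
    ON test_table (SYS_OP_C2C(colid))
"""
```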
> > We'd prefer to know why this is happening, and maybe fix a
> > SQLAlchemy issue in the process.
> >
> > So, my questions are:
> >
> > * What is SQLAlchemy doing differently from straight cx_Oracle
> >   that triggers this behaviour? How can I debug this further to
> >   find out?
> > * Is there a way, on SQLAlchemy 0.6.x, to set up a hook which
> >   will use convert_unicode='force' only on an Oracle database,
> >   and only for String columns, automatically across an
> >   application? There's no create_engine argument for
> >   convert_unicode='force', which is a pity.
> > * And is there a hook for SQLAlchemy 0.7.x to achieve the same
> >   result?
> >
> > Thanks in advance for any help,
> > regards,
> > --
> > David Moore
> > Senior Software Engineer
> > St. James Software
> > Email: dav...@sjsoft.com
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "sqlalchemy" group.
> > To post to this group, send email to sqlalchemy@googlegroups.com.
> > To unsubscribe from this group, send email to
> > sqlalchemy+unsubscr...@googlegroups.com.
> > For more options, visit this group at
> > http://groups.google.com/group/sqlalchemy?hl=en.
> >
> > <oracle_sqlalchemy_encoding_error.py>
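On the hook question above, one minimal shape - not tied to the 0.6/0.7 convert_unicode machinery; the class name and method bodies here are my sketch, not an established recipe - is a TypeDecorator used in place of String, which keys its bind processing on the dialect name so the other databases are untouched:

```python
from sqlalchemy.types import String, TypeDecorator


class OracleForceString(TypeDecorator):
    """A String that encodes text bind values on Oracle only.

    Substituted for sqlalchemy.String in the table definitions, it
    leaves PostgreSQL, SQL Server and MySQL behaviour unchanged.
    """

    impl = String
    cache_ok = True  # harmless on versions that predate the flag

    def process_bind_param(self, value, dialect):
        if dialect.name == "oracle" and isinstance(value, str):
            # Send a byte string so cx_Oracle binds VARCHAR2 rather
            # than NVARCHAR2, keeping the plain index usable.
            return value.encode("utf-8")
        return value
```

On 0.6/0.7 themselves, convert_unicode='force' on the type is the documented route; this sketch performs the equivalent encoding directly, which also survives on later SQLAlchemy versions where that flag was eventually deprecated and removed.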