[sqlalchemy] Performance Mystery

Warwick Prince Tue, 24 Jul 2012 20:51:10 -0700

Hi All

This is addressed to anyone who may be able to shed some light on this strange 
behaviour;


I'm running MySQL on Windows 2003 Server (Sorry) and have a table that has ~2M 
rows of sales data in it.   Knowing the type of BI queries I would want to be 
doing, I have added indexes where I feel they would help.  I tested the queries 
I wanted to perform manually by directly entering the SQL into the MySQL 
console.  Results were fine. (Tested by putting raw SQL into MySQL Workbench on 
the same server as the database)

I then simply moved the logic into my code, and created very simple queries on 
tables using the table.select() syntax, and adding a few basic group_by and 
'where' additions.   All works perfectly, and I get exactly the same results as 
my manual query BUT...  the manual query returns the results in 3 seconds, and 
the programatic query returns the results in 200+ seconds!!  This is the round 
trip time from .execute() 'til the next line of code being hit. i.e. I have 
fetched nothing yet.

This is what I have done;

I added an audit line in my code to show the final SQL that was being emitted, 
then cut and paste that into MySQL Workbench and executed it.  < 3 seconds was 
the result, with exactly the same actual rows being returned.
The Python code is running on the same server as the MySQL Console.
I have created the same query manually on the Python console, using the 
sqlalchemy constructs like query=table.select() etc and run it there - 180+ 
seconds again - it's I've basically eliminated by code.
I'm using 'mysqlconnector' which I have had no problems in the past, and it's 
not like I'm hammering it with 1000's of queries - just 1 query that will 
return around 12 rows (having processed several 1000 and then GROUPed them).   
I can see the query sitting there in Workbench, taking minutes to complete.
I've tried adding index hints just in case..  Made no difference, as it should 
have been using that index anyway..
When creating the SQLAlchemy test query, (and indeed in my real code) I used 
all defaults for engine, metadata etc with no additional settings at all.
Finally, to add insult to injury, I can execute the identical manual query 
DURING the slow queries execution, and still get it back < 3 seconds.. whilst 
the slow one clunks along and spits out the same result minutes later.
In the SQL below, invoice and customer are VARCHAR columns, whilst date is DATE 
and TotalEx is FLOAT
I have restarted the MySQL Service (several times)
It's *like* the sqlalchemy query is not allowed to use any of the indexes, and 
is trawling through all the data..  Are there permissions or something on using 
an index that I don't know about?  (Both queries are being executed by the same 
user - root)
It is EXACTLY the same query in all ways, and in fact is an extremely simple 
query - as below;

SELECT sum(`sales_data`.`TotalEx`) AS totalSales, date, invoice, customer 
FROM sales_data USE INDEX(idx_date) 
WHERE date >="2011-12-01" AND date <="2012-07-31"
GROUP BY customer, invoice

Any clues?  My project just fell in a hole!

Thanks an advance for any pointers.
Warwick

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com.
To unsubscribe from this group, send email to 
sqlalchemy+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en.

[sqlalchemy] Performance Mystery

Reply via email to