[sqlalchemy] Re: ORM mapping with Elixir compared to raw cursor query

Sina Samangooei Fri, 21 Nov 2008 06:07:35 -0800


On 21 Nov 2008, at 03:32, Michael Bayer wrote:

>
>
> On Nov 20, 2008, at 8:47 PM, Michael Bayer wrote:
>
>>> The first scenario is a single table with 24,000 rows. The problem  
>>> is
>>> that using SQLAlchemy through Elixir to map this table to an object,
>>> and performing a fairly naive MappedThing.query().all() the process
>>> takes roughly 4.8 seconds to return every item!
>
> you know, I read this as 4.8 seconds for the full result set.  4.8
> seconds per item means something is extremely wrong.   SQLA can fetch
> 20,000 rows generally in under two seconds, so please post some test
> code so we can see where you're going wrong.

I had a feeling the bulk nature of the query would confuse the matter  
so i have performed some more tests with a slightly more restricted  
query with similarly poor results.

I have not posted much on mail lists much so if it is inappropriate to  
post code i apologise.
Here is my code:

----------------------------------------------------------------------------------------
import time

from elixir import *
from sqlalchemy import *

metadata.bind = 'mysql://some:[EMAIL PROTECTED]/SomeTable'
metadata.echo = True
users_groups = Table('usergroup', metadata, autoload=True)
class Group(Entity):
        using_options(tablename='groups', autoload=True)
        users = ManyToMany('User',tablename="usergroup")

class User(Entity):
        using_options(tablename='users', autoload=True)
        using_mapper_options(column_prefix="py_")
        groups = ManyToMany('Group',tablename="usergroup",lazy=False)

class Video(Entity):
        using_options(tablename='videos', autoload=True)
        #subject = ManyToMany('Subject',tablename="subjectvideo",lazy=False)

class Subject(Entity):
        using_options(tablename='subjects', autoload=True)
        #videos = ManyToMany('Video',tablename="subjectvideo",lazy=False)

class Field(Entity):
        using_options(tablename='fields', autoload=True)

class Value(Entity):
        using_options(tablename='values', autoload=True)

class Annotation(Entity):
        using_options(tablename='annotations', autoload=True)
        user = ManyToOne('User',lazy=False)
        subject = ManyToOne('Subject',lazy=False)
        field = ManyToOne('Field',lazy=False)
        value = ManyToOne('Value',lazy=False)   

setup_all()
allAnn = Annotation.query().filter(User.py_user ==  
"msn").filter(Subject.id == 2 )
print allAnn
import MySQLdb
import MySQLdb.cursors

db = MySQLdb.connect(
        passwd="login",
        user="some",
        db="someTable",
        host="someServer",
        cursorclass=MySQLdb.cursors.DictCursor
)

c = db.cursor()
t1 = time.time()
sql = """ SELECT fields.name,values.name,users.user,subjects.leg_id
FROM annotations
        LEFT JOIN `fields` ON annotations.field = fields.id
        LEFT JOIN `values` ON annotations.value = `values`.id
        LEFT JOIN `users` ON annotations.user = `users`.id
        LEFT JOIN `subjects` ON annotations.subject = `subjects`.id
WHERE users.user = "msn"
AND subjects.id = 2
"""
c.execute(sql)
results = c.fetchall()
holder = []

for res in results:
        holder.append(res)
t2 = time.time()
print (t2-t1)

sql = """
SELECT annotations.id AS annotations_id, annotations.user AS  
annotations_user, annotations.subject AS annotations_subject,  
annotations.field AS annotations_field, annotations.value AS  
annotations_value, users_1.id AS users_1_id, users_1.user AS  
users_1_user, users_1.pass AS users_1_pass, users_1.lab_id AS  
users_1_lab_id, users_1.email AS users_1_email, groups_1.id AS  
groups_1_id, groups_1.name AS groups_1_name, subjects_1.id AS  
subjects_1_id, subjects_1.leg_id AS subjects_1_leg_id,  
subjects_1.`displayOrder` AS `subjects_1_displayOrder`, fields_1.id AS  
fields_1_id, fields_1.name AS fields_1_name, fields_1.help AS  
fields_1_help, values_1.id AS values_1_id, values_1.`parentId` AS  
`values_1_parentId`, values_1.field AS values_1_field, values_1.name  
AS values_1_name, values_1.image AS values_1_image, values_1.`order`  
AS values_1_order
FROM users, subjects, annotations
        LEFT OUTER JOIN users AS users_1 ON users_1.id = annotations.user
        LEFT OUTER JOIN usergroup AS usergroup_1 ON users_1.id =  
usergroup_1.`userId`
        LEFT OUTER JOIN groups AS groups_1 ON groups_1.id =  
usergroup_1.`groupId`
        LEFT OUTER JOIN subjects AS subjects_1 ON subjects_1.id =  
annotations.subject
        LEFT OUTER JOIN `fields` AS fields_1 ON fields_1.id = annotations.field
        LEFT OUTER JOIN `values` AS values_1 ON values_1.id = annotations.value
WHERE users.user = '%s' AND subjects.id = %s
ORDER BY annotations.id, users_1.id, usergroup_1.`userId`,  
subjects_1.id, fields_1.id, values_1.id
"""%("msn",2)
t1 = time.time()
c.execute(sql)
results = c.fetchall()
holder = []

for res in results:
        holder.append(res)
t2 = time.time()
print (t2-t1)
t2 = time.time()
print allAnn
# This prints:
"""

"""
t1 = time.time()
print len(allAnn.all())
t2 = time.time()
print (t2-t1)
----------------------------------------------------------------------------------------

Right! So what this code does is map a few classes to existing tables  
in a database. The content of the tables should be made obvious by the  
huge query SQLAlchemy generates.

The result of this query is 23 rows from a 24,000 row table.

I show a query i constructed by hand (Get every annotation of user  
"msn" on subject "2") as i would run using MysqlDb and cursor. I time  
this by measuring the time taken between executing the query and  
putting each result in a tuple. The running time for this query is 0.05s

This is compared to running  
Annotation.query().filter(User.user=="msn").filter(Subject.id=2). The  
query generated by SQL alchemy can be seen in the second declaration  
of the sql string. For purposes of interest i run this generated query  
(with fields filled in manually) through the same MysqlDb cursor. This  
takes 16 seconds! A query which returns 23 rows should NOT take 16  
seconds to return.

Finally, this is compared to running  
Annotation.query().filter(User.user=="msn").filter(Subject.id=2).all()  
which returns a bunch of lovely SQLAlchemy mapped objects. However  
this process takes approximately 26 seconds!

It should be noted that the process outlined in the code above runs on  
the same 24,000 row dataset as the previous test i mentioned (Code for  
which can be found here: http://www.sinjax.net/wordpress/?p=1652 ...  
it was down over night my apologies). But the difference is literally  
the difference between commenting out the ManyToOne definitions in the  
Annotation class and removing the filter commands.

What i test here is one of the final queries im currently running in  
my system quite regularly so i need it to be relatively fast.  And now  
that i put it in such a bare bones version as this, i see I'm CLEARLY  
doing something wrong that results in SQLAlchemy generating this query  
which takes far too long to run, but i'm not sure what i have to do to  
make SQLAlchemy generate something that looks more like the first  
query i run which i wrote by hand, and yet gets all the information i  
need.

FYI i have also tested the query i've written by hand returning ALL  
the fields the SQLAlchemy query requests and also running the ORDER BY  
at the end....this takes 0.3seconds. I have no idea why, but the slow  
down seems to come from the join section of the generated query?

But i can't quite see why, im investigating that... just thought i'd  
post more information while i did

Cheers
----------
Sina Samangooei
PhD Student
University  of Southampton
United Kingdom
[EMAIL PROTECTED]

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To post to this group, send email to sqlalchemy@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/sqlalchemy?hl=en
-~----------~----~----~----~------~----~------~--~---

[sqlalchemy] Re: ORM mapping with Elixir compared to raw cursor query

Reply via email to