On 21 Nov 2008, at 03:32, Michael Bayer wrote:
> > > On Nov 20, 2008, at 8:47 PM, Michael Bayer wrote: > >>> The first scenario is a single table with 24,000 rows. The problem >>> is >>> that using SQLAlchemy through Elixir to map this table to an object, >>> and performing a fairly naive MappedThing.query().all() the process >>> takes roughly 4.8 seconds to return every item! > > you know, I read this as 4.8 seconds for the full result set. 4.8 > seconds per item means something is extremely wrong. SQLA can fetch > 20,000 rows generally in under two seconds, so please post some test > code so we can see where you're going wrong. I had a feeling the bulk nature of the query would confuse the matter so i have performed some more tests with a slightly more restricted query with similarly poor results. I have not posted much on mail lists much so if it is inappropriate to post code i apologise. Here is my code: ---------------------------------------------------------------------------------------- import time from elixir import * from sqlalchemy import * metadata.bind = 'mysql://some:[EMAIL PROTECTED]/SomeTable' metadata.echo = True users_groups = Table('usergroup', metadata, autoload=True) class Group(Entity): using_options(tablename='groups', autoload=True) users = ManyToMany('User',tablename="usergroup") class User(Entity): using_options(tablename='users', autoload=True) using_mapper_options(column_prefix="py_") groups = ManyToMany('Group',tablename="usergroup",lazy=False) class Video(Entity): using_options(tablename='videos', autoload=True) #subject = ManyToMany('Subject',tablename="subjectvideo",lazy=False) class Subject(Entity): using_options(tablename='subjects', autoload=True) #videos = ManyToMany('Video',tablename="subjectvideo",lazy=False) class Field(Entity): using_options(tablename='fields', autoload=True) class Value(Entity): using_options(tablename='values', autoload=True) class Annotation(Entity): using_options(tablename='annotations', autoload=True) user = ManyToOne('User',lazy=False) subject = ManyToOne('Subject',lazy=False) field = ManyToOne('Field',lazy=False) value = ManyToOne('Value',lazy=False) setup_all() allAnn = Annotation.query().filter(User.py_user == "msn").filter(Subject.id == 2 ) print allAnn import MySQLdb import MySQLdb.cursors db = MySQLdb.connect( passwd="login", user="some", db="someTable", host="someServer", cursorclass=MySQLdb.cursors.DictCursor ) c = db.cursor() t1 = time.time() sql = """ SELECT fields.name,values.name,users.user,subjects.leg_id FROM annotations LEFT JOIN `fields` ON annotations.field = fields.id LEFT JOIN `values` ON annotations.value = `values`.id LEFT JOIN `users` ON annotations.user = `users`.id LEFT JOIN `subjects` ON annotations.subject = `subjects`.id WHERE users.user = "msn" AND subjects.id = 2 """ c.execute(sql) results = c.fetchall() holder = [] for res in results: holder.append(res) t2 = time.time() print (t2-t1) sql = """ SELECT annotations.id AS annotations_id, annotations.user AS annotations_user, annotations.subject AS annotations_subject, annotations.field AS annotations_field, annotations.value AS annotations_value, users_1.id AS users_1_id, users_1.user AS users_1_user, users_1.pass AS users_1_pass, users_1.lab_id AS users_1_lab_id, users_1.email AS users_1_email, groups_1.id AS groups_1_id, groups_1.name AS groups_1_name, subjects_1.id AS subjects_1_id, subjects_1.leg_id AS subjects_1_leg_id, subjects_1.`displayOrder` AS `subjects_1_displayOrder`, fields_1.id AS fields_1_id, fields_1.name AS fields_1_name, fields_1.help AS fields_1_help, values_1.id AS values_1_id, values_1.`parentId` AS `values_1_parentId`, values_1.field AS values_1_field, values_1.name AS values_1_name, values_1.image AS values_1_image, values_1.`order` AS values_1_order FROM users, subjects, annotations LEFT OUTER JOIN users AS users_1 ON users_1.id = annotations.user LEFT OUTER JOIN usergroup AS usergroup_1 ON users_1.id = usergroup_1.`userId` LEFT OUTER JOIN groups AS groups_1 ON groups_1.id = usergroup_1.`groupId` LEFT OUTER JOIN subjects AS subjects_1 ON subjects_1.id = annotations.subject LEFT OUTER JOIN `fields` AS fields_1 ON fields_1.id = annotations.field LEFT OUTER JOIN `values` AS values_1 ON values_1.id = annotations.value WHERE users.user = '%s' AND subjects.id = %s ORDER BY annotations.id, users_1.id, usergroup_1.`userId`, subjects_1.id, fields_1.id, values_1.id """%("msn",2) t1 = time.time() c.execute(sql) results = c.fetchall() holder = [] for res in results: holder.append(res) t2 = time.time() print (t2-t1) t2 = time.time() print allAnn # This prints: """ """ t1 = time.time() print len(allAnn.all()) t2 = time.time() print (t2-t1) ---------------------------------------------------------------------------------------- Right! So what this code does is map a few classes to existing tables in a database. The content of the tables should be made obvious by the huge query SQLAlchemy generates. The result of this query is 23 rows from a 24,000 row table. I show a query i constructed by hand (Get every annotation of user "msn" on subject "2") as i would run using MysqlDb and cursor. I time this by measuring the time taken between executing the query and putting each result in a tuple. The running time for this query is 0.05s This is compared to running Annotation.query().filter(User.user=="msn").filter(Subject.id=2). The query generated by SQL alchemy can be seen in the second declaration of the sql string. For purposes of interest i run this generated query (with fields filled in manually) through the same MysqlDb cursor. This takes 16 seconds! A query which returns 23 rows should NOT take 16 seconds to return. Finally, this is compared to running Annotation.query().filter(User.user=="msn").filter(Subject.id=2).all() which returns a bunch of lovely SQLAlchemy mapped objects. However this process takes approximately 26 seconds! It should be noted that the process outlined in the code above runs on the same 24,000 row dataset as the previous test i mentioned (Code for which can be found here: http://www.sinjax.net/wordpress/?p=1652 ... it was down over night my apologies). But the difference is literally the difference between commenting out the ManyToOne definitions in the Annotation class and removing the filter commands. What i test here is one of the final queries im currently running in my system quite regularly so i need it to be relatively fast. And now that i put it in such a bare bones version as this, i see I'm CLEARLY doing something wrong that results in SQLAlchemy generating this query which takes far too long to run, but i'm not sure what i have to do to make SQLAlchemy generate something that looks more like the first query i run which i wrote by hand, and yet gets all the information i need. FYI i have also tested the query i've written by hand returning ALL the fields the SQLAlchemy query requests and also running the ORDER BY at the end....this takes 0.3seconds. I have no idea why, but the slow down seems to come from the join section of the generated query? But i can't quite see why, im investigating that... just thought i'd post more information while i did Cheers ---------- Sina Samangooei PhD Student University of Southampton United Kingdom [EMAIL PROTECTED] --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To post to this group, send email to sqlalchemy@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en -~----------~----~----~----~------~----~------~--~---