[web2py] Re: DAL speed - an idea
Before we add special features to the DAL, has anyone profiled the DAL with queries and data similar to nick name's to see if there's any low-hanging fruit we can tackle to speed up the normal select()? nick name, can you please provide your table schema so we can test performance on data sets similar to yours?

On Thursday, February 9, 2012 1:51:47 PM UTC-5, nick name wrote: One of my controllers needs to go through a lot of records to provide a meaningful answer -- as in, 60k records. Just loading them from the database takes about 100ms (db.executesql('select * from table order by id;')); doing the same through the DAL takes over 6 seconds. I realize that the DAL does a lot of additional work, which in general is helpful -- but I can do without all the parsing / Rows() generation for this. What do people here think about adding a db.rawselect(...), which is a slim wrapper for db.executesql(db._select()) that wraps everything in a named tuple? It solves most of the speed problem when it is needed, but still maintains a lot of the features of the DAL's SQL processing.
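A minimal sketch of what the proposed rawselect() could look like, using the stdlib's sqlite3 in place of the DAL (the helper name raw_select and the sample table are illustrative, not web2py API):

```python
import sqlite3
from collections import namedtuple

def raw_select(cursor, sql):
    # Hypothetical rawselect: run the raw SQL and wrap each row in a
    # namedtuple built from the cursor description, skipping the
    # heavier Rows() construction entirely.
    cursor.execute(sql)
    Rec = namedtuple("Rec", [col[0] for col in cursor.description])
    return [Rec(*row) for row in cursor.fetchall()]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, b TEXT)")
conn.executemany("INSERT INTO t (b) VALUES (?)", [("x",), ("y",)])
rows = raw_select(conn.cursor(), "SELECT * FROM t ORDER BY id")
print(rows[0].b)  # attribute access, without Rows() overhead
```

The rows keep column-name access (rows[0].b) but are plain tuples underneath, which is the trade-off the proposal is after.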
[web2py] Re: DAL speed - an idea
mcm made some initial attempts and reported about a 25% improvement, with the expectation that some further improvement could be made with more effort: https://groups.google.com/forum/#!topic/web2py-developers/Tp6HXsSf7lk/discussion. However, it sounded like we wouldn't be likely to get enough improvement to substantially help with large selects (i.e., tens of thousands of rows) -- for that case, we probably need something like what nick name is proposing (which will actually be easier to implement, anyway). Anthony

On Friday, March 9, 2012 1:48:06 PM UTC-5, spiffytech wrote: Before we add special features to the DAL, has anyone profiled the DAL with queries and data similar to nick name's to see if there's any low-hanging fruit we can tackle to speed up the normal select()? nick name, can you please provide your table schema so we can test performance on data sets similar to yours?
Re: [web2py] Re: DAL speed - an idea
Anthony, I'm not supposed to retrieve tens of thousands of records from a single query in a web app. I would address this problem in the app structure, rather than in the DAL. Anyway, we can execute raw SQL statements via the DAL, right? -- Vinicius Assef

On Fri, Mar 9, 2012 at 4:12 PM, Anthony abasta...@gmail.com wrote: mcm made some initial attempts and indicated about a 25% improvement, with the expectation that some further improvement could be made with more effort: https://groups.google.com/forum/#!topic/web2py-developers/Tp6HXsSf7lk/discussion. However, it sounded like we wouldn't be likely to get enough improvement to substantially help with large selects (i.e., tens of thousands of rows) -- for that case, we probably need something like what nick name is proposing (which will actually be easier to implement, anyway). Anthony
Re: [web2py] Re: DAL speed - an idea
Anthony, I'm not supposed to retrieve tens of thousands of records from a single query in a web app.

Mostly true, but not always: https://groups.google.com/d/msg/web2py/A11z4UdMaIc/882_shAVk4UJ. Of course, if you've got a busy site with lots of queries, it can add up even for relatively small result sets.

Anyway, we can execute raw SQL statements via the DAL, right?

Yes, it's possible to create your own manual workarounds -- this would just be an added convenience. At least a basic version of this shouldn't be hard to implement -- just return the raw db response without any further processing. Anthony
[web2py] Re: DAL speed - an idea
On Friday, March 9, 2012 1:48:06 PM UTC-5, spiffytech wrote: Before we add special features to the DAL, has anyone profiled the DAL with queries and data similar to nick name's to see if there's any low-hanging fruit we can tackle to speed up the normal select()?

The suggested change is very small (~5 lines), backward compatible, and useful for other things as well (e.g., saving memory and speeding up Rows.as_list() as long as only one table is involved, which is a very common use case).

nick name, can you please provide your table schema so we can test performance on data sets similar to yours?

No time now, but I'll try to set up a test case for future evaluations over the weekend.
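As a rough, standalone illustration of the memory point (stdlib only, not DAL code): a namedtuple row stores its fields in a plain tuple, while a dict row carries its keys on every instance:

```python
import sys
from collections import namedtuple

# Hypothetical single-table row with two columns
Rec = namedtuple("Rec", ["id", "b"])
tuple_row = Rec(1, "x")
dict_row = {"id": 1, "b": "x"}

# The namedtuple instance is smaller because field names live on the
# class, not on each row (exact sizes are CPython-specific)
print(sys.getsizeof(tuple_row), sys.getsizeof(dict_row))
```

Multiplied across tens of thousands of rows, that per-row difference is what the as_list() / memory argument is about.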
[web2py] Re: DAL speed - an idea
On Friday, February 10, 2012 12:04:59 AM UTC-5, Massimo Di Pierro wrote: open a ticket, this can be done. I like the idea of passing a processor.

Opened in http://code.google.com/p/web2py/issues/detail?id=701 with discussion and a much-improved suggestion for how to handle this.
[web2py] Re: DAL speed - an idea
Nice. I added a comment.

On Friday, March 9, 2012 12:41:26 AM UTC-5, nick name wrote: Opened in http://code.google.com/p/web2py/issues/detail?id=701 with discussion and a much-improved suggestion for how to handle this.
[web2py] Re: DAL speed - an idea
+1. Having this option would make it really simple to switch between the full-blown DAL result set and a faster, stripped-down one (which could then be adapted with the processor to keep the rest of the code working).

I've been thinking about something like this as well. Instead of a separate select_raw() method, maybe we can just add a raw=True|False argument to the existing select() method.
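One way the raw=True idea could fit together, sketched with plain Python stand-ins (select, full_parse, and the fake executor are hypothetical placeholders, not the actual DAL signatures):

```python
def full_parse(raw_rows):
    # stand-in for the DAL's full Rows() construction (parsing,
    # reference chasing, update_record/delete_record wiring, ...)
    return [{"id": r[0], "b": r[1]} for r in raw_rows]

def select(execute, sql, raw=False, processor=None):
    # raw=True: hand back the driver's rows untouched (fast path);
    # an optional processor can still post-process them cheaply
    rows = execute(sql)
    if raw:
        return processor(rows) if processor else rows
    return full_parse(rows)

fake_execute = lambda sql: [(1, "x"), (2, "y")]  # pretend DB driver
fast = select(fake_execute, "SELECT ...", raw=True)
rich = select(fake_execute, "SELECT ...")
```

The appeal of a flag over a separate method is that call sites can flip between the two result shapes without restructuring their queries.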
[web2py] Re: DAL speed - an idea
open a ticket, this can be done. I like the idea of passing a processor.

On Feb 9, 3:14 pm, Anthony abasta...@gmail.com wrote: I've been thinking about something like this as well. Instead of a separate select_raw() method, maybe we can just add a raw=True|False argument to the existing select() method. I like the namedtuple idea as well (I think some adapters already provide that as an option -- e.g., psycopg2). Anthony

On Thursday, February 9, 2012 3:04:41 PM UTC-5, nick name wrote: Yes, that is the basis of what I am suggesting. There is not currently such a thing; there is something called 'select_raw' implemented in the Google Datastore adapter, but not in anything else, and it isn't exactly what I am proposing.

To elaborate: assume the tables are defined as follows:

    reftable = db.define_table('reftable', Field('a', 'string'))
    table = db.define_table('table', Field('b', 'reference reftable'))

In my case, I need to pull all the records (60,000) from the database to compute some aggregation which I cannot compute using SQL. There are two alternatives here:

    r1 = db().select(table.ALL)               # takes 6 seconds
    r2 = db.executesql(db._select(table.ALL)) # takes ~0.1 sec

The records returned in the first instance are much richer: they have record chasing (e.g., I can do r1[0].b.a to select through the foreign key), they have methods like r1[0].update_record() and r1[0].delete_record(), and other nice stuff. However, for this use I don't need the richer records, and I do need the speed, so I would rather use r2. But r2 is not a direct replacement -- it doesn't have the column names. If I use

    r3 = db.executesql(db._select(table.ALL), as_dict=True)  # still takes ~0.1 sec

I can do r3[0]['b'] but I cannot do r3[0].b, and it takes a lot more memory than r2.

A suggestion: add another parameter, processor=..., which, if provided, will be called with the cursor's description, returning a function through which each row will be passed. For example:

    def named_tuple_process(name, description):
        from collections import namedtuple
        fields = ' '.join([x[0] for x in description])
        return namedtuple(name, fields)

    r4 = db.executesql(db._select(table.ALL),
                       processor=lambda x: named_tuple_process('tablerec', x))
    r4[0].b  # will now work

Not a full replacement, but good enough for many uses. In fact, you can do that externally:

    r4 = db.executesql(db._select(table.ALL))
    f = named_tuple_process('tablerec', db._adapter.cursor.description)
    r4 = [f(*x) for x in r4]

But this requires reaching into the internals of the db adapter. Finally, I propose to define x.raw_select(*args) to do db.executesql(x._select(*args)), which would make this a relatively clean replacement.
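The proposed processor works against any DB-API cursor, since it only reads the standard cursor.description. Here is a self-contained check using sqlite3 (the table and column names are illustrative):

```python
import sqlite3
from collections import namedtuple

def named_tuple_process(name, description):
    # build a namedtuple class from the DB-API cursor description,
    # whose first element per column is the column name
    fields = ' '.join([x[0] for x in description])
    return namedtuple(name, fields)

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE reftable (id INTEGER PRIMARY KEY, a TEXT)")
cur.execute("INSERT INTO reftable (a) VALUES ('hello')")
cur.execute("SELECT id, a FROM reftable")
f = named_tuple_process('tablerec', cur.description)
records = [f(*row) for row in cur.fetchall()]
```

Note that each raw row must be unpacked into the namedtuple constructor (f(*row) or f._make(row)); calling f(row) on the whole tuple would fail.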