[web2py] Re: DAL speed - an idea

2012-03-09 Thread spiffytech
Before we add special features to the DAL, has anyone profiled the DAL with 
queries and data similar to nick name's to see if there's any low-hanging 
fruit we can tackle to speed up the normal select()?

nick name, can you please provide your table schema so we can test 
performance on data sets similar to yours?


On Thursday, February 9, 2012 1:51:47 PM UTC-5, nick name wrote:

 One of my controllers needs to go through a lot of records to provide a 
 meaningful answer -- as in, 60k records.

 Just loading them from the database takes about 100ms 
 (db.executesql("select * from table order by id;")). Doing the same through 
 the DAL takes over 6 seconds. I realize that the DAL does a lot of 
 additional work, which in general is helpful -- but I can do without all 
 the parsing / Rows() generation for this.

 What do people here think about adding a db.rawselect(...), which is a 
 slim wrapper for db.executesql(db._select(...)) that wraps everything in 
 a named tuple? It solves most of the speed problem when it is needed, but 
 still maintains a lot of the features of the DAL's SQL processing.
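For concreteness, a standalone sketch of such a wrapper -- the `rawselect` name and the use of the DB-API `cursor.description` are my assumptions, and plain sqlite3 stands in for the DAL adapter:

```python
import sqlite3
from collections import namedtuple

def rawselect(cursor, sql):
    """Run raw SQL and wrap each row in a namedtuple built from the
    cursor's column description -- no Rows() parsing at all."""
    cursor.execute(sql)
    Row = namedtuple('Row', [col[0] for col in cursor.description])
    return [Row._make(r) for r in cursor.fetchall()]

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('create table t (id integer primary key, b text)')
cur.executemany('insert into t (b) values (?)', [('x',), ('y',)])
rows = rawselect(cur, 'select * from t order by id')
print(rows[0].b)   # attribute access, like DAL rows, but no update_record() etc.
```
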



[web2py] Re: DAL speed - an idea

2012-03-09 Thread Anthony
mcm made some initial attempts and indicated about a 25% improvement, with 
the expectation that some further improvement could be made with more 
effort: 
https://groups.google.com/forum/#!topic/web2py-developers/Tp6HXsSf7lk/discussion.
 
However, it sounded like we wouldn't be likely to get enough improvement to 
substantially help with large selects (i.e., tens of thousands of rows) -- 
for that case, we probably need something like what nick name is proposing 
(which will actually be easier to implement, anyway).

Anthony

On Friday, March 9, 2012 1:48:06 PM UTC-5, spiffytech wrote:

 Before we add special features to the DAL, has anyone profiled the DAL 
 with queries and data similar to nick name's to see if there's any 
 low-hanging fruit we can tackle to speed up the normal select()?

 nick name, can you please provide your table schema so we can test 
 performance on data sets similar to yours?




Re: [web2py] Re: DAL speed - an idea

2012-03-09 Thread Vinicius Assef
Anthony, I'm not supposed to retrieve tens of thousands of records from a
single query in a web app.

I would address this problem at the app-structure level, instead of in the DAL.
Anyway, we can execute raw SQL statements via the DAL, right?

--
Vinicius Assef



On Fri, Mar 9, 2012 at 4:12 PM, Anthony abasta...@gmail.com wrote:
 mcm made some initial attempts and indicated about a 25% improvement, with
 the expectation that some further improvement could be made with more
 effort: https://groups.google.com/forum/#!topic/web2py-developers/Tp6HXsSf7lk/discussion.
 However, it sounded like we wouldn't be likely to get enough improvement to
 substantially help with large selects (i.e., tens of thousands of rows) --
 for that case, we probably need something like what nick name is proposing
 (which will actually be easier to implement, anyway).

 Anthony



Re: [web2py] Re: DAL speed - an idea

2012-03-09 Thread Anthony


 Anthony, I'm not supposed to retrieve tens of thousands of records from a
 single query in a web app.


Mostly true, but not 
always: https://groups.google.com/d/msg/web2py/A11z4UdMaIc/882_shAVk4UJ

Of course, if you've got a busy site with lots of queries, it can add up 
even for relatively small result sets.
 

 Anyway, we can execute raw SQL statements via the DAL, right?

Yes, it's possible to create your own manual workarounds -- this would just 
be an added convenience. At least a basic version of this shouldn't be hard 
to implement -- just return the raw db response without any further 
processing.

Anthony



[web2py] Re: DAL speed - an idea

2012-03-09 Thread nick name

On Friday, March 9, 2012 1:48:06 PM UTC-5, spiffytech wrote:

 Before we add special features to the DAL, has anyone profiled the DAL 
 with queries and data similar to nick name's to see if there's any 
 low-hanging fruit we can tackle to speed up the normal select()?


The suggested change is very small (~5 lines), backward compatible, and 
useful for other things as well (e.g. saving memory and speeding up 
Rows.as_list(), as long as only one table is involved, which is a very 
common use case).

nick name, can you please provide your table schema so we can test 
 performance on data sets similar to yours?


No time now, but I'll try to set up a test case for future evaluations over 
the weekend. 


[web2py] Re: DAL speed - an idea

2012-03-08 Thread nick name


On Friday, February 10, 2012 12:04:59 AM UTC-5, Massimo Di Pierro wrote:

 open a ticket, this can be done. I like the idea of passing a 
 processor. 


Opened in http://code.google.com/p/web2py/issues/detail?id=701  with 
discussion and a much improved suggestion of how to handle this.


[web2py] Re: DAL speed - an idea

2012-03-08 Thread Anthony
Nice. I added a comment.

On Friday, March 9, 2012 12:41:26 AM UTC-5, nick name wrote:



 On Friday, February 10, 2012 12:04:59 AM UTC-5, Massimo Di Pierro wrote:

 open a ticket, this can be done. I like the idea of passing a 
 processor. 


 Opened in http://code.google.com/p/web2py/issues/detail?id=701  with 
 discussion and a much improved suggestion of how to handle this.



[web2py] Re: DAL speed - an idea

2012-02-10 Thread Simon Lukell
+1
Having this option would make it really simple to switch between the
full-blown DAL result set and a faster, stripped-down one (which could
then be adapted with the processor to keep the rest of the code
working).

 I've been thinking about something like this as well. Instead of a separate
 select_raw() method, maybe we can just add a raw=True|False argument to the
 existing select() method.
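The control flow of such a raw=True / processor= switch could look like this toy sketch. All names here are hypothetical; a real implementation would live inside the DAL's select() path:

```python
class FakeSet:
    """Toy stand-in for a DAL Set, showing only the dispatch logic of the
    proposed raw=True / processor= options (hypothetical API)."""
    def __init__(self, driver_rows):
        self._driver_rows = driver_rows

    def _executesql(self):
        # A real DAL would build SQL via _select() and hit the database here.
        return self._driver_rows

    def select(self, raw=False, processor=None):
        rows = self._executesql()
        if raw:
            return rows                         # plain driver tuples
        if processor is not None:
            return [processor(r) for r in rows] # caller-supplied wrapping
        # Stand-in for the expensive full Rows() construction.
        return [dict(zip(('id', 'b'), r)) for r in rows]

s = FakeSet([(1, 'x'), (2, 'y')])
print(s.select(raw=True))         # [(1, 'x'), (2, 'y')]
print(s.select(processor=list))   # [[1, 'x'], [2, 'y']]
print(s.select()[0]['b'])         # 'x'
```

The point of the sketch is that raw=True and processor= short-circuit before any per-row object construction, which is where the bulk of the time goes on large selects.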


[web2py] Re: DAL speed - an idea

2012-02-09 Thread Massimo Di Pierro
open a ticket, this can be done. I like the idea of passing a
processor.

On Feb 9, 3:14 pm, Anthony abasta...@gmail.com wrote:
 I've been thinking about something like this as well. Instead of a separate
 select_raw() method, maybe we can just add a raw=True|False argument to the
 existing select() method. I like the namedtuple idea as well (I think some
 adapters already provide that as an option -- e.g., psycopg2).

 Anthony

 On Thursday, February 9, 2012 3:04:41 PM UTC-5, nick name wrote:

  Yes, that is the basis of what I am suggesting.

  There is not currently such a thing; there is something called
  'select_raw' implemented in the GoogleDataStore adapter, but not in
  anything else, and it isn't exactly what I am proposing.

  To elaborate:

  Assume the table is defined as follows:

      reftable = db.define_table('reftable', Field('a', 'string'))
      table = db.define_table('table', Field('b', 'reference reftable'))

  In my case, I need to pull all the records (60,000) from the database to
  compute some aggregation which I cannot compute using sql. There are two
  alternatives here:

      r1 = db().select(table.ALL) # takes over 6 seconds

      r2 = db.executesql(db._select(table.ALL)) # takes ~0.1sec

  The records returned in the first instance are much richer; they have
  record chasing (e.g. I can do r1[0].b.a to select through the foreign key),
  they have methods like r1[0].update_record() and r1[0].delete_record(), and
  other nice stuff.

   However, for this use, I don't need the additional features, and I do need
   the speed, so I would rather use r2. However, r2 is not a direct
   replacement -- it doesn't have the column names. If I use

      r3 = db.executesql(db._select(table.ALL), as_dict=True) # still takes
  ~0.1sec

  I can do r3[0]['b'] but I cannot do r3[0].b; and it takes a lot more
  memory than r2.

   A suggestion: add another parameter, processor=..., which, if provided,
   will be called with the db.connection.cursor, returning a function through
   which each row will be passed; example:

   def named_tuple_process(name, description):
       from collections import namedtuple
       fields = [x[0] for x in description]
       return namedtuple(name, fields)._make

   r4 = db.executesql(db._select(table.ALL),
       processor=lambda cursor: named_tuple_process('tablerec',
                                                    cursor.description))

   r4[0].b # will now work; not a full replacement, but good enough for many
   uses.

  In fact, you can do that externally -

  r4 = db.executesql(db._select(table.ALL))
  f = named_tuple_process('tablerec', db._adapter.cursor.description)
  r4 = [f(x) for x in r4]

  But this requires reaching into the internals of the db adapter.

  Finally, I propose to define x.raw_select(*args) to do:
  db.executesql(x._select(*args))

  which would make this a relatively clean replacement.
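The processor idea can be exercised end to end with plain sqlite3, whose DB-API cursor exposes the same `description` attribute the sketch relies on (the table here is a made-up stand-in, not the poster's actual schema):

```python
import sqlite3
from collections import namedtuple

def named_tuple_process(name, description):
    # description is the DB-API cursor.description: a sequence of
    # 7-item tuples whose first element is the column name.
    fields = [col[0] for col in description]
    return namedtuple(name, fields)._make

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('create table reftable (id integer primary key, a text)')
cur.execute("insert into reftable (a) values ('hello')")
cur.execute('select * from reftable')

# Build the row factory from the cursor, then wrap each raw tuple.
make_row = named_tuple_process('tablerec', cur.description)
rows = [make_row(r) for r in cur.fetchall()]
print(rows[0].a)    # hello
print(rows[0].id)   # 1
```

Using `namedtuple(...)._make` is what lets each driver tuple be wrapped directly without unpacking it argument by argument.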