[web2py] Re: DAL speed - an idea

Massimo Di Pierro Thu, 09 Feb 2012 21:05:12 -0800

Please open a ticket; this can be done. I like the idea of passing a
processor.


On Feb 9, 3:14 pm, Anthony <abasta...@gmail.com> wrote:
> I've been thinking about something like this as well. Instead of a separate
> select_raw() method, maybe we can just add a raw=True|False argument to the
> existing select() method. I like the namedtuple idea as well (I think some
> adapters already provide that as an option -- e.g., psycopg2).
>
> Anthony
>
> On Thursday, February 9, 2012 3:04:41 PM UTC-5, nick name wrote:
>
> > Yes, that is the basis of what I am suggesting.
>
> > There is not currently such a thing; there is something called
> > 'select_raw' implemented in the GoogleDataStore adapter, but not in
> > anything else, and it isn't exactly what I am proposing.
>
> > To elaborate:
>
> > Assume the table is defined as follows:
>
> >     reftable = db.define_table('reftable', Field('a', 'string'))
> >     table = db.define_table('table', Field('b', reftable))
>
> > In my case, I need to pull all the records (60,000) from the database to
> > compute some aggregation which I cannot compute using sql. There are two
> > alternatives here:
>
> >     r1 = db().select(table.ALL) # takes > 6 seconds
>
> >     r2 = db.executesql(db._select(table.ALL)) # takes ~0.1sec
>
> > The records returned in the first instance are much richer; they have
> > record chasing (e.g. I can do r1[0].b.a to select through the foreign key),
> > they have methods like r1[0].update_record() and r1[0].delete_record(), and
> > other nice stuff.
>
> > However, for this use, I don't need the additional features, and I do need
> > the speed, so I would rather use r2. However, r2 is not a direct
> > replacement -- it doesn't have the column names. If I use
>
> >     r3 = db.executesql(db._select(table.ALL), as_dict=True) # still takes ~0.1sec
>
> > I can do r3[0]['b'] but I cannot do r3[0].b; and it takes a lot more
> > memory than r2.
>
> > A suggestion: add another parameter, processor=..., which, if supplied,
> > will be called with the cursor description (db._adapter.cursor.description)
> > and return a function through which each row will be passed. Example:
>
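> > def named_tuple_process(name, description):
> >     from collections import namedtuple
> >     # description is the DB-API cursor.description; x[0] is the column name
> >     fields = ' '.join([x[0] for x in description])
> >     # _make builds one record from a single row tuple, so f(row) works
> >     return namedtuple(name, fields)._make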
>
> >     r4 = db.executesql(db._select(table.ALL), processor=lambda x: named_tuple_process('tablerec', x))
>
> > r4[0].b # will now work; not a full replacement, but good enough for many uses.
>
> > In fact, you can do that externally -
>
> > r4 = db.executesql(db._select(table.ALL))
> > f = named_tuple_process('tablerec', db._adapter.cursor.description)
> > r4 = [f(x) for x in r4]
>
> > But this requires reaching into the internals of the db adapter.
>
> > Finally, I propose to define x.raw_select(*args) to do:
> > db.executesql(x._select(*args))
>
> > which would make this a relatively clean replacement.
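
For reference, the processor idea quoted above can be tried outside web2py. What follows is only a minimal sketch against the standard-library sqlite3 module, not the DAL implementation: execute_processed() is a hypothetical stand-in for an executesql() that accepts processor=..., and the table t with columns id and b is made up for the example. It assumes the processor receives the DB-API cursor.description and returns a per-row converter.

```python
import sqlite3
from collections import namedtuple


def namedtuple_processor(name, description):
    """Build a row converter from a DB-API cursor description."""
    # Each description entry is a sequence whose first item is the column name.
    fields = [col[0] for col in description]
    # _make creates one record from a single iterable, i.e. one raw row tuple.
    return namedtuple(name, fields)._make


def execute_processed(conn, sql, processor=None):
    """Hypothetical stand-in for executesql(..., processor=...)."""
    cursor = conn.cursor()
    cursor.execute(sql)
    rows = cursor.fetchall()
    if processor is None:
        return rows  # plain tuples, as executesql returns today
    convert = processor(cursor.description)
    return [convert(row) for row in rows]


# Made-up table, only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, b INTEGER)")
conn.executemany("INSERT INTO t (b) VALUES (?)", [(10,), (20,)])

rows = execute_processed(
    conn,
    "SELECT id, b FROM t ORDER BY id",
    processor=lambda desc: namedtuple_processor("tablerec", desc),
)
```

With this, rows[0].b gives attribute access while each row remains a tuple underneath. One caveat: namedtuple rejects column names that are not valid Python identifiers (an unaliased aggregate expression, for example), so a real implementation would probably need rename=True or explicit column aliases.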
