On Feb 6, 2014, at 2:58 PM, Claudio Freire <klaussfre...@gmail.com> wrote:

> On Thu, Feb 6, 2014 at 4:47 PM, Michael Bayer <mike...@zzzcomputing.com> 
> wrote:
>> On Feb 6, 2014, at 2:43 PM, Claudio Freire <klaussfre...@gmail.com> wrote:
>> 
>>> On Thu, Feb 6, 2014 at 4:38 PM, Michael Bayer <mike...@zzzcomputing.com> 
>>> wrote:
>>>> On Feb 6, 2014, at 1:31 PM, Claudio Freire <klaussfre...@gmail.com> wrote:
>>>> 
>>>>> 
>>>>>> There are various performance concerns related to SELECT queries and 
>>>>>> solving them depends much on identifying where a particular performance 
>>>>>> issue resides.   There's overhead on the side of Python/SQLAlchemy 
>>>>>> formulating the string statement, theres overhead on the side of passing 
>>>>>> it to the database to be parsed (which is by far the most infinitesimal 
>>>>>> part)
>>>>> 
>>>>> That part is most definitely not always so infinitesimally small. I
>>>>> recall one case (granted, one among so many) of a complex query that
>>>>> was very fast to execute. The query had dozens of joins, so it took
>>>>> considerable planning time on the database side. Planning took around
>>>>> 300ms, where execution took only 25ms. So, having spotted the
>>>>> bottleneck, I switched to using prepared statements (it was a query
>>>>> that was run several times in the session, not millions of times, but
>>>>> several) with different arguments. The benefit was considerable.
>>>> 
>>>> ideally the database would know how to cache plans based on the string 
>>>> form of the statement.  Oracle does this.  It appears that PostgreSQL does 
>>>> not unless you specifically use a prepared statement.
>>> 
>>> Well, it cannot. At least not simplistically, because Postgres uses
>>> the literals in the statement (the ones likely to change) to
>>> specialize plan cost, and different values can produce different plans
>>> (depending on stats). For instance, a query with a where clause
>>> "deleted=false" that selects 99% of a table would do a sequential
>>> scan, whereas "deleted=true" might use an index (if there was such an
>>> index).
>>> 
>>> They thus don't cache plans unless specifically asked for (with
>>> prepare), because those plans may very well be inferior (they don't
>>> use value-specific knowledge).
>> 
>> right, but from an API POV, its more tedious since we have to track the 
>> handle on the outside.
>> 
>> This could be implemented as a cursor event even in conjunction with a 
>> custom execution option and possibly using compiled_cache as well.
> 
> That would be tricky. You'd have to maintain a tally of which
> connection prepared which query (think the complications of
> multi-engine setups), and make sure you deallocate the handle before
> returning the connection to the pool. Not to mention coming up with
> unique handles.

you can use a per-connection cache via connection.info, and you can manage the 
lifecycle of the connection and its cached items with connection and/or pool 
events (e.g. connection.rollback() clears out prepared statements).   lots of 
features already work this way, such as keeping track of XA handles.  it 
wouldn’t really be hard at all, as long as the DBAPI itself behaves nicely when 
lots of handles are allocated.
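to sketch the per-connection cache idea in plain Python (this is illustrative 
only — PreparedStatementCache, execute_raw, and the "pg_prepared_cache" key are 
hypothetical names, and FakeConnection just stands in for a DBAPI connection; 
only the .info dict mirrors the real per-connection dictionary SQLAlchemy 
provides):

```python
import itertools


class PreparedStatementCache:
    """Maps a statement's string form to a unique server-side handle name."""

    _counter = itertools.count()  # process-wide source of unique handle names

    def __init__(self):
        self._handles = {}  # sql string -> handle name

    def get_or_prepare(self, conn, sql):
        # reuse an existing handle for this string form, else PREPARE a new one
        handle = self._handles.get(sql)
        if handle is None:
            handle = "sa_prep_%d" % next(self._counter)
            conn.execute_raw("PREPARE %s AS %s" % (handle, sql))
            self._handles[sql] = handle
        return handle

    def clear(self, conn):
        # what a rollback / pool-checkin event handler would do:
        # DEALLOCATE every handle before the connection is reused
        for handle in self._handles.values():
            conn.execute_raw("DEALLOCATE %s" % handle)
        self._handles.clear()


class FakeConnection:
    """Stand-in for a DBAPI connection; just records statements sent."""

    def __init__(self):
        self.info = {}       # mirrors the per-connection .info dictionary
        self.executed = []

    def execute_raw(self, sql):
        self.executed.append(sql)


def get_cache(conn):
    # lazily attach one cache per connection via the .info dict
    return conn.info.setdefault("pg_prepared_cache", PreparedStatementCache())
```

usage would look like: two calls with the same statement string return the 
same handle and only issue PREPARE once, while the clear() call (hooked to a 
rollback or checkin event) issues the DEALLOCATEs.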

the execution_options functionality would then be usable as a way to enable the 
feature on a per-connection or per-statement basis, like:

        stmt = select([some_table]).execution_options(pg_prepared_statement=True)
        conn.execute(stmt)

the string form of stmt could be linked to the generated handle so that any 
number of stmt objects whose string form matches the target would work.
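the string-form matching could look something like the following sketch — the 
pg_prepared_statement option, the "pg_prepared_cache" key, and the sa_prep_N 
handle naming are all made up for illustration; a real implementation would 
issue the PREPARE against the database where the comment indicates:

```python
def rewrite_for_prepared(conn_info, statement, option_enabled):
    """Return the SQL to actually send: EXECUTE <handle> when a prepared
    handle exists (or is created) for this string form, else the original
    statement unchanged."""
    if not option_enabled:
        return statement
    # one cache dict per connection, keyed by the statement's string form
    cache = conn_info.setdefault("pg_prepared_cache", {})
    if statement not in cache:
        cache[statement] = "sa_prep_%d" % len(cache)
        # a real implementation would issue: PREPARE sa_prep_N AS <statement>
    return "EXECUTE %s" % cache[statement]
```

so any two statement objects that render to the same string hit the same 
cached handle, which is what makes the compiled_cache angle attractive.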




