Thanks for the feedback.  This is a very interesting and helpful set of
answers!

I'm merging and deduping several large data sets.  I have a set of business
rules to identify whether a new incoming row matches an existing row before
we decide if it really is a new row or not.  Each business rule is another
check against the existing data set, which is too large to just pull back
and examine in memory.  Whenever we update our business rules, or discover
new data to merge, I need to go back and run another de-duping pass, so
this is a regularly occurring task.

I actually do run thousands of queries and updates (with flush()) in a
single transaction at a time to minimize commit overhead.  executemany
doesn't really work as a model for me because I need to check each row
against the pending data as well as the already-committed updates and
inserts, and I need to branch my actions based on whether, and which,
business rule matched.
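
The branching looks roughly like this - a minimal in-memory sketch with
hypothetical rule and field names; in the real job each check is a query
against the database, and the flush() call is what lets later rows see the
pending inserts and updates:

```python
# Minimal sketch of the dedup loop. Each business rule either finds a
# matching existing row or returns None, and the action branches on
# which rule (if any) matched. Field and rule names are hypothetical;
# in the real job each check is a SELECT and "existing" is a table.

def first_match(row, rules):
    """Return (rule_name, existing_row) for the first rule that matches."""
    for name, check in rules:
        found = check(row)
        if found is not None:
            return name, found
    return None, None

def dedupe(incoming, existing, rules):
    merged = inserted = 0
    for row in incoming:
        rule, match = first_match(row, rules)
        if rule is not None:
            match.update(row)           # merge into the matched row
            merged += 1
        else:
            existing.append(dict(row))  # genuinely new row
            inserted += 1
        # real job: session.flush() here, so subsequent checks also see
        # the pending (not yet committed) inserts and updates
    return merged, inserted
```

Because the checks run against the existing set as it grows, a duplicate
arriving later in the same batch is merged rather than inserted twice -
the same effect flush() gives me against the pending data.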

So every few milliseconds I can shave off the read queries (most of the
action is at the database), the better.  It is possible that the overhead
of managing prepared-statement parameters exceeds the time saved by not
re-planning the queries on every run.  I was hoping it would be easy to
try it and see.  I'm now very interested in some of the discussions on
profiling the code and may pursue some of that.
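
Since the profiling itself needs nothing beyond the standard library, here's
the kind of thing I have in mind - a small helper (the names are mine) that
wraps a call in cProfile and returns the result along with a text report:

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, **kwargs):
    """Run fn under cProfile; return (result, text report of hot spots)."""
    pr = cProfile.Profile()
    result = pr.runcall(fn, *args, **kwargs)
    buf = io.StringIO()
    # Sort by cumulative time and show the ten most expensive entries.
    pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(10)
    return result, buf.getvalue()
```

Comparing the cumulative time spent inside the DBAPI's execute call against
the time spent building statements should tell me whether the Python side
or the database side dominates.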

I really like implementing my business rules in SQLAlchemy instead of
psycopg.  It makes the code much cleaner and more readable.

I'm already also tuning the database, adding functional indexes, and that
sort of thing.

Thanks Claudio - I'll mull over the pros and cons of explicitly managing
the connections to prepare the statements, vs. just being patient while
the job runs, vs. gains I might achieve elsewhere.
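
For reference, the PREPARE/EXECUTE/DEALLOCATE trick Claudio describes below
boils down to building three statements and sending them over one
connection.  A rough sketch - the statement name, parameter types, and
SELECT body here are hypothetical, and in practice the SELECT text would
come from compiling the SQLAlchemy select object:

```python
# Hand-rolled prepared statements for PostgreSQL. Everything named here
# (match_by_email, the customers SELECT) is hypothetical; the real query
# string would be a compiled SQLAlchemy select with its bind parameters
# rendered as $1, $2, ...

def build_prepare(name, param_types, select_sql):
    # PostgreSQL declares parameter types up front and references them
    # positionally as $1, $2, ... inside the statement body.
    return "PREPARE %s (%s) AS %s" % (name, ", ".join(param_types), select_sql)

def build_execute(name, n_params):
    # The EXECUTE side keeps DBAPI-style %s placeholders so the driver
    # still handles quoting of the actual argument values.
    return "EXECUTE %s (%s)" % (name, ", ".join(["%s"] * n_params))

def build_deallocate(name):
    return "DEALLOCATE %s" % name

prepare_sql = build_prepare(
    "match_by_email", ["text"],
    "SELECT id FROM customers WHERE email = $1")
execute_sql = build_execute("match_by_email", 1)
deallocate_sql = build_deallocate("match_by_email")
```

All three have to go to the same DBAPI connection (e.g. the one behind
session.connection() in SQLAlchemy), since a prepared statement lives only
as long as its server-side session.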

On Thu, Feb 6, 2014 at 1:31 PM, Claudio Freire <klaussfre...@gmail.com> wrote:

> On Thu, Feb 6, 2014 at 3:08 PM, Michael Bayer <mike...@zzzcomputing.com>
> wrote:
> >> I was imagining/hoping I'd find something like this:
> >>
> >>    # prepare the query:
> >>    myPreparedQuery = mySession.query(stuff).filter(parameter
> >> definitions).prepare()
> >>
> >>    # run the query whenever I need it during my session:
> >>    myPreparedQuery.parameters(stuff).fetchall()
> >
> > prepared statements don't really apply much to SELECT statements; the
> > performance gains from them are marginal, as you typically invoke a
> > particular SELECT just once within a transaction, and prepared statements
> > don't necessarily carry across transaction or connection boundaries on
> > some backends.
>
> That's simply false. Because of the following:
>
> > There are various performance concerns related to SELECT queries, and
> > solving them depends much on identifying where a particular performance
> > issue resides.   There's overhead on the side of Python/SQLAlchemy
> > formulating the string statement, there's overhead on the side of passing
> > it to the database to be parsed (which is by far the most infinitesimal
> > part)
>
> That part is most definitely not always so infinitesimally small. I
> recall one case (granted, one among many) of a complex query that
> was very fast to execute. The query had dozens of joins, so it took
> considerable planning time on the database side. Planning took around
> 300ms, whereas execution took only 25ms. So, having spotted the
> bottleneck, I switched to using prepared statements with different
> arguments (it was a query that was run several times in the session -
> not millions of times, but several). The benefit was considerable.
>
> To do that (I was using SQLA 0.5) I had to compile the select object
> and generate the query string. I used Text in the bindparams to
> replace them with "$1", "$2", etc. (as Postgres likes it), and then
> built a "PREPARE" statement and an "EXECUTE" one. It was rather
> simple, having SQLA generate the query string.
>
> The result was a 10-fold increase in performance.
>
> DBAPI doesn't have direct support for prepared statements, but hacking
> them in isn't that difficult. You just have to make sure you send the
> PREPARE, the EXECUTE, and the DEALLOCATE (never forget the DEALLOCATE)
> on the same connection. That's easy: just ask SQLAlchemy's session for
> its connection and use it.
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "sqlalchemy" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/sqlalchemy/QMAmqtQomA8/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> sqlalchemy+unsubscr...@googlegroups.com.
> To post to this group, send email to sqlalchemy@googlegroups.com.
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/groups/opt_out.
>
