On 6/26/14, 11:49 AM, Hans-Peter Jansen wrote:
> Dear SQLAchemistas,
>
> I'm suffering from a performance problem from a simple sequence like this:
>
> rec is a sensor record coming "redundantly" from the network, enos is a 
> sensor device, ev is a sensor value record:
>
>     def store_enos_rec(self, rec, srelay):
>         enos = self.lookup_enos(rec.id, srelay)
>         ts = rec.ts.replace(tzinfo = None)
>         ev = self.session.query(EnOSValues).filter(
>             EnOSValues.enos_id == enos.id,
>             EnOSValues.ts == ts).first()
>         if ev:
>             return
>         # create record
>         ev = EnOSValues(enos_id = enos.id, ts = ts, status = True,
>                         temp = rec.temp)
>         enos.values.append(ev)
>         try:
>             self.session.commit()
>         except IntegrityError:
>             self.session.rollback()
>
> This is running as a multiprocessing.Process on its own, but the 
> self.session.commit call is the bottleneck here, as shown in the profile 
> callgraph. Is there something simple, that can be done to mitigate the 
> issue, other than doing it straight down the sql way?
>
> Obviously, it is taking a lot of time in history mgmt. A former approach
> to simply dump the record without searching duplicates and catching the
> IntegrityError took even more time due to the session rollback.
>
> Thanks for your patience and any enlightenment,
It's very difficult to read this SVG, and a traditional Python profile
output sorted by call time would be more helpful, but what I can see is
that you're calling commit() 20K times, which is a lot.   commit() is not
intended to be something you call for every individual row of a large
dataset; it is better to feed a few thousand objects into the
Session at a time before flushing and committing.

I'm not sure what kind of application this is, but I would not mix
transactional control (the commit/rollback) into a business
method that only seeks to create some new objects.   There'd be a
containing pattern within which store_enos_rec() is called those 20K
times, with commit() called outside of that, or at least once
every 1000 records or so.
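As a sketch of that chunked-commit pattern (the model, session setup, and
input records here are stand-ins I made up for illustration, not the
poster's actual schema):

```python
# Hedged sketch: commit once per chunk of rows rather than once per row.
# "Reading" is a hypothetical stand-in model; the real code would use its
# own mapped classes and session.
from sqlalchemy import create_engine, Column, Integer, Float
from sqlalchemy.orm import sessionmaker
try:
    from sqlalchemy.orm import declarative_base   # SQLAlchemy 1.4+
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Reading(Base):
    __tablename__ = "reading"
    id = Column(Integer, primary_key=True)
    temp = Column(Float)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()

records = [{"temp": 20.0 + i} for i in range(2500)]   # fake input data

for i, rec in enumerate(records, 1):
    session.add(Reading(temp=rec["temp"]))
    if i % 1000 == 0:         # flush/commit once per thousand rows
        session.commit()
session.commit()              # commit the final partial chunk
```

The flush overhead is then amortized over each chunk instead of being
paid per record.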

Also, the "create if not present" pattern is often much better set up by
preloading the keys of all the rows you'll deal with up front, e.g. the
(enos_id, ts) pairs the dedupe check is based on:

existing_keys = set(
    self.session.query(EnOSValues.enos_id, EnOSValues.ts))
if (enos.id, ts) not in existing_keys:
    self.session.add(EnOSValues(...))


If you truly need to optimize a method whose job is to start a new
transaction, insert a single row if not present, and then commit the
transaction fully, I'd use a Core insert() construct for that.  
Spinning up a whole unit of work in that context is overkill if
performance is a concern.
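A minimal sketch of what that could look like with Core; the table
definition here mirrors the columns mentioned in the thread but is an
assumption on my part, as is the store_value() helper:

```python
# Hedged sketch: check-then-insert with Core, bypassing the ORM unit of
# work entirely.  Column names mirror the thread; the schema is assumed.
import datetime
from sqlalchemy import (create_engine, MetaData, Table, Column,
                        Integer, DateTime, Boolean, Float)

metadata = MetaData()
enos_values = Table(
    "enos_values", metadata,
    Column("enos_id", Integer, primary_key=True),
    Column("ts", DateTime, primary_key=True),
    Column("status", Boolean),
    Column("temp", Float),
)

engine = create_engine("sqlite://")
metadata.create_all(engine)

def store_value(conn, enos_id, ts, temp):
    # one SELECT to check for an existing row, then a plain INSERT;
    # no flush, no identity map, no change history
    exists = conn.execute(
        enos_values.select()
        .where(enos_values.c.enos_id == enos_id)
        .where(enos_values.c.ts == ts)
    ).first()
    if exists is None:
        conn.execute(enos_values.insert().values(
            enos_id=enos_id, ts=ts, status=True, temp=temp))

with engine.begin() as conn:     # one transaction around both calls
    ts = datetime.datetime(2014, 6, 26, 11, 49)
    store_value(conn, 1, ts, 21.5)
    store_value(conn, 1, ts, 21.5)   # duplicate, skipped
```

The whole operation is then two statements per row at most, with the
transaction demarcated explicitly by engine.begin().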



-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.