On 6/26/14, 11:49 AM, Hans-Peter Jansen wrote:
> Dear SQLAlchemistas,
>
> I'm suffering from a performance problem from a simple sequence like this:
>
> rec is a sensor record coming "redundantly" from the network, enos is a
> sensor device, ev is a sensor value record:
>
>     def store_enos_rec(self, rec, srelay):
>         enos = self.lookup_enos(rec.id, srelay)
>         ts = rec.ts.replace(tzinfo=None)
>         ev = self.session.query(EnOSValues).filter(
>             EnOSValues.enos_id == enos.id,
>             EnOSValues.ts == ts).first()
>         if ev:
>             return
>         # create record
>         ev = EnOSValues(enos_id=enos.id, ts=ts, status=True,
>                         temp=rec.temp)
>         enos.values.append(ev)
>         try:
>             self.session.commit()
>         except IntegrityError:
>             self.session.rollback()
>
> This is running as a multiprocessing.Process on its own, but the
> self.session.commit call is the bottleneck here, as shown in the profile
> callgraph. Is there something simple that can be done to mitigate the
> issue, other than doing it straight down the SQL way?
>
> Obviously, it is taking a lot of time in history management. A former
> approach that simply dumped the record without searching for duplicates
> and caught the IntegrityError took even more time due to the session
> rollback.
>
> Thanks for your patience and any enlightenment.

It's very difficult to read this SVG, and a traditional Python profile output sorted by call time would be more helpful, but what I can see is that you're calling commit() 20K times, which is a lot. commit() is not intended to be called for every individual row of a large dataset; it is better to feed a few thousand objects into the Session at a time before flushing and committing.
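As a rough sketch of that batching pattern (using a hypothetical SensorValue model and an in-memory SQLite database standing in for the OP's EnOSValues setup; the batch size of 1000 is arbitrary):

```python
from sqlalchemy import Column, Integer, Float, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class SensorValue(Base):
    """Hypothetical stand-in for the OP's EnOSValues model."""
    __tablename__ = "sensor_values"
    id = Column(Integer, primary_key=True)
    temp = Column(Float)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
BATCH_SIZE = 1000
for i in range(5000):
    session.add(SensorValue(temp=i / 10.0))
    if (i + 1) % BATCH_SIZE == 0:
        session.commit()   # one commit per batch of 1000, not per row
session.commit()           # commit any trailing partial batch

count = session.query(SensorValue).count()
print(count)  # 5000
```

This performs 5 commits instead of 5000, which is where the bulk of the savings comes from: each commit() is a full transaction boundary plus Session state bookkeeping.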
I'm not sure what kind of application this is, but I would not be mixing transactional control (the commit/rollback) inside a business method that only seeks to create some new objects. There'd be a containing pattern within which store_enos_rec() is called these 20K times, and then commit() is called on the outside of that, or at least every 1000 records or so.

Also, the "create if not present" pattern is often much better set up by preloading all the objects you'll deal with up front:

    existing_evs = dict(Session.query(EnOSValues.enos_id, EnOSValues))

    if ev_id not in existing_evs:
        Session.add(EnOSValues(...))

If you truly need to optimize a method whose job is to start a new transaction, insert a single row if not present, and then commit the transaction fully, I'd use a Core insert() construct for that. Spinning up a whole unit of work in that context is overkill if performance is a concern.
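A minimal sketch of that Core-level insert-if-absent, assuming a simplified enos_values table and an in-memory SQLite engine (the table layout and function name are hypothetical, not the OP's actual schema):

```python
from sqlalchemy import (MetaData, Table, Column, Integer, Float,
                        create_engine, insert, select)

metadata = MetaData()
# Hypothetical simplified version of the OP's enos_values table.
values = Table(
    "enos_values", metadata,
    Column("enos_id", Integer, primary_key=True),
    Column("temp", Float),
)

engine = create_engine("sqlite://")
metadata.create_all(engine)

def store_if_absent(enos_id, temp):
    """Insert one row if its key isn't present; returns True if inserted."""
    # engine.begin() gives one short Core transaction: check, insert, commit.
    with engine.begin() as conn:
        exists = conn.execute(
            select(values.c.enos_id).where(values.c.enos_id == enos_id)
        ).first()
        if exists is None:
            conn.execute(insert(values).values(enos_id=enos_id, temp=temp))
            return True
    return False

inserted_first = store_if_absent(1, 21.5)
inserted_again = store_if_absent(1, 21.5)
print(inserted_first, inserted_again)  # True False
```

No ORM identity map, no flush, no unit-of-work bookkeeping is involved, so per-call overhead is just the two statements and the transaction itself.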