Ed Rahn <edsr...@gmail.com> wrote:
> On 01/10/2015 04:04 PM, Michael Bayer wrote:
>> Ed Rahn <edsr...@gmail.com> wrote:
>>
>>> On 01/10/2015 01:51 PM, Jonathan Vanasco wrote:
>>>> It's hard to tell what could cause the issue:
>>>>
>>>> How many horses are there?
>>> 463827
>>>
>>>> What is going on in that other function?
>>> Just a regex search, two selects and a commit.
>>>
>>>> Are there foreign key checks involved with the commit?
>>> Yep.
>>>
>>> When I set up the DB to log all queries, I can see the commit happening
>>> in less than a second, so I'm fairly sure the time is being spent
>>> somewhere between my function and the DB.
>>>
>>>> If you want to break a huge batch into smaller commits, there is the
>>>> `Windowed Range Query` strategy --
>>>> https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/WindowedRangeQuery
>>> That's roughly what I'm doing right now, but it seems hacky and I'd
>>> rather solve the underlying problem. That page says the result set is
>>> "too large to fetch all at once", but I have plenty of memory -- how can
>>> I tell SQLAlchemy or psycopg2 to load all the results into memory?
>>
>> First off, psycopg2 loads everything into memory at once in all cases, as
>> does the SQLAlchemy ORM, unless specific steps are taken not to. For
>> 460000 objects this will be *very* slow: that's a lot of rows to turn
>> into fully persisted, session-tracked Python objects, and in Python even
>> allocating the memory to hold them all takes palpable time.
>>
>> However, all of that happens before the commit(), which is where you say
>> the slowness is. That seems doubtful, but if it really is in the commit()
>> and not just the fetch, then something else is going on. If you have
>> 400000 objects in memory and they are all clean, session.commit() is not
>> slow at all -- unless you have some kind of event tracking them all, or
>> you're using a very old version of SQLAlchemy with mutable types, or
>> something else I could only guess at (and I try not to spend time
>> guessing on this list).
>>
>> So before you try to solve the problem, it's best to understand its
>> nature. I'd suggest walking through the steps in SQLAlchemy's own FAQ
>> entry "How can I profile a SQLAlchemy powered application?" (this is
>> certainly a frequently asked question, make no mistake!) at
>> http://docs.sqlalchemy.org/en/rel_0_9/faq/performance.html#how-can-i-profile-a-sqlalchemy-powered-application.
>> That is the best set of steps for fully analyzing where things are going
>> wrong, and in your case I think something will stand out under code
>> profiling. It may very well be something SQLAlchemy is doing that is
>> taking too long; once we identify it, we can talk about strategies to
>> mitigate it (there are always ways to mitigate).

> I didn't get a chance to look at this until now. However, the commit after
> the select never returns, so I don't know why that changed.
>
> Profiling the code shows that most of the time is spent in _expire() in
> state.py:
> https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/orm/state.py#L360
>
> Where should I go from here?

Turn off expire_on_commit, and/or reduce the number of objects that are
resident in the Session at one time (at least at the point the commit
happens).
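
A rough sketch of what both suggestions can look like, using placeholder
names (Horse, process(), the engine URL) rather than anything from the
thread; the Windowed Range Query recipe linked above is the better way to
page through a large table, but plain LIMIT/OFFSET keeps the sketch short:

    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    # Placeholder engine URL and model; Horse and process() stand in for
    # the real mapped class and per-row work.
    engine = create_engine("postgresql://localhost/mydb")

    # expire_on_commit=False stops commit() from expiring every object
    # still referenced by the Session.
    Session = sessionmaker(bind=engine, expire_on_commit=False)

    session = Session()
    try:
        chunk = 1000
        offset = 0
        while True:
            # Page through the table instead of holding ~460k objects at once.
            horses = (session.query(Horse)
                      .order_by(Horse.id)
                      .limit(chunk)
                      .offset(offset)
                      .all())
            if not horses:
                break
            for horse in horses:
                process(horse)      # stand-in for the regex/select work
            session.commit()
            session.expunge_all()   # clear the identity map between batches
            offset += chunk
    finally:
        session.close()

With expire_on_commit left at its default of True, commit() marks every
object still referenced by the Session as expired, which is exactly the
_expire() work the profile pointed at; with hundreds of thousands of
resident objects that adds up.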
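
The profiling step that surfaced _expire() does not need to be elaborate
either; a minimal harness in the spirit of the FAQ recipe, where run_batch()
is a stand-in for the code doing the big select and the commit, might look
like:

    import cProfile
    import pstats

    # run_batch() is a placeholder for the code that does the big select
    # and the commit that never seems to return.
    profiler = cProfile.Profile()
    profiler.enable()
    run_batch()
    profiler.disable()

    # Show the 25 most expensive calls by cumulative time; in this case
    # _expire() in orm/state.py was at the top.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(25)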