> On 31 Dec 2014, at 16:25, Mehdi <mese1...@gmail.com> wrote:
>
> Hi
>
> What would be the most efficient way to loop over a large table (>120000 rows)
> and, based on some conditions, find the match in another table (>70000 rows)?
>
> I'm working on an existing database in which TableA has 14 fields and more
> than 120000 rows, and TableB has 150 fields and about 70000 rows. For each
> row of TableA I have to find its match in TableB and then store TableB.id in
> a field on TableA (let's say TableA.b_id).
>
> Currently I'm doing this with a loop over TableA using offset/limit, and
> inside the loop I try to find the match by querying TableB:
>
>     offset = 0
>     limit = 2000
>     count = session.query(func.count(TableA.id)).scalar()
>     while offset < count:
>         for a_row in session.query(TableA).offset(offset).limit(limit):
>             match = session.query(TableB).filter(some_filters).all()
>             if len(match) == 1:
>                 a_row.b_id = match[0].id
>                 session.add(a_row)
>             else:
>                 ...
>         offset += limit
>
> But this is a pretty time-consuming process, even though I defer columns as
> much as possible.
>
> I wonder whether it is possible and reliable to get all the matches with a
> join/subquery and then loop over those results to set the match field?
>
> Thanks.
Perhaps you could do something like this:

    batch_size = 2000
    while True:
        batch = (session.query(TableA, TableB)
                 .join(<match-conditions>)
                 .filter(TableA.b_id == None)
                 .limit(batch_size)
                 .all())
        if not batch:
            break
        for a_row, b_row in batch:
            a_row.b_id = b_row.id
        session.commit()

Hope that helps,

Simon
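
If the match conditions can be expressed purely in SQL, another option worth
trying is a single correlated UPDATE, so the database does all the work in one
statement instead of round-tripping rows through Python. This is just a rough
sketch, assuming the match is an equality on a hypothetical some_key column
(substitute your real conditions) and that each TableA row has at most one
match in TableB:

    from sqlalchemy import update, select, exists

    tbl_a = TableA.__table__
    tbl_b = TableB.__table__

    # Correlated scalar subquery: the matching TableB.id for each TableA row.
    # 'some_key' stands in for whatever the real match conditions are.
    match_id = (select([tbl_b.c.id])
                .where(tbl_b.c.some_key == tbl_a.c.some_key)
                .as_scalar())

    stmt = (update(tbl_a)
            .values(b_id=match_id)
            # only touch rows that actually have a match, so unmatched
            # rows keep their current b_id instead of being set to NULL
            .where(exists(select([tbl_b.c.id])
                          .where(tbl_b.c.some_key == tbl_a.c.some_key))))

    session.execute(stmt)
    session.commit()

The EXISTS guard leaves rows with no match untouched; if a TableA row could
have more than one match, the scalar subquery would fail, so this approach only
applies when the match is unique.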