Thanks for your replies.

Using the IN list definitely speeds up the process, but I hate the resulting query, which uses a bound variable for each and every element of the list.
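
To be concrete, here is a minimal sketch (not my actual code) of the IN form I mean; the id list is made up. On the releases I am using, in_() renders one bound parameter per list element, which is the part I dislike (newer releases apparently switch to a single "expanding" parameter instead):

ids = [10, 20, 30]
stmt = GatewayTransaction.__table__.select().where(
    GatewayTransaction.realestate_id.in_(ids))

# Rendering the statement shows the per-element bound variables, e.g.
# ... WHERE gateway_transactions.realestate_id
#         IN (:realestate_id_1, :realestate_id_2, :realestate_id_3)
print(stmt)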

But I have another problem with this: there's a massive memory leak somewhere. Take a look at this model:



from sqlalchemy import Boolean, Column, ForeignKey, Integer, Text, Unicode
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()


class GatewayTransaction(Base):
    __tablename__ = "gateway_transactions"

    realestate_id = Column(Integer,
                           ForeignKey("realestate.realestate_id",
                                      ondelete="set null",
                                      onupdate="cascade"),
                           primary_key=True)
    portal_id = Column(Text, primary_key=True)
    realestate_portal_id = Column(Unicode)
    operation = Column(Text, nullable=False)
    agency_id = Column(Integer,
                       ForeignKey("agencies.agency_id",
                                  ondelete="set null",
                                  onupdate="cascade"),
                       nullable=False)
    agency_portal_id = Column(Unicode, nullable=False)
    agency_export_token = Column(Unicode, nullable=False)
    user_id = Column(Integer,
                     ForeignKey("users.user_id",
                                ondelete="set null",
                                onupdate="cascade"),
                     nullable=False)
    mod_images = Column(Boolean)

    agency = relationship("Agency", lazy="joined")
    realestate = relationship("Realestate", lazy="joined")
    user = relationship("User", lazy="joined")



Now, when I do this:


for row in some_query.all():
    gt = (session.query(GatewayTransaction)
          .filter(GatewayTransaction.realestate_id == row.realestate_id)
          .filter(GatewayTransaction.portal_id == k)
          .first()
          or GatewayTransaction())

    # Do some data processing
    #
    #

    # Update existing or insert as new
    gt = session.merge(gt)
    session.flush()


It is very, very slow: it takes minutes to process 2000 rows, memory usage skyrockets into the multiple-GB range, and I have to terminate it before it starts swapping like hell. With lazy="select" it flies, done in a couple of seconds with very little memory consumed, because at this point there are no rows in the table, so nothing is additionally selected, only inserted. Still, why would a join slow things down so drastically and send Python's memory usage (not the DB's) sky-high?
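
If it helps, here is roughly what I would try next (just a sketch, not tested): keep lazy="joined" on the model but turn the eager loads off for this particular lookup, so the per-row query stays a single-table SELECT. Passing class-bound attributes to lazyload() is an assumption about the SQLAlchemy version; row and k are the same variables as in the loop above:

from sqlalchemy.orm import lazyload

# Same lookup as before, but without joining Agency/Realestate/User eagerly
gt = (session.query(GatewayTransaction)
      .options(lazyload(GatewayTransaction.agency),
               lazyload(GatewayTransaction.realestate),
               lazyload(GatewayTransaction.user))
      .filter(GatewayTransaction.realestate_id == row.realestate_id)
      .filter(GatewayTransaction.portal_id == k)
      .first()
      or GatewayTransaction())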

Also, even if I try session.expunge(gt) or session.expunge_all() (after first preparing the "row" values to be loaded one by one from a list of IDs), memory keeps growing, as if the instances never die and never get garbage collected...
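
For completeness, this is the kind of chunked cleanup I have been trying; it is only a sketch, and yield_per(), the chunk sizes and the explicit gc.collect() are guesses rather than my real code. As far as I understand, anything else that still references the instances (for example the full list returned by some_query.all()) will keep them alive no matter how often they are expunged from the session:

import gc

# Stream the source rows instead of materialising them all at once
for i, row in enumerate(some_query.yield_per(100)):
    # ... look up / build the GatewayTransaction and merge it, as above ...
    if i % 500 == 0:
        session.flush()
        session.expunge_all()   # drop the session's own references
        gc.collect()            # only helps if nothing else still holds them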



.oO V Oo.


On 02/21/2012 04:35 PM, Simon King wrote:

A *long* time ago (SQLAlchemy 0.3), I had some performance problems
with large "IN" clauses, and the reason turned out to be SQLAlchemy
taking a lot of time to build long lists of bindparam objects. I've no
idea if this is still the case these days. The best thing you can do
is just try it.

Simon

