Thanks for your replies.

Using the IN list definitely speeds up the process, but I hate the resulting query, which uses a bound variable for each and every element of the list.
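
To be concrete, here is a minimal sketch (not my actual code) of the IN form I mean; the id list is made up. On the releases I am using, in_() renders one bound parameter per list element, which is the part I dislike (newer releases apparently switch to a single "expanding" parameter instead):

ids = [10, 20, 30]
stmt = GatewayTransaction.__table__.select().where(
    GatewayTransaction.realestate_id.in_(ids))

# Rendering the statement shows the per-element bound variables, e.g.
# ... WHERE gateway_transactions.realestate_id
#         IN (:realestate_id_1, :realestate_id_2, :realestate_id_3)
print(stmt)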

But I have another problem with this: there's a massive memory leak somewhere. Take a look at this model:



from sqlalchemy import Boolean, Column, ForeignKey, Integer, Text, Unicode
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()


class GatewayTransaction(Base):
    __tablename__ = "gateway_transactions"

    realestate_id = Column(Integer,
                           ForeignKey("realestate.realestate_id",
                                      ondelete="set null",
                                      onupdate="cascade"),
                           primary_key=True)
    portal_id = Column(Text, primary_key=True)
    realestate_portal_id = Column(Unicode)
    operation = Column(Text, nullable=False)
    agency_id = Column(Integer,
                       ForeignKey("agencies.agency_id",
                                  ondelete="set null",
                                  onupdate="cascade"),
                       nullable=False)
    agency_portal_id = Column(Unicode, nullable=False)
    agency_export_token = Column(Unicode, nullable=False)
    user_id = Column(Integer,
                     ForeignKey("users.user_id",
                                ondelete="set null",
                                onupdate="cascade"),
                     nullable=False)
    mod_images = Column(Boolean)

    agency = relationship("Agency", lazy="joined")
    realestate = relationship("Realestate", lazy="joined")
    user = relationship("User", lazy="joined")



Now, when I do this:


for row in some_query.all():
    gt = (session.query(GatewayTransaction)
          .filter(GatewayTransaction.realestate_id == row.realestate_id)
          .filter(GatewayTransaction.portal_id == k)
          .first()
          or GatewayTransaction())

    # Do some data processing
    #
    #

    # Update existing or insert as new
    gt = session.merge(gt)
    session.flush()


It is very, very slow: it takes minutes to process 2000 rows, memory usage skyrockets into the multiple-GB range, and I have to terminate it before it starts swapping like hell. With lazy="select" it flies, done in a couple of seconds with very little memory consumed, because at this point there are no rows in the table, so nothing is additionally selected, only inserted. Still, why would a join slow things down so drastically and send Python's memory usage (not the DB's) sky-high?
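
If it helps, here is roughly what I would try next (just a sketch, not tested): keep lazy="joined" on the model but turn the eager loads off for this particular lookup, so the per-row query stays a single-table SELECT. Passing class-bound attributes to lazyload() is an assumption about the SQLAlchemy version; row and k are the same variables as in the loop above:

from sqlalchemy.orm import lazyload

# Same lookup as before, but without joining Agency/Realestate/User eagerly
gt = (session.query(GatewayTransaction)
      .options(lazyload(GatewayTransaction.agency),
               lazyload(GatewayTransaction.realestate),
               lazyload(GatewayTransaction.user))
      .filter(GatewayTransaction.realestate_id == row.realestate_id)
      .filter(GatewayTransaction.portal_id == k)
      .first()
      or GatewayTransaction())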

Also, even if I try session.expunge(gt) or session.expunge_all() (after first preparing the "row" values to be loaded one by one from a list of IDs), memory keeps growing, as if the instances never die and never get garbage collected...
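
For completeness, this is the kind of chunked cleanup I have been trying; it is only a sketch, and yield_per(), the chunk sizes and the explicit gc.collect() are guesses rather than my real code. As far as I understand, anything else that still references the instances (for example the full list returned by some_query.all()) will keep them alive no matter how often they are expunged from the session:

import gc

# Stream the source rows instead of materialising them all at once
for i, row in enumerate(some_query.yield_per(100)):
    # ... look up / build the GatewayTransaction and merge it, as above ...
    if i % 500 == 0:
        session.flush()
        session.expunge_all()   # drop the session's own references
        gc.collect()            # only helps if nothing else still holds them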



.oO V Oo.


On 02/21/2012 04:35 PM, Simon King wrote:

A *long* time ago (SQLAlchemy 0.3), I had some performance problems
with large "IN" clauses, and the reason turned out to be SQLAlchemy
taking a lot of time to build long lists of bindparam objects. I've no
idea if this is still the case these days. The best thing you can do
is just try it.

Simon

