Hello world,

tl;dr: I would like to have the ability to expire objects from the
session but only if they aren't new/dirty/deleted.

Have a look at this example code:

import random

from sqlalchemy import Column, Integer, String, ForeignKey, create_engine
from sqlalchemy.orm import relationship, backref, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

lorem_ipsum = """Lorem ipsum dolor sit amet, consectetur [...]""" * 42class 
Change(Base):
    __tablename__ = "change" id = Column(Integer, primary_key=True)
    predecessor_id = Column(Integer, ForeignKey("change.id"))
    comment = Column(String)

    predecessor = relationship("Change", foreign_keys=[predecessor_id],
                               uselist=False)

    def __repr__(self):
        return "<{0} #{1}: {2}>".format(
            self.__class__.__name__, self.id, self.comment)

engine = create_engine("sqlite:///demo.db", echo=False)
Base.metadata.create_all(engine)

session = sessionmaker(engine)()

current_change = Change(comment="Initial commit")
session.add(current_change)

for batch in range(42):
    for x in range(1000):
        current_change = Change(
            predecessor=current_change,
            comment="Squashed bug #{0}\n\n{1}".format(
                random.randrange(0, 1000000), lorem_ipsum)
        )
        session.add(current_change)
    session.flush()

session.commit()

Running this (with the complete lorem ipsum text) on my machine, I get this:

> $ ulimit -v 1024000
> $ python safe_expire.py
> Traceback (most recent call last):
>   File "safe_expire.py", line 61, in <module>
>     random.randrange(0, 1000000), lorem_ipsum)
> MemoryError

Quite obviously, the problem here is that all the mapped objects are
kept in memory even after being flushed (and before being committed).
This is just a contrived example to demonstrate an effect we hit in our
application, which (among other things) implements a DVCS on top of
SQLAlchemy (on top of a database).

Our real-life use is not as simple: basically, incoming objects are
inserted into the database bottom up (in SQL insertion order, because of
foreign key constraints).
For performance reasons the code processes incoming changes in batches
and tries to keep the objects relevant for relationships and sanity
checking in the database session; for us, session.expire_on_commit is
set to False.

During the batch operation the code would run out of memory, because
nothing is expired automatically. This surprised me, because the
session only keeps weak references to its mapped objects. It seems that
the relationships between the objects keep them alive: as updates are
processed bottom up, this basically means that no mapped object is ever
dropped.
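To illustrate why weak referencing does not help here, a stdlib-only
sketch (no SQLAlchemy involved): a WeakValueDictionary stands in for the
identity map, and a "predecessor" attribute stands in for the
relationship. A single strong reference to the head of the chain keeps
every object alive:

```python
import gc
import weakref

class Node:
    """Stands in for a mapped object; each node keeps a strong
    reference to its predecessor, as the relationship attribute does."""
    def __init__(self, predecessor=None):
        self.predecessor = predecessor

# Stands in for the session's weak-referencing identity map.
registry = weakref.WeakValueDictionary()

head = None
for i in range(5):
    head = Node(predecessor=head)
    registry[i] = head

gc.collect()
before = len(registry)   # all 5 alive: "head" anchors the whole chain

head = None              # drop the only strong reference to the chain root
gc.collect()
after = len(registry)    # the chain is collected, the registry empties

print(before, after)     # 5 0
```

So the weak identity map is working as designed; it is the chain of
strong relationship references that pins everything in memory.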

This is easy to fix by manually expiring old entries from the session,
like this:

diff --git a/baseline.py b/baseline.py
index 44e1b7a..df4eb24 100644
--- a/baseline.py
+++ b/baseline.py
@@ -50,7 +50,9 @@ current_change = Change(comment="Initial commit")
 session.add(current_change)
 
 for batch in range(42):
+    old_entries = []
     for x in range(1000):
+        old_entries.append(current_change)
         current_change = Change(
             predecessor=current_change,
             comment="Squashed bug #{0}\n\n{1}".format(
@@ -59,4 +61,7 @@ for batch in range(42):
         session.add(current_change)
     session.flush()
 
+    for entry in old_entries:
+        session.expire(entry)
+
 session.commit()


So this is what I did. It worked fine, until somebody (maybe me) made
changes to the old objects without an intermediate flush.
This was really hard to track down: somehow updates never made it to
the database, and SQLAlchemy did not even execute the expected SQL.
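For reference, the failure mode reduces to a few lines (a sketch
against an in-memory SQLite database; expiring an object that has an
unflushed attribute change silently discards that change):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Change(Base):
    __tablename__ = "change"
    id = Column(Integer, primary_key=True)
    comment = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(engine)()

c = Change(comment="original")
session.add(c)
session.commit()

c.comment = "updated"   # pending change, not yet flushed
session.expire(c)       # discards the pending change, no warning
session.commit()        # nothing dirty left, so no UPDATE is emitted

final = c.comment       # reloaded from the database
print(final)            # "original" -- the update never arrived
```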

This made me think about our usage of "session.expire": mostly, we use
it to tell SQLAlchemy that this is a good time to forget an object, in
the expectation that there are no pending changes. I found 13 calls to
session.expire in our code base.

So I would like to replace it with something that merely gives
SQLAlchemy a hint that this is probably a good time to drop these
objects.

I came up with this code:

def safe_expire(session, o):
    """ Drop *o* from *session* if it has no pending changes. """
    if o in session.identity_map._modified:
        return False
    else:
        session.expire(o)
        return True

def safe_expire_all(session):
    """ Drop all objects with no pending changes from *session*. """
    id_map = session.identity_map
    unmodified = [o for key, o in id_map.iteritems()
                  if o not in id_map._modified]

    for o in unmodified:
        session.expire(o)

    return unmodified
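A possible variant that avoids the private _modified set would be the
public Session.is_modified() API. This is a hedged sketch only: I have
not verified that is_modified() matches identity_map._modified in every
case (it inspects attribute history, and deleted objects need a
separate check via session.deleted):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Change(Base):
    __tablename__ = "change"
    id = Column(Integer, primary_key=True)
    comment = Column(String)

def safe_expire_all_public(session):
    """Expire every persistent object in *session* that has no
    pending attribute changes, using only public Session APIs."""
    expired = []
    for obj in list(session.identity_map.values()):
        # Skip objects scheduled for deletion and anything with
        # unflushed attribute changes.
        if obj in session.deleted or session.is_modified(obj):
            continue
        session.expire(obj)
        expired.append(obj)
    return expired

# Small demonstration against an in-memory database:
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(engine)()

clean = Change(comment="clean")
dirty = Change(comment="dirty")
session.add_all([clean, dirty])
session.commit()

dirty.comment = "changed"          # pending update on one object
expired = safe_expire_all_public(session)

# Only the clean object was expired; the pending update survives.
print(len(expired), dirty.comment)
```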


Does this look like a safe thing to do? (I wonder about updated
relationships etc.)

Am I alone in running into adverse effects of session.expire? Note that
the documentation does not warn about pending changes being dropped by
calls to expire. In hindsight this seems quite obvious, but I did not
think about it when writing that code (because I knew the objects in
question were clean anyway).

Greetings and thanks for any insights,

Torsten

-- 
$---+----1----+----2----+----3----+----4----+----5----+----6----+

SCALE GmbH
Niederlassung Dresden
Torsten Landschoff
Pohlandstraße 19
01309 Dresden

Tel: +49-351-312002-10
Fax: +49-351-312002-29

SCALE GmbH
Registergericht und Sitz: Ingolstadt, HRB 6384
Geschäftsführer: Dr.-Ing. Heiner Müllerschön, Dipl.-Math. Ulrich Franz
