On Wed, Jan 17, 2018 at 11:06 AM, Torsten Landschoff
<torsten.landsch...@dynamore.de> wrote:
> Hello world,
>
> tl;dr: I would like to have the ability to expire objects from the session
> but only if they aren't new/dirty/deleted.
>
> Have a look at this example code:
>
> import random
>
> from sqlalchemy import Column, Integer, String, ForeignKey, create_engine
> from sqlalchemy.orm import relationship, backref, sessionmaker
> from sqlalchemy.ext.declarative import declarative_base
>
> Base = declarative_base()
>
> lorem_ipsum = """Lorem ipsum dolor sit amet, consectetur [...]""" * 42
>
> class Change(Base):
>     __tablename__ = "change"
>     id = Column(Integer, primary_key=True)
>     predecessor_id = Column(Integer, ForeignKey("change.id"))
>     comment = Column(String)
>
>     predecessor = relationship("Change", foreign_keys=[predecessor_id],
>                                    uselist=False)
>
>     def __repr__(self):
>         return "<{0} #{1}: {2}>".format(self.__class__.__name__, self.id,
>                                         self.comment)
>
> engine = create_engine("sqlite:///demo.db", echo=False)
> Base.metadata.create_all(engine)
>
> session = sessionmaker(engine)()
>
> current_change = Change(comment="Initial commit")
> session.add(current_change)
>
> for batch in range(42):
>     for x in range(1000):
>         current_change = Change(
>             predecessor=current_change,
>             comment="Squashed bug #{0}\n\n{1}".format(
>             random.randrange(0, 1000000), lorem_ipsum)
>         )
>         session.add(current_change)
>     session.flush()
>
> session.commit()
>
> Running this (with the complete lorem ipsum text) on my machine, I get this:
>
> $ ulimit -v 1024000
> $ python safe_expire.py
> Traceback (most recent call last):
>   File "safe_expire.py", line 61, in <module>
>     random.randrange(0, 1000000), lorem_ipsum)
> MemoryError
>
>
> Quite obviously the problem here is that all the mapped objects are kept in
> memory even after being flushed (and before being committed).
> This is just a contrived example to show the effect I had in our application
> which (among other things) implements a DVCS on top of SQLAlchemy (on top of
> a DB).
>
> Our real-life use is not as simple: basically, incoming objects are inserted
> into the database bottom up (in SQL insertion order, because of foreign key
> constraints).
> For performance reasons the code processes incoming changes in batches and
> tries to keep the objects relevant for relationships and sanity checking in
> the database session - for us, session.expire_on_commit is set to False.
>
> During the batch operation the code would run out of memory, because nothing
> is expired automatically. This was a surprise to me, because the session
> only keeps a weak reference to its mapped objects. It seems that the
> relationships between objects keep them alive. As updates are processed
> bottom up, this basically means that no mapped objects are ever dropped.
>
> This is easy to fix by manually expiring old entries from the session, like
> this:
>
> diff --git a/baseline.py b/baseline.py
> index 44e1b7a..df4eb24 100644
> --- a/baseline.py
> +++ b/baseline.py
> @@ -50,7 +50,9 @@ current_change = Change(comment="Initial commit")
>  session.add(current_change)
>
>  for batch in range(42):
> +    old_entries = []
>      for x in range(1000):
> +        old_entries.append(current_change)
>          current_change = Change(
>              predecessor=current_change,
>              comment="Squashed bug #{0}\n\n{1}".format(
> @@ -59,4 +61,7 @@ for batch in range(42):
>          session.add(current_change)
>      session.flush()
>
> +    for entry in old_entries:
> +        session.expire(entry)
> +
>  session.commit()
>
>
> So this is what I did.

this is overkill.  Here's what your code looks like without an ORM:

class Link(object):
    def __init__(self, previous=None):
        self.previous = previous


current_link = Link()
print("Current link!  %s" % current_link)

for batch in range(42):
    for x in range(1000):
        current_link = Link(current_link)
        print("Current link!  %s" % current_link)


This code has the exact same problem, without any expire() or anything
like that.    So if you were, say, writing these objects to a file or
something like that, how would you deal with this loop building up an
enormous chain of un-collectable Link objects?  If it were me, I'd
simply do this:

for batch in range(42):
    for x in range(1000):
        current_link = Link(current_link)
        print("Current link!  %s" % current_link)

        current_link.previous = None

this is very specific to the quirky and odd thing I'm doing in
the first place, which is that I'm creating an endlessly long linked
list - so I simply detach the element.

The analogue with the ORM would be to expire this specific link, which
we can do each flush so that the entire previous chain goes away:

for batch in range(42):
    for x in range(1000):
        current_change = Change(
            predecessor=current_change,
            comment="Squashed bug #{0}\n\n{1}".format(
            random.randrange(0, 1000000), lorem_ipsum)
        )
        session.add(current_change)
    session.flush()
    session.expire(current_change, ['predecessor'])


> This works fine, until somebody (maybe myself) made
> changes to the old objects without an intermediate flush.
> This was really hard to find: somehow, updates never made it to the database;
> SQLAlchemy did not even execute the expected SQL.

I think that's because the mass "expire" of everything is too coarse-grained,
and too distant from the loop, to account for the loop's dependence on its own
structure.

>
> This made me think about our usage of "session.expire": Mostly, we use it to
> tell SQLAlchemy that this is a good time to forget this object in the
> expectation that there are no pending changes. I found 13 calls to
> session.expire in our code base.

The way that loop is structured, I would not want to expire all
attributes that have been flushed.   When you make a new Change(), the
ORM needs to locate the predecessor's primary key in order to assemble
it into the foreign key column of the successor.  If you've expired that
predecessor Change() completely, you are forcing the ORM to un-expire
the object using another SELECT, to ensure it hasn't been DELETEd, before
it can use the Change.id value.  The more specific expiry of just that
Change's predecessor attribute, which we don't need anymore, breaks the
long chain of references yet allows the object to serve as the predecessor
of the next object without having to refresh it.
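
Roughly sketched, reusing the Change model from your example (illustrative
only, not something I've run here):

prev = current_change                    # already flushed, so prev.id is loaded

session.expire(prev)                     # expires *everything*, including .id
session.add(Change(predecessor=prev))
session.flush()                          # ORM must SELECT prev again to get .id

session.expire(prev, ['predecessor'])    # drops only the reference up the chain
session.add(Change(predecessor=prev))
session.flush()                          # prev.id is still loaded, no extra SELECT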

>
> So I would like to replace it with something that gives SQLAlchemy a hint
> that it probably is a good time to drop these things.
>
> I came up with this code:
>
> def safe_expire(session, o):
>     """
>     Drop *o* from *session* if it has no pending changes.
>     """
>     if o in session.identity_map._modified:
>         return False
>     else:
>         session.expire(o)
>         return True

you can tell if an object is part of that "modified" collection using
instance_state(obj).modified - if that's False, it's definitely clean
(_modified itself isn't public).  You can also look inside session.dirty:
session.dirty can return false positives (an object seems dirty when it
really isn't) but won't give you false negatives (if an object is not in
session.dirty, it is definitely clean, as long as it's not part of
".new" or ".deleted").    The most in-depth method, session.is_modified(),
does that state.modified check first before digging further, so since you
can likely tolerate false positives, the quick state.modified check is
probably enough.
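
For example, roughly, using the public inspect() API (which returns that
same InstanceState) and a hypothetical is_clean() helper:

from sqlalchemy import inspect

def is_clean(session, obj):
    # definitely clean: not flagged as modified, and not pending or deleted
    return (
        not inspect(obj).modified
        and obj not in session.new
        and obj not in session.deleted
    )

if is_clean(session, current_change):
    session.expire(current_change)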

>
> def safe_expire_all(session):
>     """
>     Drop all objects with no pending changes from *session*.
>     """
>     id_map = session.identity_map
>     unmodified = [o for key, o in id_map.iteritems() if not o in
> id_map._modified]
>
>     for o in unmodified:
>         session.expire(o)
>
>     return unmodified
>
>
> Does this look like a safe thing to do? (I wonder about updated
> relationships etc.)

it's "safe" but as before if the object is involved in other objects
still to be flushed, a new SELECT will need to be emitted to unexpire
the object.
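
If you do want a helper like safe_expire_all() around, a version using only
public API might look roughly like this (untested sketch):

from sqlalchemy import inspect

def safe_expire_all(session):
    """Expire every object in the session that has no pending changes."""
    unmodified = [
        obj for obj in session          # iterates everything the session holds
        if not inspect(obj).modified
        and obj not in session.new      # expire() raises on pending objects
        and obj not in session.deleted
    ]
    for obj in unmodified:
        session.expire(obj)
    return unmodified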


>
> Am I alone running into adverse effects of session.expire?

the case you illustrate is surprising and unusual.    I would chalk it
up to the specific case you have here, where you happen to be building up
an endlessly long chain of objects, for which the memory issue is
most cleanly handled by an explicit periodic break of the chain.


> Note that the
> documentation does not warn about dropping pending changes due to calls to
> expire. In hindsight this seems quite obvious, but I did not think about
> this when writing that code (because I knew the objects in question were
> clean anyway).

it's mentioned here:
http://docs.sqlalchemy.org/en/latest/orm/session_state_management.html#session-expire
but just not in any kind of boldface:

"Another key behavior of both expire() and refresh() is that all
un-flushed changes on an object are discarded. "
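
Which, for example, is exactly the trap you hit earlier; a minimal
illustration (hypothetical snippet, not from your code):

obj = session.query(Change).first()
obj.comment = "fix typo"        # un-flushed change
session.expire(obj)             # the pending change is silently discarded
session.commit()                # no UPDATE for obj is emitted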

I actually probably wouldn't make expire() do what it does today, i.e.
discard un-flushed changes.   When expire() was first created it was
likely not possible, or at least not easy, to hit only DB-persisted
attributes.  The attribute history system has been rewritten several times
since then, and it would probably be pretty simple today to add a flag
like expire(obj, persistent_only=True) or something like that.   But it's
one of those things where, when someone needs that, I want to see why,
because I think there's usually another way to handle the situation.



>
> Greetings and thanks for any insights,
>
> Torsten
>
