Mike, We've upgraded to that latest and greatest SQLAlchemy, did some instrumentation and confirmed we were getting bit in _fast_discard.
Thank you for your very prompt responses, we really appreciate your help. Cecil Rock On Mon, Feb 26, 2018 at 6:23 PM, Mike Bayer <mike...@zzzcomputing.com> wrote: > On Mon, Feb 26, 2018 at 4:51 PM, Cecil Rock <cecil.r...@gmail.com> wrote: > > Hi Mike, > > > > Another colleague of mine sent me this write up which we believe is the > root > > cause of our issue. > > > > In sqlalchemy/orm/loading.py: > > > > > > > > Function _instance() defined in _instance_processor() checks if > identitykey > > is in session_identity_map by calling session_identity_map.get( > identitykey). > > This will return None if identitykey is in the map but the corresponding > > state has no obj(). _instance() then proceeds to add the new_instance to > > session_identity_map via session_identity_map._add_unpresent(). This > will > > replace the existing state in identity_map with the new one. If the > replaced > > state has a weakref to the identity map then when the replaced state is > > garbage collected the weakref call back will incorrectly remove the new > > state from the instance_map. If the new state is dirty and also in > > instance_map._modified then a subsequent flush can blow up in the > > flush_guard because the session appears dirty yet there is nothing to > flush. > > > > > > We think your fix in https://bitbucket.org/zzzeek/sqlalchemy/issues/4068 > > will actually address this by avoiding cleaning up the wrong key. Does > this > > make sense to you as well? If so we think we can just jump to 1.2.4. > > yes that is exactly the explanation for how that would work. > > I just can't figure out a way to simulate that race exactly. but yes > that's the mechanism. > > > > > > > > Thanks, > > Cecil > > > > > > > > On Mon, Feb 26, 2018 at 1:18 PM, Mike Bayer <mike...@zzzcomputing.com> > > wrote: > >> > >> I can't get it to simulate your issue though. but the test case on > >> 4068 is pretty complicated and I don't remember how that one works > >> either, but it looks a lot like the safer checks post 4068 may fix the > >> issue. > >> > >> > >> > >> On Mon, Feb 26, 2018 at 4:06 PM, Cecil Rock <cecil.r...@gmail.com> > wrote: > >> > OH, man sorry I misunderstood that issue. I have been starring at > this > >> > for > >> > too long :) > >> > > >> > Yes, that totally could be it. I'll apply that patch and should be > able > >> > to > >> > confirm it or not. > >> > > >> > On Mon, Feb 26, 2018 at 1:02 PM, Mike Bayer <mike...@zzzcomputing.com > > > >> > wrote: > >> >> > >> >> On Mon, Feb 26, 2018 at 3:58 PM, Cecil Rock <cecil.r...@gmail.com> > >> >> wrote: > >> >> > Yeah I saw those issues, but they did not seem like they would be > >> >> > related to > >> >> > what we're seeing :( > >> >> > > >> >> > I had instrumented the identity_map earlier and could not catch an > >> >> > instance > >> >> > where it was doing the wrong thing. The GC being guilty or maybe > an > >> >> > obscure > >> >> > bug higher in the stack feels more likely. It is helpful to know > >> >> > that a > >> >> > bug > >> >> > doesn't come to mind with this. I'll see if I can mess with the GC > >> >> > and > >> >> > run > >> >> > that to ground or not. If you have any other ideas they are much > >> >> > appreciated :) > >> >> > >> >> issue #4068 looks *very* likely to be related to what you are seeing. > >> >> I can't get your situation to reproduce with it however it is exactly > >> >> something that, in response to a garbage collection event, will > remove > >> >> the *wrong* object from identity_map._dict and *keep* it in > >> >> identity_map._modified exactly as you are seeing. i might be > able > >> >> to reproduce this simulating a GC against the object at a very > >> >> specific time during the load. > >> >> > >> >> > >> >> > > >> >> > On Mon, Feb 26, 2018 at 12:46 PM, Mike Bayer > >> >> > <mike...@zzzcomputing.com> > >> >> > wrote: > >> >> >> > >> >> >> On Mon, Feb 26, 2018 at 3:33 PM, Cecil Rock <cecil.r...@gmail.com > > > >> >> >> wrote: > >> >> >> > Fairly old, v1.0.14. Its running in one of our older > >> >> >> > microservices. > >> >> >> > Once I > >> >> >> > understood what was going on, I went through the fix list and > >> >> >> > could > >> >> >> > not > >> >> >> > find > >> >> >> > anything obviously related to it. We can't reproduce this at > will > >> >> >> > and > >> >> >> > nothing obvious has changed, so we're worried that upgrading > >> >> >> > without > >> >> >> > understanding could just mask the issue if it exists. > >> >> >> > >> >> >> you are before the fix for > >> >> >> https://bitbucket.org/zzzeek/sqlalchemy/issues/4030 as well as > >> >> >> https://bitbucket.org/zzzeek/sqlalchemy/issues/4068 which both > allow > >> >> >> the identity_map.discard(), safe_discard() and _fast_discard() > >> >> >> methods > >> >> >> to be safer against race conditions. you'd want to get up to > >> >> >> version 1.1.14 to get away from each of these issues. > >> >> >> > >> >> >> there's a lot of ways I can break certain things to produce this > >> >> >> error > >> >> >> on 1.0.14 but that same breakage still causes different errors > >> >> >> (because I'm purposely screwing up the internal state of things) > in > >> >> >> later versions so I don't think I've isolated an exact race that > can > >> >> >> cause this. > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > > >> >> >> > On Monday, February 26, 2018 at 12:28:49 PM UTC-8, Mike Bayer > >> >> >> > wrote: > >> >> >> >> > >> >> >> >> um what version of SQLAlchemy is this? > >> >> >> >> > >> >> >> >> On Mon, Feb 26, 2018 at 3:22 PM, Cecil Rock < > cecil...@gmail.com> > >> >> >> >> wrote: > >> >> >> >> > One of my thoughts was maybe the gc wiped out the > identity_map > >> >> >> >> > and > >> >> >> >> > it > >> >> >> >> > gets > >> >> >> >> > recreated empty. > >> >> >> >> > > >> >> >> >> > Here is a stack trace of when I detect the situation prior to > >> >> >> >> > the > >> >> >> >> > flush > >> >> >> >> > occurring. > >> >> >> >> > > >> >> >> >> >> [4018/eval_0/INFO/state/554/_modified_event] fc3a3a adding > >> >> >> >> >> <sqlalchemy.orm.state.InstanceState object at > 0x7f6a5bb989e8> > >> >> >> >> >> as > >> >> >> >> >> modified > >> >> >> >> >> inSession:False > >> >> >> >> >> [4018/eval_0/ERROR/evaluate/213/runEval] Evaluate job > 4c4bdfaa > >> >> >> >> >> caught > >> >> >> >> >> exception, Oh Snap (Traceback (most recent call last): > >> >> >> >> >> File "/home/cecil/monkey/server/ > app/evaluate/evaluate.py", > >> >> >> >> >> line > >> >> >> >> >> 207, > >> >> >> >> >> in > >> >> >> >> >> runEval > >> >> >> >> >> _evaluateMovie(params) > >> >> >> >> >> File "/home/cecil/monkey/server/ > app/evaluate/evaluate.py", > >> >> >> >> >> line > >> >> >> >> >> 196, > >> >> >> >> >> in > >> >> >> >> >> _evaluateMovie > >> >> >> >> >> statusDetails=evalDetails) > >> >> >> >> >> File > >> >> >> >> >> "/home/cecil/monkey/server/app/metrics/metricsUtils.py", > >> >> >> >> >> line > >> >> >> >> >> 230, > >> >> >> >> >> in updateRecordableEvent > >> >> >> >> >> event.statusDetails = statusDetails[0:MAX_STATUS_ > DETAILS] > >> >> >> >> >> File > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> "/home/cecil/monkey/server/venv/lib/python3.4/site- > packages/sqlalchemy/orm/attributes.py", > >> >> >> >> >> line 224, in __set__ > >> >> >> >> >> instance_dict(instance), value, None) > >> >> >> >> >> File > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> "/home/cecil/monkey/server/venv/lib/python3.4/site- > packages/sqlalchemy/orm/attributes.py", > >> >> >> >> >> line 702, in set > >> >> >> >> >> state._modified_event(dict_, self, old) > >> >> >> >> >> File > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> "/home/cecil/monkey/server/venv/lib/python3.4/site- > packages/sqlalchemy/orm/state.py", > >> >> >> >> >> line 556, in _modified_event > >> >> >> >> >> raise Exception('Oh Snap') > >> >> >> >> >> Exception: Oh Snap > >> >> >> >> >> ) > >> >> >> >> >> > >> >> >> >> >> This is what I'm looking at in _modified_event, this occurs > >> >> >> >> >> after > >> >> >> >> >> the > >> >> >> >> >> sleep in the earlier example. > >> >> >> >> > > >> >> >> >> > instance_dict = self._instance_dict() > >> >> >> >> > if instance_dict: > >> >> >> >> > # ===DEBUGGING=== > >> >> >> >> > if isEvent: # just check objects of interest > >> >> >> >> > inSession = > >> >> >> >> > instance_dict.contains_state(self) > >> >> >> >> > logger.info('%s adding %s as modified > >> >> >> >> > inSession:%s', > >> >> >> >> > msgNum, self, inSession) > >> >> >> >> > if not inSession: > >> >> >> >> > raise Exception('Oh Snap') > >> >> >> >> > > >> >> >> >> > instance_dict._modified.add(self) > >> >> >> >> > > >> >> >> >> > Here is the stack trace if I don't catch this early > >> >> >> >> > > >> >> >> >> > [8370/eval_4/ERROR/evaluate/213/runEval] Evaluate job > d498ba17 > >> >> >> >> > caught > >> >> >> >> > exception, Over 100 subsequent flushes have occurred within > >> >> >> >> > session.commit() > >> >> >> >> > - is an after_flush() hook creating new objects? (Traceback > >> >> >> >> > (most > >> >> >> >> > recent > >> >> >> >> > call last): > >> >> >> >> > File "/home/cecil/monkey/server/app/evaluate/evaluate.py", > >> >> >> >> > line > >> >> >> >> > 207, > >> >> >> >> > in > >> >> >> >> > runEval > >> >> >> >> > _evaluateMovie(params) > >> >> >> >> > File "/home/cecil/monkey/server/app/evaluate/evaluate.py", > >> >> >> >> > line > >> >> >> >> > 196, > >> >> >> >> > in > >> >> >> >> > _evaluateMovie > >> >> >> >> > statusDetails=evalDetails) > >> >> >> >> > File "/home/cecil/monkey/server/ > app/metrics/metricsUtils.py", > >> >> >> >> > line > >> >> >> >> > 288, in > >> >> >> >> > updateRecordableEvent > >> >> >> >> > addNotifEvent(event, > >> >> >> >> > workflowSequence=jobParams.get('workflowSequence')) > >> >> >> >> > File > >> >> >> >> > "/home/cecil/monkey/server/app/notification/notifUtils.py", > >> >> >> >> > line > >> >> >> >> > 93, > >> >> >> >> > in addNotifEvent > >> >> >> >> > db.session.commit() > >> >> >> >> > File > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > "/home/cecil/monkey/server/venv/lib/python3.4/site- > packages/sqlalchemy/orm/scoping.py", > >> >> >> >> > line 157, in do > >> >> >> >> > return getattr(self.registry(), name)(*args, **kwargs) > >> >> >> >> > File > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > "/home/cecil/monkey/server/venv/lib/python3.4/site- > packages/sqlalchemy/orm/session.py", > >> >> >> >> > line 822, in commit > >> >> >> >> > self.transaction.commit() > >> >> >> >> > File > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > "/home/cecil/monkey/server/venv/lib/python3.4/site- > packages/sqlalchemy/orm/session.py", > >> >> >> >> > line 400, in commit > >> >> >> >> > self._prepare_impl() > >> >> >> >> > File > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > "/home/cecil/monkey/server/venv/lib/python3.4/site- > packages/sqlalchemy/orm/session.py", > >> >> >> >> > line 383, in _prepare_impl > >> >> >> >> > "Over 100 subsequent flushes have occurred within " > >> >> >> >> > sqlalchemy.orm.exc.FlushError: Over 100 subsequent flushes > have > >> >> >> >> > occurred > >> >> >> >> > within session.commit() - is an after_flush() hook creating > new > >> >> >> >> > objects? > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > -- > >> >> >> >> > SQLAlchemy - > >> >> >> >> > The Python SQL Toolkit and Object Relational Mapper > >> >> >> >> > > >> >> >> >> > http://www.sqlalchemy.org/ > >> >> >> >> > > >> >> >> >> > To post example code, please provide an MCVE: Minimal, > >> >> >> >> > Complete, > >> >> >> >> > and > >> >> >> >> > Verifiable Example. See http://stackoverflow.com/help/mcve > for > >> >> >> >> > a > >> >> >> >> > full > >> >> >> >> > description. > >> >> >> >> > --- > >> >> >> >> > You received this message because you are subscribed to the > >> >> >> >> > Google > >> >> >> >> > Groups > >> >> >> >> > "sqlalchemy" group. > >> >> >> >> > To unsubscribe from this group and stop receiving emails from > >> >> >> >> > it, > >> >> >> >> > send > >> >> >> >> > an > >> >> >> >> > email to sqlalchemy+...@googlegroups.com. > >> >> >> >> > To post to this group, send email to > sqlal...@googlegroups.com. > >> >> >> >> > Visit this group at https://groups.google.com/ > group/sqlalchemy. > >> >> >> >> > For more options, visit https://groups.google.com/d/optout. > >> >> >> > > >> >> >> > -- > >> >> >> > SQLAlchemy - > >> >> >> > The Python SQL Toolkit and Object Relational Mapper > >> >> >> > > >> >> >> > http://www.sqlalchemy.org/ > >> >> >> > > >> >> >> > To post example code, please provide an MCVE: Minimal, Complete, > >> >> >> > and > >> >> >> > Verifiable Example. See http://stackoverflow.com/help/mcve for > a > >> >> >> > full > >> >> >> > description. > >> >> >> > --- > >> >> >> > You received this message because you are subscribed to the > Google > >> >> >> > Groups > >> >> >> > "sqlalchemy" group. > >> >> >> > To unsubscribe from this group and stop receiving emails from > it, > >> >> >> > send > >> >> >> > an > >> >> >> > email to sqlalchemy+unsubscr...@googlegroups.com. > >> >> >> > To post to this group, send email to > sqlalchemy@googlegroups.com. > >> >> >> > Visit this group at https://groups.google.com/group/sqlalchemy. > >> >> >> > For more options, visit https://groups.google.com/d/optout. > >> >> >> > >> >> >> -- > >> >> >> SQLAlchemy - > >> >> >> The Python SQL Toolkit and Object Relational Mapper > >> >> >> > >> >> >> http://www.sqlalchemy.org/ > >> >> >> > >> >> >> To post example code, please provide an MCVE: Minimal, Complete, > and > >> >> >> Verifiable Example. See http://stackoverflow.com/help/mcve for > a > >> >> >> full > >> >> >> description. > >> >> >> --- > >> >> >> You received this message because you are subscribed to a topic in > >> >> >> the > >> >> >> Google Groups "sqlalchemy" group. > >> >> >> To unsubscribe from this topic, visit > >> >> >> > >> >> >> https://groups.google.com/d/topic/sqlalchemy/IcgPmdW6lz8/ > unsubscribe. > >> >> >> To unsubscribe from this group and all its topics, send an email > to > >> >> >> sqlalchemy+unsubscr...@googlegroups.com. > >> >> >> To post to this group, send email to sqlalchemy@googlegroups.com. > >> >> >> Visit this group at https://groups.google.com/group/sqlalchemy. > >> >> >> For more options, visit https://groups.google.com/d/optout. > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > -- > >> >> > - Cecil > >> >> > > >> >> > -- > >> >> > SQLAlchemy - > >> >> > The Python SQL Toolkit and Object Relational Mapper > >> >> > > >> >> > http://www.sqlalchemy.org/ > >> >> > > >> >> > To post example code, please provide an MCVE: Minimal, Complete, > and > >> >> > Verifiable Example. See http://stackoverflow.com/help/mcve for a > full > >> >> > description. > >> >> > --- > >> >> > You received this message because you are subscribed to the Google > >> >> > Groups > >> >> > "sqlalchemy" group. > >> >> > To unsubscribe from this group and stop receiving emails from it, > >> >> > send > >> >> > an > >> >> > email to sqlalchemy+unsubscr...@googlegroups.com. > >> >> > To post to this group, send email to sqlalchemy@googlegroups.com. > >> >> > Visit this group at https://groups.google.com/group/sqlalchemy. > >> >> > For more options, visit https://groups.google.com/d/optout. > >> >> > >> >> -- > >> >> SQLAlchemy - > >> >> The Python SQL Toolkit and Object Relational Mapper > >> >> > >> >> http://www.sqlalchemy.org/ > >> >> > >> >> To post example code, please provide an MCVE: Minimal, Complete, and > >> >> Verifiable Example. See http://stackoverflow.com/help/mcve for a > full > >> >> description. > >> >> --- > >> >> You received this message because you are subscribed to a topic in > the > >> >> Google Groups "sqlalchemy" group. > >> >> To unsubscribe from this topic, visit > >> >> https://groups.google.com/d/topic/sqlalchemy/IcgPmdW6lz8/unsubscribe > . > >> >> To unsubscribe from this group and all its topics, send an email to > >> >> sqlalchemy+unsubscr...@googlegroups.com. > >> >> To post to this group, send email to sqlalchemy@googlegroups.com. > >> >> Visit this group at https://groups.google.com/group/sqlalchemy. > >> >> For more options, visit https://groups.google.com/d/optout. > >> > > >> > > >> > > >> > > >> > -- > >> > - Cecil > >> > > >> > -- > >> > SQLAlchemy - > >> > The Python SQL Toolkit and Object Relational Mapper > >> > > >> > http://www.sqlalchemy.org/ > >> > > >> > To post example code, please provide an MCVE: Minimal, Complete, and > >> > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > >> > description. > >> > --- > >> > You received this message because you are subscribed to the Google > >> > Groups > >> > "sqlalchemy" group. > >> > To unsubscribe from this group and stop receiving emails from it, send > >> > an > >> > email to sqlalchemy+unsubscr...@googlegroups.com. > >> > To post to this group, send email to sqlalchemy@googlegroups.com. > >> > Visit this group at https://groups.google.com/group/sqlalchemy. > >> > For more options, visit https://groups.google.com/d/optout. > >> > >> -- > >> SQLAlchemy - > >> The Python SQL Toolkit and Object Relational Mapper > >> > >> http://www.sqlalchemy.org/ > >> > >> To post example code, please provide an MCVE: Minimal, Complete, and > >> Verifiable Example. See http://stackoverflow.com/help/mcve for a full > >> description. > >> --- > >> You received this message because you are subscribed to a topic in the > >> Google Groups "sqlalchemy" group. > >> To unsubscribe from this topic, visit > >> https://groups.google.com/d/topic/sqlalchemy/IcgPmdW6lz8/unsubscribe. > >> To unsubscribe from this group and all its topics, send an email to > >> sqlalchemy+unsubscr...@googlegroups.com. > >> To post to this group, send email to sqlalchemy@googlegroups.com. > >> Visit this group at https://groups.google.com/group/sqlalchemy. > >> For more options, visit https://groups.google.com/d/optout. > > > > > > > > > > -- > > - Cecil > > > > -- > > SQLAlchemy - > > The Python SQL Toolkit and Object Relational Mapper > > > > http://www.sqlalchemy.org/ > > > > To post example code, please provide an MCVE: Minimal, Complete, and > > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > > description. > > --- > > You received this message because you are subscribed to the Google Groups > > "sqlalchemy" group. > > To unsubscribe from this group and stop receiving emails from it, send an > > email to sqlalchemy+unsubscr...@googlegroups.com. > > To post to this group, send email to sqlalchemy@googlegroups.com. > > Visit this group at https://groups.google.com/group/sqlalchemy. > > For more options, visit https://groups.google.com/d/optout. > > -- > SQLAlchemy - > The Python SQL Toolkit and Object Relational Mapper > > http://www.sqlalchemy.org/ > > To post example code, please provide an MCVE: Minimal, Complete, and > Verifiable Example. See http://stackoverflow.com/help/mcve for a full > description. > --- > You received this message because you are subscribed to a topic in the > Google Groups "sqlalchemy" group. > To unsubscribe from this topic, visit https://groups.google.com/d/ > topic/sqlalchemy/IcgPmdW6lz8/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > sqlalchemy+unsubscr...@googlegroups.com. > To post to this group, send email to sqlalchemy@googlegroups.com. > Visit this group at https://groups.google.com/group/sqlalchemy. > For more options, visit https://groups.google.com/d/optout. > -- - Cecil -- SQLAlchemy - The Python SQL Toolkit and Object Relational Mapper http://www.sqlalchemy.org/ To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description. --- You received this message because you are subscribed to the Google Groups "sqlalchemy" group. To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+unsubscr...@googlegroups.com. To post to this group, send email to sqlalchemy@googlegroups.com. Visit this group at https://groups.google.com/group/sqlalchemy. For more options, visit https://groups.google.com/d/optout.