Re: [Zope] Zope 2.8.4 strange behavior
Chris McDonough wrote: The symptom you're describing has happened to me in the past. Zope stops serving requests and my CPU is not pegged (it's not hitting the CPU hard via an infinept loop I've made). I usually end up tracking it down to the fact that my instance is somehow leaking database connections. I thought Jim solved this for 2.8 by not having a hard limit on the connection pool? (iirc, you now get errors in the log when you go over the pool limit) This was 'cos you used to get the crappy situation running unit tests when lots of test fail that things hang and you get no feedback apart from 7 E's because the connection pool is exhausted. That said, I don't know for sure that Jim _did_ end up implementing this, I might just be imagine it. Jim? cheers, Chris -- Simplistix - Content Management, Zope Python Consulting - http://www.simplistix.co.uk ___ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Re: [Zope] Zope 2.8.4 strange behavior
[Chris McDonough] The symptom you're describing has happened to me in the past. Zope stops serving requests and my CPU is not pegged (it's not hitting the CPU hard via an infinept loop I've made). I usually end up tracking it down to the fact that my instance is somehow leaking database connections. [Chris Withers] I thought Jim solved this for 2.8 by not having a hard limit on the connection pool? This was implemented for ZODB 3.4. See: http://www.zope.org/Wikis/ZODB/SimplifyConnectionManagement and ZODB NEWS. (iirc, you now get errors in the log when you go over the pool limit) Messages at warning or critical level, depening on how much you've gone over. This was 'cos you used to get the crappy situation running unit tests when lots of test fail that things hang and you get no feedback apart from 7 E's because the connection pool is exhausted. Not really (see the Wiki page referenced above). That said, I don't know for sure that Jim _did_ end up implementing this, I might just be imagine it. It was released in ZODB 3.4a1: What's new in ZODB3 3.4a1? == Release date: 01-Apr-2005 ... DB -- - There is no longer a hard limit on the number of connections that ``DB.open()`` will create. In other words, ``DB.open()`` never blocks anymore waiting for an earlier connection to close, and ``DB.open()`` always returns a connection now (while it wasn't documented, it was possible for ``DB.open()`` to return ``None`` before). ``pool_size`` continues to default to 7, but its meaning has changed: if more than ``pool_size`` connections are obtained from ``DB.open()`` and not closed, a warning is logged; if more than twice ``pool_size``, a critical problem is logged. ``pool_size`` should be set to the maximum number of connections from the ``DB`` instance you expect to have open simultaneously. In addition, if a connection obtained from ``DB.open()`` becomes unreachable without having been explicitly closed, when Python's garbage collection reclaims that connection it no longer counts against the ``pool_size`` thresholds for logging messages. The following optional arguments to ``DB.open()`` are deprecated: ``transaction``, ``waitflag``, ``force`` and ``temporary``. If one is specified, its value is ignored, and ``DeprecationWarning`` is raised. In ZODB 3.6, these optional arguments will be removed. And for ZODB 3.6, they were removed. ___ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Re: [Zope] Zope 2.8.4 strange behavior
Paul Winkler wrote at 2005-11-28 15:37 -0500: ... We had to implement a keep alive mechanism to prevent our firewall from behaving in this nasty way. OK. Can you give a high-level summary of what you did? I thought of using heartbeat to detect loss of connection, but I'm not sure what I could do on failure short of restarting Zope. We knew that our firewall shuts down connections with a timeout of 30 min. Thus, we have send our ZEO a keep alive message every 20 min. The code roughly looks like this: KeepPeriod= int(environ.get('ZEO_KEEP_ALIVE')) * 60 Storage = getConfiguration().dbtab.getDatabase('/')._storage def keepAlive(): LOG(CustomZODB,INFO,Keep alive thread started) while 1: sleep(KeepPeriod) if Storage._ready.isSet(): LOG(CustomZODB,INFO,Sending keep alive message) Storage._load_lock.acquire() try: try: Storage._server.get_info() LOG(CustomZODB,INFO,Sent keep alive message) except: LOG(CustomZODB,ERROR, failed, error=exc_info()) finally: Storage._load_lock.release() else: LOG(CustomZODB,PROBLEM,Connection is down) start_new_thread(keepAlive,()) -- Dieter ___ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Re: [Zope] Zope 2.8.4 strange behavior
Resolved. Thanks to all that responded. The symptoms were due to an external method which runs a program (htmldoc) that converts html to pdf. The external method uses the old subprocess interface to fire off the process and then wait() for it to complete. To fix another problem, we had upgraded to the latest version, but that htmldoc version apparently is confused by some of our html input and never completes so the external method just sits there waiting for the child process to terminate. Multiple invokations of the blocked htmldoc process eventually brought Zope to its knees and no further progress was made. Confession is good for the soul The problem took a long time to find because failures were infrequent (we only saw three or for instances) so there was limited data. Moreover,the external method the was failing had been in service for almost two years and has not been a problem. And, other very similar external methods which also use htmldoc were known to work with the upgraded htmldoc. It was not until the external method was identified as the only commonality in the failures that I thought to look closely at the external method. And then I noticed the blocked htmldoc processes, etc. etc. The fix Our solution was to revert htmldoc to the earlier version. When time permits we'll add a timeout to the external method and try to extablish what in the current release htmldoc triggered the failure. We probably should move to the new subprocess module rather than use popen2 in the external method. On Sun, 27 Nov 2005, Dennis Allison wrote: Python2.4.2 Zope 2.8.4 ZODB/Zeo 2.3.4 Mysql 4.0 Dual Athalon processors Linux RH7.3 We have two recent instances in our production sites where Zope suddenly stops responding. It is not a new problem, but we've now been confronted with two clean examples and nothing to blame them on. The problem appears to be independent of load as both incidents were on lightly loaded machines. A check of the logs (Linux and Zope) shows nothing obviously amiss except that the trace log (the old -M log) shows a sudden increase in active requests from the typical 0 or 1 to 1300 or more. In this context an active request' is total number of requests pending at the end of this request and is computed by post-processing. We front-end Zope with pound and make heavy use of MySQL. Both show a plethora of incomplete transactions. Examination of the raw trace log shows that Zope is continuing to accept requests, but nothing getting done. The raw log date-stamps four internal states for each transaction. The states are Begin (B), Input (I), action (A), and End (E). Inputs are gathered between B and I, outputs is made between A and E. The raw log shows B and I transactions, but apparently no processing is completing. I suspect that nothing is getting scheduled. I am at a loss as to where to begin to track this one down. The failure is spontaneous and apparently not triggered by any readily distinguishable inputs or pattern of inputs. The behavior smells a bit of resource limits or process synchronization problems, but there is not real evidence for either being the root cause. I am not sure what monitoring I should be doing to help locate the source of the problem. Has anyone seen seen a similar problem? Any advice as to how to proceed? -- ___ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Re: [Zope] Zope 2.8.4 strange behavior
Paul Winkler wrote at 2005-11-27 21:17 -0500: ... I've seen the same symptoms a number of times recently with zope 2.7.x. In our case, it seems to be related to ZEO. Zope seems to have lost its connection to ZEO but doesn't realize it somehow. My theory is that the symptom starts when all worker threads are waiting for objects that aren't in the ZEO client cache, so they're all waiting on ZEO requests. Do you have a firewall between Zope and ZEO? Usually, the OS can inform both ends of a connection when the connection is torn down. However, some firewalls tear a connection down in a way that the endpoints do not get informed. We had to implement a keep alive mechanism to prevent our firewall from behaving in this nasty way. -- Dieter ___ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Re: [Zope] Zope 2.8.4 strange behavior
On Mon, Nov 28, 2005 at 08:19:23PM +0100, Dieter Maurer wrote: Paul Winkler wrote at 2005-11-27 21:17 -0500: ... I've seen the same symptoms a number of times recently with zope 2.7.x. In our case, it seems to be related to ZEO. Zope seems to have lost its connection to ZEO but doesn't realize it somehow. My theory is that the symptom starts when all worker threads are waiting for objects that aren't in the ZEO client cache, so they're all waiting on ZEO requests. Do you have a firewall between Zope and ZEO? Yes, we do, and it is under control of another part of the company :-( Usually, the OS can inform both ends of a connection when the connection is torn down. However, some firewalls tear a connection down in a way that the endpoints do not get informed. I suspected as much... thanks. We had to implement a keep alive mechanism to prevent our firewall from behaving in this nasty way. OK. Can you give a high-level summary of what you did? I thought of using heartbeat to detect loss of connection, but I'm not sure what I could do on failure short of restarting Zope. -- Paul Winkler http://www.slinkp.com ___ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Re: [Zope] Zope 2.8.4 strange behavior
On Nov 27, 2005, at 8:13 PM, Dennis Allison wrote: here to begin to track this one down. The failure is spontaneous and apparently not triggered by any readily distinguishable inputs or pattern of inputs. The behavior smells a bit of resource limits or process synchronization problems, but there is not real evidence for either being the root cause. I am not sure what monitoring I should be doing to help locate the source of the problem. Has anyone seen seen a similar problem? Any advice as to how to proceed? A pool of database connections is kept around to pull from for each ZODB database you have. By default, there are 7 of them per database; this number can be changed by the pool-size parameter in the zodb_db section in your config file. These connections are returned to the pool when a connection is closed. This usually happens when a transaction that is using that connection completes. The symptom you're describing has happened to me in the past. Zope stops serving requests and my CPU is not pegged (it's not hitting the CPU hard via an infinept loop I've made). I usually end up tracking it down to the fact that my instance is somehow leaking database connections. The connection is never closed so it never gets returned to the pool. This can happen when there is an error during a transaction commit or abort. Since you've previously posted to the list with commit errors of just this kind, it seems like that would be the logical place to start. You didn't mention seeing any of these errors, though, so if you don't see any of those in the event log, I'm not sure where to go from there other than debugging by binary exclusion which can be impossible when the failures happen so far apart. :-( - C ___ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Re: [Zope] Zope 2.8.4 strange behavior
Chris, A helpful suggestion. The commit errors I've been seeing have to do with the intereaction of the ZODB, MySQL, session variables, and conflicts. These particular problems do not appear to be related to the ZODB/session variable/conflict issues, but I cannot completely exclude them since leaks easily cause failures far away from where the fault lies. It certainly is a worthwhile thing to check and monitor. -d On Sun, 27 Nov 2005, Chris McDonough wrote: On Nov 27, 2005, at 8:13 PM, Dennis Allison wrote: here to begin to track this one down. The failure is spontaneous and apparently not triggered by any readily distinguishable inputs or pattern of inputs. The behavior smells a bit of resource limits or process synchronization problems, but there is not real evidence for either being the root cause. I am not sure what monitoring I should be doing to help locate the source of the problem. Has anyone seen seen a similar problem? Any advice as to how to proceed? A pool of database connections is kept around to pull from for each ZODB database you have. By default, there are 7 of them per database; this number can be changed by the pool-size parameter in the zodb_db section in your config file. These connections are returned to the pool when a connection is closed. This usually happens when a transaction that is using that connection completes. The symptom you're describing has happened to me in the past. Zope stops serving requests and my CPU is not pegged (it's not hitting the CPU hard via an infinept loop I've made). I usually end up tracking it down to the fact that my instance is somehow leaking database connections. The connection is never closed so it never gets returned to the pool. This can happen when there is an error during a transaction commit or abort. Since you've previously posted to the list with commit errors of just this kind, it seems like that would be the logical place to start. You didn't mention seeing any of these errors, though, so if you don't see any of those in the event log, I'm not sure where to go from there other than debugging by binary exclusion which can be impossible when the failures happen so far apart. :-( - C -- ___ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Re: [Zope] Zope 2.8.4 strange behavior
On Sun, Nov 27, 2005 at 05:13:21PM -0800, Dennis Allison wrote: Examination of the raw trace log shows that Zope is continuing to accept requests, but nothing getting done. The raw log date-stamps four internal states for each transaction. The states are Begin (B), Input (I), action (A), and End (E). Inputs are gathered between B and I, outputs is made between A and E. The raw log shows B and I transactions, but apparently no processing is completing. I suspect that nothing is getting scheduled. I've seen the same symptoms a number of times recently with zope 2.7.x. In our case, it seems to be related to ZEO. Zope seems to have lost its connection to ZEO but doesn't realize it somehow. My theory is that the symptom starts when all worker threads are waiting for objects that aren't in the ZEO client cache, so they're all waiting on ZEO requests. Meanwhile, requests keep piling up in the queue. These are all the B and I lines you noticed. Eventually (10 minutes or so) one of these ZEO requests times out, and Zope then realizes it's lost its ZEO connection and successfully reconnects. This shows up in the Zope event log as a series of ZEO Client disconnected errors, and immediately thereafter we see a successful reconnect in the ZEO server's log. I have no idea what is causing the loss of connection in the first place, or why it takes Zope so long to realize there's a problem. Nothing in the logs so far gives me any clue. I've bumped up our log levels to BLATHER and hopefully next time this happens I'll get some more to work with and see if there's anythign I can do about it. It may be network issues between our Zope and Zeo boxes, which are on different subnets with a firewall between them, over which I have no control. -- Paul Winkler http://www.slinkp.com ___ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Re: [Zope] Zope 2.8.4 strange behavior
Chris, A helpful suggestion. The commit errors I've been seeing have to do with the intereaction of the ZODB, MySQL, session variables, and conflicts. So the patch that Andy sent over is a fix that prevents the mysql adapter from raising an error when a conflict exception occurs? Do you know if that error only happens after a conflict exception occurs or can it happen without the presence of conflict exceptions? (I'm curious because I also use the adapter and I'd like to know what the patch fixes). These particular problems do not appear to be related to the ZODB/ session variable/conflict issues, but I cannot completely exclude them since leaks easily cause failures far away from where the fault lies. I'm afraid I can't parse that sentence fully. But I'll try to interpret as best possible. ;-) I think I've said this before but it in case not... the use of sessions is only one place where conflict errors can be generated. Conflict errors are normal in any system that causes writes to a ZODB database. If your application does any writes to a ZODB database at all (besides the writes that occur from use of the sessioning machinery), and the mysql adapter wasn't tolerant of conflict errors, you'd be getting the same result, they'd just probably happen further apart. That said, I certainly am interested in making fewer conflicts errors go unresolved when the sessioning machinery is used. If you were to tell that you've taken the steps I've already suggested about reducing the potential for conflicts during session usage (use 2.8 with mvcc, turn external housekeeping on, bump up the resolution time, local zodb db for sessions), and you observed that you're still having too many conflicts, I'd try to take some action, although to do so I might still need to request your help in providing data about your conflict rates. It certainly is a worthwhile thing to check and monitor. What is? Thanks, - C ___ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Re: [Zope] Zope 2.8.4 strange behavior
Thanks Paul. I checked and did not find any Client disconnected entries in the event log so this may not be the cause of what I am seeing. I will bump the log level to ALL (which I think includes BLATHER) when I restart things tonight. We use a remote ZEO but at the moment Zope and ZEO share the same machine. On Sun, 27 Nov 2005, Paul Winkler wrote: On Sun, Nov 27, 2005 at 05:13:21PM -0800, Dennis Allison wrote: Examination of the raw trace log shows that Zope is continuing to accept requests, but nothing getting done. The raw log date-stamps four internal states for each transaction. The states are Begin (B), Input (I), action (A), and End (E). Inputs are gathered between B and I, outputs is made between A and E. The raw log shows B and I transactions, but apparently no processing is completing. I suspect that nothing is getting scheduled. I've seen the same symptoms a number of times recently with zope 2.7.x. In our case, it seems to be related to ZEO. Zope seems to have lost its connection to ZEO but doesn't realize it somehow. My theory is that the symptom starts when all worker threads are waiting for objects that aren't in the ZEO client cache, so they're all waiting on ZEO requests. Meanwhile, requests keep piling up in the queue. These are all the B and I lines you noticed. Eventually (10 minutes or so) one of these ZEO requests times out, and Zope then realizes it's lost its ZEO connection and successfully reconnects. This shows up in the Zope event log as a series of ZEO Client disconnected errors, and immediately thereafter we see a successful reconnect in the ZEO server's log. I have no idea what is causing the loss of connection in the first place, or why it takes Zope so long to realize there's a problem. Nothing in the logs so far gives me any clue. I've bumped up our log levels to BLATHER and hopefully next time this happens I'll get some more to work with and see if there's anythign I can do about it. It may be network issues between our Zope and Zeo boxes, which are on different subnets with a firewall between them, over which I have no control. -- ___ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Re: [Zope] Zope 2.8.4 strange behavior
On Sun, Nov 27, 2005 at 08:08:20PM -0800, Dennis Allison wrote: Thanks Paul. I checked and did not find any Client disconnected entries in the event log so this may not be the cause of what I am seeing. OK. If you were having my problem you'd see them for sure when the problem stops (for me it always fixes itself eventually). I will bump the log level to ALL (which I think includes BLATHER) Yep. when I restart things tonight. We use a remote ZEO but at the moment Zope and ZEO share the same machine. You're reasonably unlikely to have my problem then :-) -- Paul Winkler http://www.slinkp.com ___ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Re: [Zope] Zope 2.8.4 strange behavior
We had been plagued with a threading error which manifests itself as an exception reporting release unreleased lock. At Andy's suggestion I added code to catch the exception in the appropriate place in ZMySQLDA -- Here's the code from db.py --- The code probably should be specific to the unreleased lock exception since, presumably, the code could throw other exceptions. On the whole, I think this is a ban-aid solution. def _abort(self, *ignored): try: if self._mysql_lock: self.db.query(SELECT RELEASE_LOCK('%s') % self._mysql_lock) self.db.store_result() if self._transactions: self.db.query(ROLLBACK) self.db.store_result() else: LOG('ZMySQLDA', ERROR, aborting when non-transactional) finally: # just ignore the exception for release unlocked lock try: self._tlock.release() except: pass On Sun, 27 Nov 2005, Chris McDonough wrote: Chris, A helpful suggestion. The commit errors I've been seeing have to do with the intereaction of the ZODB, MySQL, session variables, and conflicts. So the patch that Andy sent over is a fix that prevents the mysql adapter from raising an error when a conflict exception occurs? Do you know if that error only happens after a conflict exception occurs or can it happen without the presence of conflict exceptions? (I'm curious because I also use the adapter and I'd like to know what the patch fixes). These particular problems do not appear to be related to the ZODB/ session variable/conflict issues, but I cannot completely exclude them since leaks easily cause failures far away from where the fault lies. I'm afraid I can't parse that sentence fully. But I'll try to interpret as best possible. ;-) I think I've said this before but it in case not... the use of sessions is only one place where conflict errors can be generated. Conflict errors are normal in any system that causes writes to a ZODB database. If your application does any writes to a ZODB database at all (besides the writes that occur from use of the sessioning machinery), and the mysql adapter wasn't tolerant of conflict errors, you'd be getting the same result, they'd just probably happen further apart. That said, I certainly am interested in making fewer conflicts errors go unresolved when the sessioning machinery is used. If you were to tell that you've taken the steps I've already suggested about reducing the potential for conflicts during session usage (use 2.8 with mvcc, turn external housekeeping on, bump up the resolution time, local zodb db for sessions), and you observed that you're still having too many conflicts, I'd try to take some action, although to do so I might still need to request your help in providing data about your conflict rates. It certainly is a worthwhile thing to check and monitor. What is? Thanks, - C -- ___ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )
Re: [Zope] Zope 2.8.4 strange behavior
We do see occasional instances where things slow down, a backlog builds, and then it clears. But no disconnects. On Sun, 27 Nov 2005, Paul Winkler wrote: On Sun, Nov 27, 2005 at 08:08:20PM -0800, Dennis Allison wrote: Thanks Paul. I checked and did not find any Client disconnected entries in the event log so this may not be the cause of what I am seeing. OK. If you were having my problem you'd see them for sure when the problem stops (for me it always fixes itself eventually). I will bump the log level to ALL (which I think includes BLATHER) Yep. when I restart things tonight. We use a remote ZEO but at the moment Zope and ZEO share the same machine. You're reasonably unlikely to have my problem then :-) -- ___ Zope maillist - Zope@zope.org http://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope-dev )