Re: [ZODB-Dev] ZEO Client deadlocking in asyncore.poll - how to I debug
Check the ZEO server log files. A known problem is iptables, or some other kind of filtering, sitting between the ZEO clients and the ZEO server. That configuration took several hours off my life ;-(

On Mon, Apr 7, 2008 at 9:16 AM, Anton Stonor [EMAIL PROTECTED] wrote:
> We have a setup with a ZEO server and 4 ZEO clients. During the last
> weeks we have seen almost daily deadlocks in some of the ZEO clients.
> I've tried waiting for up to 30 minutes before restarting a client.
> I could use some advice on how to debug this.
>
> With DeadlockDebugger I see the same pattern each time.
>
> One thread is hanging:
>
>   File "/usr/local/www/zope-2.9.6/lib/python/ZODB/Connection.py", line 732, in setstate
>     self._setstate(obj)
>   File "/usr/local/www/zope-2.9.6/lib/python/ZODB/Connection.py", line 768, in _setstate
>     p, serial = self._storage.load(obj._p_oid, self._version)
>   File "/usr/local/www/zope-2.9.6/lib/python/ZEO/ClientStorage.py", line 746, in load
>     return self.loadEx(oid, version)[:2]
>   File "/usr/local/www/zope-2.9.6/lib/python/ZEO/ClientStorage.py", line 769, in loadEx
>     data, tid, ver = self._server.loadEx(oid, version)
>   File "/usr/local/www/zope-2.9.6/lib/python/ZEO/ServerStub.py", line 192, in loadEx
>     return self.rpc.call('loadEx', oid, version)
>   File "/usr/local/www/zope-2.9.6/lib/python/ZEO/zrpc/connection.py", line 531, in call
>     r_flags, r_args = self.wait(msgid)
>   File "/usr/local/www/zope-2.9.6/lib/python/ZEO/zrpc/connection.py", line 638, in wait
>     asyncore.poll(delay, self._singleton)
>   File "/usr/local/lib/python2.4/asyncore.py", line 122, in poll
>     r, w, e = select.select(r, w, e, timeout)
>
> The other threads of the ZEO client are waiting for the hanging
> thread to release the storage lock so that they can acquire it:
>
>   File "/usr/local/www/zope-2.9.6/lib/python/ZEO/ClientStorage.py", line 760, in loadEx
>     self._load_lock.acquire()
>
> When I connect to the ZEO server monitor I can see an increasing
> number of reads (probably from the other ZEO clients).
>
> I've set transaction-timeout 15.
>
> How do I dig further to resolve this?
>
> zeo.conf partly below:
>
> --
> <zeo>
>   address 8200
>   read-only false
>   invalidation-queue-size 100
>   # pid-filename $INSTANCE/var/ZEO.pid
>   monitor-address 8201
>   transaction-timeout 15
> </zeo>
>
> <filestorage 1>
>   path $INSTANCE/var/Data.fs
> </filestorage>
>
> %import tempstorage
> <temporarystorage temp>
>   name temporary storage for sessioning
> </temporarystorage>
> --
>
> Anton

--
Alan Runyan
Enfold Systems, Inc.
http://www.enfoldsystems.com/
phone: +1.713.942.2377x111
fax: +1.832.201.8856
___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list - ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev
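
One way to start on the filtering suspicion is to check, from each ZEO client machine, whether a fresh TCP connection to the server port can be opened at all. The following is only a minimal sketch under assumptions: probe_tcp is a made-up helper name, the code targets a current Python 3 rather than the Python 2.4 in the traceback, and a successful probe does not rule out a filter that silently drops packets on long-lived connections.

```python
import socket
import time


def probe_tcp(host, port, timeout=5.0):
    """Try one TCP connect; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        # Covers timeouts, refused connections and name lookup failures.
        return False, time.monotonic() - start, str(exc)
    conn.close()
    return True, time.monotonic() - start, ""
```

Calling probe_tcp with the ZEO server's host name and port 8200 (the address from the zeo.conf above) at regular intervals, and logging the elapsed time, might show whether connects slow down or fail around the time a client hangs.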
Re: [ZODB-Dev] ZEO Client deadlocking in asyncore.poll - how to I debug
Check that your ZEO client cache size is big enough. If your code makes queries that return more objects than the cache can hold, the client ends up in a state where it constantly has to load objects from the storage server. If you switch on debug logging on the ZEO server you should see which objects are being loaded.

--
Roché Compaan
Upfront Systems
http://www.upfrontsystems.co.za

On Mon, 2008-04-07 at 16:16 +0200, Anton Stonor wrote:
> [snip]
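
The effect of a cache that is slightly too small can be illustrated with a toy model. This is a sketch under loud assumptions: ToyCache and loads_for are invented names, the cache is a plain least-recently-used cache counted in objects, and the real ZEO client cache is sized in bytes and uses its own eviction strategy. It only shows why repeated scans over a working set larger than the cache keep hitting the server.

```python
from collections import OrderedDict


class ToyCache:
    """Toy LRU cache that counts how often the 'server' must be asked."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.server_loads = 0

    def load(self, oid):
        if oid in self.entries:
            # Cache hit: mark the object as most recently used.
            self.entries.move_to_end(oid)
            return
        # Cache miss: fetch from the server, then evict the oldest entry
        # if the cache has grown past its capacity.
        self.server_loads += 1
        self.entries[oid] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def loads_for(capacity, working_set, passes):
    """Scan `working_set` objects `passes` times; return server loads."""
    cache = ToyCache(capacity)
    for _ in range(passes):
        for oid in range(working_set):
            cache.load(oid)
    return cache.server_loads


print(loads_for(100, 100, 5))  # cache holds the whole working set
print(loads_for(99, 100, 5))   # cache is one object too small
```

In this model the first call needs 100 server loads in total, while the second needs 500: with the cache one object short, every access in every pass goes back to the server.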
[ZODB-Dev] Re: ZEO Client deadlocking in asyncore.poll - how to I debug
Thanks for your suggestions, Alan, Roché and Dieter.

I'll switch the ZEO server logging to debug level, even though the amount of data is scary, and try to find a way to reduce the load on the ZEO server (Roché).

I think you (Alan and Dieter) might be right that there could be a network issue that gets triggered during high load. We don't have any apparent packet filtering rules. Maybe a closer look with tcpdump/wireshark could reveal something. I'll keep you posted.

While we are working on getting to the root of this, isn't there a way to set a timeout on the client side, so it won't wait forever for a response that is lost in the mail?

Thanks again,

Anton
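
For illustration only, the general shape of such a client-side timeout is a select loop with an overall deadline. This sketch is not ZEO's API and says nothing about whether the ZEO version in question offers such a setting: wait_for_reply and ReplyTimeout are hypothetical names, and the code is written for a current Python 3.

```python
import select
import socket
import time


class ReplyTimeout(Exception):
    """Raised when no reply arrives before the deadline (hypothetical)."""


def wait_for_reply(sock, deadline_seconds, poll_interval=1.0):
    """Wait until sock is readable; give up after deadline_seconds."""
    deadline = time.monotonic() + deadline_seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise ReplyTimeout(
                "no reply within %.1f seconds" % deadline_seconds)
        # Poll in short slices so the overall deadline is re-checked.
        readable, _, _ = select.select(
            [sock], [], [], min(poll_interval, remaining))
        if readable:
            return sock.recv(4096)
```

The difference from the hanging frame in the traceback is that the loop re-checks a deadline between select calls instead of polling indefinitely, so the caller gets an exception it can handle, for example by closing the connection and reconnecting.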