Hi folks,

I have a complex setup with six Zope 3 instances (3.3.1, FWIW) on two machines, three instances on each, behind an LVS + apache2 + squid front-end. The Zope instances run with a ZODB root database (I don't need ZEO, see below) and SQLOS (an RDBMS-object mapper based on SQLObject), and they all connect to a single PostgreSQL 8.2 database server on a third machine. I use psycopg2da as the database adapter.

Using SQLOS and SQLObject, we can avoid ZEO: the objects in the ZODB database never change, because they are just the "door" to the RDBMS-based objects. With this setup we have been serving about one million page views per day since March.
Now, the problem is that sometimes (two or three times per day) some of the Zope instances hang completely and I have to restart them. Usually this affects one or two instances, but sometimes more. When an instance hangs, it is not even possible to open the ZMI home page: if I telnet to the Zope port, I can type an HTTP request, but it hangs forever without sending back a response.

I'm quite sure the problem is related to the connection to PostgreSQL, and I can't rule out a bug in psycopg2da or sqlos (I maintain both of them), but so far I haven't been able to reproduce the problem in my testing environment. Also, the PostgreSQL log file doesn't contain anything really interesting, apart from a lot of serialization errors, which are handled as Retry exceptions by the Zope 3 publisher and are transparent to the end users.

My question is simple: I don't know how to investigate the problem when my Zope instance is no longer responsive. I tried PDB, GDB and friends, but without success. Is there a good way to find out what is happening and where Zope 3 is looping, so that I can fix the bug (if there is one, somewhere)? Keep in mind that this happens in the production environment, where the load is quite high, but not in my testing environment when I stress the instances with ab2.

Thanks in advance,

-- 
Fabio Tranchitella                         http://www.kobold.it
Free Software Developer and Consultant     http://www.tranchitella.it
_____________________________________________________________________
1024D/7F961564, fpr 5465 6E69 E559 6466 BF3D 9F01 2BF8 EE2B 7F96 1564
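
P.S. The manual telnet check described above (connect to the Zope port, send an HTTP request, see whether any answer comes back) could be approximated with a small script along these lines. This is only a sketch, assuming a current Python 3 interpreter on the monitoring side rather than the Python that runs Zope 3.3.1. The function name `probe_instance`, the status strings and the default timeout are made up for illustration and are not part of Zope, psycopg2da or sqlos.

```python
import socket


def probe_instance(host, port, timeout=5.0):
    """Classify one HTTP server as 'ok', 'hung' or 'down'.

    'hung' means the TCP connection was accepted and the request was
    sent, but no bytes came back within `timeout` seconds, which is
    the symptom seen with telnet. All names here are hypothetical.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, host unreachable, or connect timed out.
        return "down"
    try:
        sock.settimeout(timeout)
        request = "GET / HTTP/1.0\r\nHost: %s\r\n\r\n" % host
        sock.sendall(request.encode("ascii"))
        data = sock.recv(1024)
        # An empty read means the peer closed without answering.
        return "ok" if data else "down"
    except socket.timeout:
        # Connected, but the server never answered the request.
        return "hung"
    except OSError:
        return "down"
    finally:
        sock.close()
```

Running it periodically against each of the six instances (the host names and ports would be whatever the real deployment uses) would at least record which instances hang and when, so the timestamps can be compared with the PostgreSQL log. It does not show where inside Zope the request is stuck.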