Re: [Zope] Frequent ZOPE crashes
On 25.11.09 17:37, Jaroslav Lukesh wrote:

At first, try to eliminate errors outside of Zope itself. Try to install everything on a plain, brand-new (and reliable!) machine. Do not restore from any backups!

- Original Message - From: Andreas Krasa andreas.kr...@wu-wien.ac.at

A week ago we switched to a new layout (for corporate reasons) and now we're experiencing frequent crashes of the Zope servers. Fortunately

Hi Jaroslav,

we're right in the process of tracking down the error outside of ZOPE. We have installed a completely new server from scratch with RHEL 5.4 and have re-installed Python 2.4.6 and the latest versions of libxml2 and libxslt there. We double-checked the LD config and made sure that the correct shared objects get loaded (via lsof). We also reinstalled a few other modules that contain C code (such as python-ldap), which we need for authentication.

Unfortunately, that didn't really help much. We still experience crashes. Are there any known issues with Zope 2.11.2, libxml2 and/or libxslt that could cause these problems?

The only thing we re-used is the Data.fs, which we have to, because we're talking about a production system here. Also note that we have used exactly the same setup for a long time now, even on the same hardware, without any of these troubles. The problems only started when we switched over to a new (and probably more resource-intensive) layout. We're unfortunately still not able to reproduce these crashes.

Kind regards, Andreas

___ Zope maillist - Zope@zope.org https://mail.zope.org/mailman/listinfo/zope ** No cross posts or HTML encoding! ** (Related lists - https://mail.zope.org/mailman/listinfo/zope-announce https://mail.zope.org/mailman/listinfo/zope-dev )
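The check that the correct shared objects get loaded can also be done from Python rather than with lsof. The following is only a sketch under stated assumptions: it targets a current Python 3 on Linux (not the Python 2.4.6 from this thread), it relies on the /proc/<pid>/maps layout, and the function names shared_objects and loaded_shared_objects are made up for illustration.

```python
import os


def shared_objects(maps_text):
    """Return the sorted, de-duplicated shared-object paths in maps_text.

    maps_text is expected to use the /proc/<pid>/maps layout, where the
    pathname (if any) is the sixth whitespace-separated field.
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        name = os.path.basename(path)
        # Matches both "etree.so" and versioned names like "libxml2.so.2.7.6".
        if name.endswith(".so") or ".so." in name:
            paths.add(path)
    return sorted(paths)


def loaded_shared_objects(pid="self"):
    """Read /proc/<pid>/maps (Linux only) and list the mapped libraries."""
    with open("/proc/%s/maps" % pid) as handle:
        return shared_objects(handle.read())
```

Calling loaded_shared_objects() with the PID of a running Zope process (given sufficient permissions) lists the libraries that process has mapped, so a stray system copy of libxml2 or libxslt would show up by its path.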
Re: [Zope] Frequent ZOPE crashes
Hi Tres,

thank you very much for your reply!

On 29.11.09 21:57, Tres Seaver wrote:

- Original Message - From: Andreas Krasa andreas.kr...@wu-wien.ac.at

we're right in the process of tracking down the error outside of ZOPE. We have installed a completely new server from scratch with RHEL 5.4 and have re-installed Python 2.4.6 and the latest versions of libxml2 and libxslt there. We double-checked the LD config and made sure that the correct shared objects get loaded (via lsof). We also reinstalled a few other modules that contain C code (such as python-ldap), which we need for authentication. Unfortunately, that didn't really help much. We still experience crashes. Are there any known issues with Zope 2.11.2, libxml2 and/or libxslt that could cause these problems? The only thing we re-used is the Data.fs, which we have to, because we're talking about a production system here. Also note that we have used exactly the same setup for a long time now, even on the same hardware, without any of these troubles. The problems only started when we switched over to a new (and probably more resource-intensive) layout. We're unfortunately still not able to reproduce these crashes.

Can you set 'ulimit -c' to get a core file, which might at least help point to the extension which is to blame (although it may just show the downstream victim of a heap munge)? What versions of libxml2 / libxslt are you using? How about lxml?

Yes, we did set the ulimit and were indeed able to produce a core dump for each crash (each between 300 and 700 MB). We tried to debug using gdb, but unfortunately the dumps only reveal two cases in which the crashes occur:

1) During garbage collection, where the gc tries to clean up damaged Python objects
2) During some ceval process, also related to accessing damaged Python objects

Unfortunately, it doesn't reveal what exactly trashes the objects.
To us it seems that this could happen some time before either of the two processes mentioned above tries to access the objects and crashes ZOPE. For now, we don't see a reproducible pattern, as it seems to be some more complex user behavior that leads to this. We were able to extract a few URLs from the core dumps, but directly accessing those does nothing. Likewise, directly accessing the last request logged in the Z2.log before the core dump triggers nothing.

We're now running ZOPE-2.11.2 with an eggified version of ZODB3-3.8.4 plus libxml2-2.7.6, libxslt-1.1.26 and lxml-2.2.4, and the crashes still happen. Previously we were running ZOPE-2.11.2, libxml2-2.7.3, libxslt-1.1.24 and lxml-2.1.5. That also crashed ZOPE occasionally. This has only happened since we switched to a new layout (probably in combination with a few minor Silva updates). We used the same system software (RHEL5), hardware, Python version and libxml2/libxslt/lxml versions with our old layout, where everything worked fine for years.

I would be happy to paste any particular gdb output if that is of any help.

Kind regards, Andreas
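For reference, the effect of 'ulimit -c' can also be obtained from inside a Python process through the standard resource module. This is a minimal sketch, assuming a Unix system and a current Python 3; the helper name enable_core_dumps is hypothetical, and nothing here is part of Zope's own startup code.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Returns the (soft, hard) pair in effect afterwards. If the hard limit
    is 0, core dumps stay disabled and only a privileged process (or the
    parent shell's ulimit) can change that.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may always raise its soft limit up to the hard one.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Calling enable_core_dumps() early in a start script would make the setting independent of the shell that launches the process. Whether a core file is actually written still depends on the hard limit and on kernel settings outside the process.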
[Zope] Frequent ZOPE crashes
Hello mailing list,

we've been using ZOPE in combination with the Silva CMS for around four years now to serve our university's homepage, and everything has worked fine so far. A week ago we switched to a new layout (for corporate reasons) and now we're experiencing frequent crashes of the Zope servers. Fortunately enough, they reconnect themselves to the ZODB, but since this is now happening around every five minutes, I'm rather worried that this might permanently damage the ZODB.

I have absolutely no idea how this can happen, as we're using the same Python, libxml2, libxslt and other module versions as with the old homepage. In fact, the new site even runs on the same hardware. We never experienced any problems like these until now. As far as I understand, it takes some C module to make ZOPE segfault?

Versions we're using:

Python 2.4.6
Zope 2.11.2
LibXML2 2.7.3
LibXSLT 1.1.24
Python-LDAP 2.3.6
Setuptools 0.6c9

and a Kerberos module, plus the Silva CMS (2.1) on top.

We have four ZOPE servers, each running two ZEO processes and a separate ZODB. The machines all run RedHat Enterprise Linux 5.4. In front of that, Apache, Squid and Pound take care of the caching.

We examined the core dump files with gdb, but unfortunately this didn't prove very helpful, because things go wrong either during garbage collection or in some ceval code. So basically something trashes certain Python objects at some earlier time.

Do you have *any* hints on how to track down this problem? Or are there any known problems with the versions above? The changelogs didn't reveal any plausible cause to me...

Kind regards, Andreas Krasa
[Zope] Zope.org = Zope.com?
Hello everybody,

it seems that currently all HTTP requests to www.zope.org and dev.zope.org are forwarded to www.zope.com. I find it hard to believe that this is the desired behavior... But if so, where can Zope itself and Zope products be downloaded? The Zope.com site contains tons of nice eye-candy and marketing-yadayada but no resources for developers whatsoever.

Regards, Andreas Krasa

:: Andreas Krasa WU ZID Information Center :: fon: +43/1/31336/6996 Augasse 2-6 1090 Wien .at :: fax: +43/1/31336/789 :: icq: 2059600 :: pgp key-id: 0xDA178BDC
Re: [Zope] ZEO troubles on RedHat EL4 Linux
Dieter Maurer wrote:

Andreas Krasa // WUW wrote at 2005-8-16 18:37 +0200:

...
======================================================================
ERROR: checkMultipleAddresses (ZEO.tests.testConnection.MappingStorageConnectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/src/__zope__/Zope-2.7.7-final/lib/python/ZEO/tests/ConnectionTests.py", line 121, in tearDown
    os.waitpid(pid, 0)
OSError: [Errno 10] No child processes

I have seen similar errors happening non-deterministically in the presence of a SIGCHLD handler set to SIG_IGN. Such a handler causes the operating system to reap so-called zombie processes automatically, and if the zombie no longer exists, waitpid will fail. Some *nix variants automatically pass the SIG_IGN down to child processes. Our Debian and SuSE Linux versions do. I had to change Zope.Startup.run not to use SIG_IGN as the SIGCHLD handler in order to avoid such problems. In case you run your tests with zopectl test, you may see this problem...

Hi Dieter!

Thanks very much for your help! I will give this one a try!

Btw., since this also happens on 5 other machines, all natively installed with RHEL4, there might really be something wrong within the OS. Is that worth submitting a bug to RedHat? Or is it more like a feature? ;)

Thanks again, Andreas
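The interaction Dieter describes can be reproduced outside of Zope. The following is a minimal sketch, assuming Linux (or another POSIX system with the same SIG_IGN semantics) and a current Python 3; the function name wait_for_child is hypothetical and the code is not taken from Zope or the ZEO test suite.

```python
import errno
import os
import signal


def wait_for_child(ignore_sigchld):
    """Fork a child that exits at once, then try to reap it with waitpid.

    Returns "reaped" if os.waitpid succeeds, or "ECHILD" if it fails with
    errno.ECHILD ("No child processes").
    """
    handler = signal.SIG_IGN if ignore_sigchld else signal.SIG_DFL
    previous = signal.signal(signal.SIGCHLD, handler)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately, skipping cleanup handlers
        try:
            os.waitpid(pid, 0)
            return "reaped"
        except OSError as exc:
            if exc.errno != errno.ECHILD:
                raise
            return "ECHILD"
    finally:
        # Restore the old disposition (None means it was not set from Python).
        if previous is not None:
            signal.signal(signal.SIGCHLD, previous)
```

With the default disposition the child stays around as a zombie until waitpid collects it, so the call succeeds. With SIGCHLD ignored, the kernel discards the child's exit status itself and waitpid has nothing left to report, which matches the "[Errno 10] No child processes" seen in the tearDown traceback.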
Re: [Zope] ZEO troubles on RedHat EL4 Linux
Jens Vagelpohl wrote:

On 18 Aug 2005, at 07:50, Andreas Krasa // WUW wrote:

Is that worth submitting a bug to RedHat? Or is it more like a feature? ;)

Why would RedHat care? They will just throw it back at you and say "sorry, Zope is not one of our supported packages". By the way, I hope you are not running Zope on the system-installed Python? If you do, then change your setups to build and install your own Python just for Zope and test again.

jens

Hi Jens,

no, we've rebuilt Python (2.3.5) from source and, as our main Zope product Silva requires them, also libxml2 and libxslt (pointing to our own Python, of course). This all resides in /usr/local. We've compiled Zope pointing to /usr/local/bin/python23, so I guess that RedHat's own Python RPM does not interfere with Zope; at least I hope so.

As I understood Dieter's mail, this strange behavior is caused by the way the RedHat Enterprise Linux 4 system libraries handle SIG_IGN/SIGCHLD. If the problem were due to some improper Zope methods, most people would have this sort of problem, which is not the case. That makes me believe that the failure of the ZEO tests is actually caused by some uncommon or improper implementation of this signal handling, which, in my opinion, makes it something RedHat should take a look at.

Anyway, how severe are those test failures for actually USING a ZEO client/server on that particular OS as a production system?

Cheers, Andreas
[Zope] ZEO troubles on RedHat EL4 Linux
Hello everybody!

We are encountering some really strange problems with Zope 2.7.7 on our RedHat EL 4 Linux machines. The Zope 2.7.7 compilation itself works. However, most of the time make test returns a random number of errors (somewhere between 20 and 30), ALL related to ZEO.

The funny thing is, we've managed to do a make test without any failures. However, after doing a make distclean and compiling everything again, make test produces the above-mentioned errors (using *exactly* the same source code!).

I have absolutely no idea how this can happen - ANY hints are appreciated! Is this a known issue? What could it be related to?

Thanks a lot!

Regards, Andreas Krasa
Re: [Zope] ZEO troubles on RedHat EL4 Linux
Jens Vagelpohl wrote:

During the Zope 2.7.7 compilation works - however most of the time make test returns a random number of errors (somewhere between 20 and 30) ALL related to ZEO.

Maybe someone can help if you actually *tell us* what these errors are. At least my own crystal ball is in the shop for repairs right now... :)

jens

Hi!

Oops, almost forgot about those - the errors are as follows. They are always related to ZEO and an OSError "No child processes".

Thanks and best regards, Andreas Krasa

---

======================================================================
ERROR: checkMultipleAddresses (ZEO.tests.testConnection.MappingStorageConnectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/src/__zope__/Zope-2.7.7-final/lib/python/ZEO/tests/ConnectionTests.py", line 121, in tearDown
    os.waitpid(pid, 0)
OSError: [Errno 10] No child processes
======================================================================
ERROR: checkMultipleServers (ZEO.tests.testConnection.MappingStorageConnectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/src/__zope__/Zope-2.7.7-final/lib/python/ZEO/tests/ConnectionTests.py", line 121, in tearDown
    os.waitpid(pid, 0)
OSError: [Errno 10] No child processes
======================================================================
ERROR: checkReadOnlyClient (ZEO.tests.testConnection.MappingStorageConnectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/src/__zope__/Zope-2.7.7-final/lib/python/ZEO/tests/ConnectionTests.py", line 121, in tearDown
    os.waitpid(pid, 0)
OSError: [Errno 10] No child processes
======================================================================
ERROR: checkReadOnlyFallbackReadOnlyServer (ZEO.tests.testConnection.MappingStorageConnectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/src/__zope__/Zope-2.7.7-final/lib/python/ZEO/tests/ConnectionTests.py", line 121, in tearDown
    os.waitpid(pid, 0)
OSError: [Errno 10] No child processes
======================================================================
ERROR: checkReadOnlyFallbackWritable (ZEO.tests.testConnection.MappingStorageConnectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/src/__zope__/Zope-2.7.7-final/lib/python/ZEO/tests/ConnectionTests.py", line 121, in tearDown
    os.waitpid(pid, 0)
OSError: [Errno 10] No child processes
======================================================================
ERROR: checkReconnectWritable (ZEO.tests.testConnection.MappingStorageConnectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/src/__zope__/Zope-2.7.7-final/lib/python/ZEO/tests/ConnectionTests.py", line 121, in tearDown
    os.waitpid(pid, 0)
OSError: [Errno 10] No child processes
======================================================================
ERROR: checkReconnection (ZEO.tests.testConnection.MappingStorageConnectionTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/src/__zope__/Zope-2.7.7-final/lib/python/ZEO/tests/ConnectionTests.py", line 121, in tearDown
    os.waitpid(pid, 0)
OSError: [Errno 10] No child processes
======================================================================
ERROR: checkTimeout (ZEO.tests.testConnection.MappingStorageTimeoutTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/src/__zope__/Zope-2.7.7-final/lib/python/ZEO/tests/ConnectionTests.py", line 121, in tearDown
    os.waitpid(pid, 0)
OSError: [Errno 10] No child processes
======================================================================
ERROR: checkTimeoutAfterVote (ZEO.tests.testConnection.MappingStorageTimeoutTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/src/__zope__/Zope-2.7.7-final/lib/python/ZEO/tests/ConnectionTests.py", line 121, in tearDown
    os.waitpid(pid, 0)
OSError: [Errno 10] No child processes
======================================================================
ERROR: checkTimeoutOnAbortNoLock (ZEO.tests.testConnection.MappingStorageTimeoutTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/src/__zope__/Zope-2.7.7-final/lib/python/ZEO/tests/ConnectionTests.py", line 121, in tearDown
    os.waitpid(pid, 0)
OSError: [Errno 10] No child processes
======================================================================
ERROR
Re: [Zope] ZEO troubles on RedHat EL4 Linux
Tim Peters wrote:

[Andreas Krasa] We are encountering some really strange problems with Zope 2.7.7 on our RedHat EL 4 Linux machines. During the Zope 2.7.7 compilation works - however most of the time make test returns a random number of errors (somewhere between 20 and 30) ALL related to ZEO. The funny thing is, we've managed to do a make test without any failures - however after doing a make distclean and compiling everything again make test produces the above mentioned errors (using *exactly* the same source code!). I have absolutely no idea how this can happen - ANY hints are appreciated! Is this a known issue?

No. For example, it doesn't happen in the daily overnight testrunner reports.

What could it be related to?

ZEO <wink>? You'll have to give more info about which tests fail, and precisely how they fail. Because many of the ZEO tests create multiple processes, and try to assign sockets so that these processes can communicate, they're vulnerable to vagaries of OS process scheduling and socket use by other apps. For example, on a slow or overburdened (with other simultaneous work) machine, some ZEO tests can fail due to not getting enough cycles soon enough. The worst tests of that sort wait as long as a minute now for another process to do something they're waiting for before failing, but not even waiting a minute can _guarantee_ success. Might be informative to run the tests on an otherwise-quiet machine.

Thank you, Tim, for the feedback!

Our system is an Intel Xeon 3 GHz dual-CPU machine with 2.5 GB RAM running RedHat Enterprise Linux 4 (SELinux disabled). As this is a test machine, it doesn't run any CPU-consuming tasks I can think of - the server load is usually somewhere between 0.00 and 0.10. But I'll check that nevertheless!

Best regards, Andreas
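The kind of bounded wait Tim describes, where a test polls for another process until a deadline passes, can be sketched as follows. This is an illustration only, assuming a current Python 3; wait_until and its parameters are hypothetical names and not the helper the ZEO tests actually use.

```python
import time


def wait_until(predicate, timeout=60.0, interval=0.1):
    """Poll predicate() until it is true or timeout seconds have passed.

    Returns True on success and False on timeout. The predicate is always
    checked at least once, even with a timeout of zero.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test built this way fails only if the other process has not caught up within the timeout, which is why a loaded machine can turn a correct test into an occasional failure: no finite timeout can guarantee that the scheduler gave the other process enough time.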