I have an issue where maintenance windows sometimes fail to bring systems
back to their original production state when they complete. What makes it
harder to track down is that it's not every time, just every few days,
seemingly at random.
Here are some things I've seen in my logs:
2009-05-18 04:00:03 INFO zen.MaintenanceWindows: MW daily changes xxx's production state from 1000 to 300
2009-05-18 04:00:03 INFO zen.MaintenanceWindows: MW daily changes xxx's production state from 1000 to 300
2009-05-18 04:00:03 INFO zen.MaintenanceWindows: MW daily changes xxx's production state from 1000 to 300
2009-05-18 04:00:05 CRITICAL txn.2475712: A storage error occurred during the second phase of the two-phase commit. Resources may be in an inconsistent state.
Then, at the end of that window, I see a bunch of errors like this:
2009-05-18 04:59:59 ERROR ZODB.Connection: Couldn't load state for 0x89fc
Traceback (most recent call last):
  File "/usr/local/zenoss/zenoss/lib/python/ZODB/Connection.py", line 704, in setstate
    self._setstate(obj)
  File "/usr/local/zenoss/zenoss/lib/python/ZODB/Connection.py", line 758, in _setstate
    self._reader.setGhostState(obj, p)
  File "/usr/local/zenoss/zenoss/lib/python/ZODB/serialize.py", line 495, in setGhostState
    state = self.getState(pickle)
  File "/usr/local/zenoss/zenoss/lib/python/ZODB/serialize.py", line 488, in getState
    return unpickler.load()
  File "/usr/local/zenoss/zenoss/lib/python/ZODB/serialize.py", line 436, in _persistent_load
    return self._conn.get(oid)
  File "/usr/local/zenoss/zenoss/lib/python/ZODB/Connection.py", line 207, in get
    p, serial = self._storage.load(oid, self._version)
  File "/data/zenoss/zenoss/lib/python/ZEO/ClientStorage.py", line 746, in load
    return self.loadEx(oid, version)[:2]
  File "/data/zenoss/zenoss/lib/python/ZEO/ClientStorage.py", line 774, in loadEx
    self._cache.store(oid, ver, tid, None, data)
  File "/data/zenoss/zenoss/lib/python/ZEO/cache.py", line 293, in store
    self.fc.add(o)
  File "/data/zenoss/zenoss/lib/python/ZEO/cache.py", line 980, in add
    available = self._makeroom(size)
  File "/data/zenoss/zenoss/lib/python/ZEO/cache.py", line 915, in _makeroom
    size, e = self.filemap.pop(ofs)
KeyError: 349681
I manually put them all back into production mode, and everything did seem
to report OK for the next couple of days.
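For reference, here's roughly what I've been doing by hand in zendmd to force everything back to Production. The helper function is just my sketch (the name is mine), but it assumes the stock device API (`productionState` and `setProdState()`), with 1000 being the Production state in my setup:

```python
def reset_prod_state(devices, target=1000):
    """Force every device whose productionState differs from `target`
    back to `target` (1000 == Production here).
    Returns the number of devices that were changed."""
    changed = 0
    for dev in devices:
        if dev.productionState != target:
            dev.setProdState(target)
            changed += 1
    return changed

# In zendmd (assuming the stock dmd API) I then run something like:
#   reset_prod_state(dmd.Devices.getSubDevices())
#   commit()
```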
Then, on a recent window, it happened again. I saw errors like the ones
below, triggered exactly when the window ended, so I'm assuming this has to
do with trying to revert the machines back to their original mode. I didn't
see anything about the two-phase commit problem this time, but I did see
these errors:
2009-05-29 04:59:59 ERROR zen.Events: (1062, "Duplicate entry 'f0ea1333-6198-495b-a8a5-c808124710d4' for key 1")
Traceback (most recent call last):
  File "/data/zenoss/zenoss/Products/ZenEvents/MySqlSendEvent.py", line 46, in execute
    result = cursor.execute(statement)
  File "/usr/local/zenoss/python/lib/python2.4/site-packages/MySQLdb/cursors.py", line 137, in execute
    self.errorhandler(self, exc, value)
  File "/usr/local/zenoss/python/lib/python2.4/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
    raise errorclass, errorvalue
IntegrityError: (1062, "Duplicate entry 'f0ea1333-6198-495b-a8a5-c808124710d4' for key 1")
2009-05-29 05:00:02 INFO zen.ZenActions: Processed 0 commands in 0.000037
2009-05-29 05:00:02 INFO zen.ZenActions: processed 2 rules in 0.01 secs
2009-05-29 05:00:02 ERROR zen.Events: (1062, "Duplicate entry 'f0ea1333-6198-495b-a8a5-c808124710d4' for key 1")
Traceback (most recent call last):
  File "/data/zenoss/zenoss/Products/ZenEvents/MySqlSendEvent.py", line 46, in execute
    result = cursor.execute(statement)
  File "/usr/local/zenoss/python/lib/python2.4/site-packages/MySQLdb/cursors.py", line 137, in execute
    self.errorhandler(self, exc, value)
  File "/usr/local/zenoss/python/lib/python2.4/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
    raise errorclass, errorvalue
IntegrityError: (1062, "Duplicate entry 'f0ea1333-6198-495b-a8a5-c808124710d4' for key 1")
In addition, I've had problems archiving some events. Using Firebug to
watch the AJAX requests, I can see a 500 error referring to a duplicate key
problem. To work around that, I've been deleting the events from the
database directly.
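In case it matters, this is roughly the manual cleanup I've been doing. The function is just my sketch, and it assumes the event tables and column from my install's events database (`status` and `history`, keyed by `evid`); adjust if yours differs:

```python
def delete_event_rows(cursor, evid):
    """Delete a stuck event by its evid from both the active (`status`)
    and archived (`history`) event tables.
    `cursor` is any DB-API cursor (e.g. from MySQLdb.connect(...)).
    Returns the list of DELETE statements that were issued."""
    issued = []
    for table in ("status", "history"):
        # Table names can't be bound as parameters, so only the table
        # name is interpolated; the evid goes through parameter binding.
        sql = "DELETE FROM %s WHERE evid = %%s" % table
        cursor.execute(sql, (evid,))
        issued.append(sql)
    return issued
```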
Is there a safe way to get my system back into a consistent state without
losing any data, or at least without losing the configuration? I could get
by without the event history if absolutely necessary. I'm just not sure
where to start, since I'm seeing errors in both MySQL and the ZODB.
I am running Zenoss 2.4.1 on Fedora Core 10, on VMware ESXi. I recently
upgraded from 2.3.x; I saw this problem before the upgrade and was hoping
the upgrade would fix it, but unfortunately it didn't.
Any suggestions or advice?
Thanks
Dusty Doris
Read this topic online here:
http://forums.zenoss.com/viewtopic.php?p=35451#35451