I have an issue where maintenance windows sometimes fail to bring systems
back to their original production state when they complete. What makes it
harder to track down is that it's not every time, just every few days,
seemingly at random.
Here are some things I've seen in my logs:
2009-05-18 04:00:03 INFO zen.MaintenanceWindows: MW daily changes xxx's production state from 1000 to 300
2009-05-18 04:00:03 INFO zen.MaintenanceWindows: MW daily changes xxx's production state from 1000 to 300
2009-05-18 04:00:03 INFO zen.MaintenanceWindows: MW daily changes xxx's production state from 1000 to 300
2009-05-18 04:00:05 CRITICAL txn.2475712: A storage error occurred during the second phase of the two-phase commit. Resources may be in an inconsistent state.
Then, at the end of that window, I see a bunch of errors like this:
2009-05-18 04:59:59 ERROR ZODB.Connection: Couldn't load state for 0x89fc
Traceback (most recent call last):
  File "/usr/local/zenoss/zenoss/lib/python/ZODB/Connection.py", line 704, in setstate
    self._setstate(obj)
  File "/usr/local/zenoss/zenoss/lib/python/ZODB/Connection.py", line 758, in _setstate
    self._reader.setGhostState(obj, p)
  File "/usr/local/zenoss/zenoss/lib/python/ZODB/serialize.py", line 495, in setGhostState
    state = self.getState(pickle)
  File "/usr/local/zenoss/zenoss/lib/python/ZODB/serialize.py", line 488, in getState
    return unpickler.load()
  File "/usr/local/zenoss/zenoss/lib/python/ZODB/serialize.py", line 436, in _persistent_load
    return self._conn.get(oid)
  File "/usr/local/zenoss/zenoss/lib/python/ZODB/Connection.py", line 207, in get
    p, serial = self._storage.load(oid, self._version)
  File "/data/zenoss/zenoss/lib/python/ZEO/ClientStorage.py", line 746, in load
    return self.loadEx(oid, version)[:2]
  File "/data/zenoss/zenoss/lib/python/ZEO/ClientStorage.py", line 774, in loadEx
    self._cache.store(oid, ver, tid, None, data)
  File "/data/zenoss/zenoss/lib/python/ZEO/cache.py", line 293, in store
    self.fc.add(o)
  File "/data/zenoss/zenoss/lib/python/ZEO/cache.py", line 980, in add
    available = self._makeroom(size)
  File "/data/zenoss/zenoss/lib/python/ZEO/cache.py", line 915, in _makeroom
    size, e = self.filemap.pop(ofs)
KeyError: 349681
I manually put them all back into production mode, and everything did seem
to report OK for the next couple of days.
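For reference, here's roughly what I've been doing by hand in zendmd to force everything back to Production. The helper function is just my sketch (the name is mine), but it assumes the stock device API (`productionState` and `setProdState()`), with 1000 being the Production state in my setup:

```python
def reset_prod_state(devices, target=1000):
    """Force every device whose productionState differs from `target`
    back to `target` (1000 == Production here).
    Returns the number of devices that were changed."""
    changed = 0
    for dev in devices:
        if dev.productionState != target:
            dev.setProdState(target)
            changed += 1
    return changed

# In zendmd (assuming the stock dmd API) I then run something like:
#   reset_prod_state(dmd.Devices.getSubDevices())
#   commit()
```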
Then, on a recent window, it happened again. I saw errors like the ones
below, triggered exactly when the window ended, so I'm assuming this has to
do with trying to revert the machines back to their original mode. I didn't
see anything about the two-phase commit problem this time, but I did see
these errors:
2009-05-29 04:59:59 ERROR zen.Events: (1062, "Duplicate entry 'f0ea1333-6198-495b-a8a5-c808124710d4' for key 1")
Traceback (most recent call last):
  File "/data/zenoss/zenoss/Products/ZenEvents/MySqlSendEvent.py", line 46, in execute
    result = cursor.execute(statement)
  File "/usr/local/zenoss/python/lib/python2.4/site-packages/MySQLdb/cursors.py", line 137, in execute
    self.errorhandler(self, exc, value)
  File "/usr/local/zenoss/python/lib/python2.4/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
    raise errorclass, errorvalue
IntegrityError: (1062, "Duplicate entry 'f0ea1333-6198-495b-a8a5-c808124710d4' for key 1")
2009-05-29 05:00:02 INFO zen.ZenActions: Processed 0 commands in 0.000037
2009-05-29 05:00:02 INFO zen.ZenActions: processed 2 rules in 0.01 secs
2009-05-29 05:00:02 ERROR zen.Events: (1062, "Duplicate entry 'f0ea1333-6198-495b-a8a5-c808124710d4' for key 1")
Traceback (most recent call last):
  File "/data/zenoss/zenoss/Products/ZenEvents/MySqlSendEvent.py", line 46, in execute
    result = cursor.execute(statement)
  File "/usr/local/zenoss/python/lib/python2.4/site-packages/MySQLdb/cursors.py", line 137, in execute
    self.errorhandler(self, exc, value)
  File "/usr/local/zenoss/python/lib/python2.4/site-packages/MySQLdb/connections.py", line 33, in defaulterrorhandler
    raise errorclass, errorvalue
IntegrityError: (1062, "Duplicate entry 'f0ea1333-6198-495b-a8a5-c808124710d4' for key 1")
In addition, I've had problems archiving some events. Using Firebug to
watch the AJAX requests, I can see a 500 error referring to a duplicate key
problem. To work around that, I've been deleting the events from the
database directly.
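In case it matters, this is roughly the manual cleanup I've been doing. The function is just my sketch, and it assumes the event tables and column from my install's events database (`status` and `history`, keyed by `evid`); adjust if yours differs:

```python
def delete_event_rows(cursor, evid):
    """Delete a stuck event by its evid from both the active (`status`)
    and archived (`history`) event tables.
    `cursor` is any DB-API cursor (e.g. from MySQLdb.connect(...)).
    Returns the list of DELETE statements that were issued."""
    issued = []
    for table in ("status", "history"):
        # Table names can't be bound as parameters, so only the table
        # name is interpolated; the evid goes through parameter binding.
        sql = "DELETE FROM %s WHERE evid = %%s" % table
        cursor.execute(sql, (evid,))
        issued.append(sql)
    return issued
```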
Is there a safe way to get my system back into a consistent state without
losing any data, or at least without losing the configuration? I could get
by without the event history if absolutely necessary. I'm just not sure
where to start, since I'm seeing errors in both MySQL and the ZODB.
I am running Zenoss 2.4.1 on Fedora Core 10, on VMware ESXi. I recently
upgraded from 2.3.x; I saw this problem before the upgrade and was hoping
the upgrade would fix it, but unfortunately it didn't.
Any suggestions or advice?
Thanks
Dusty Doris
Read this topic online here:
http://forums.zenoss.com/viewtopic.php?p=35451#35451