Re: [ZODB-Dev] Recovering from BTree corruption

2007-09-27 Thread Jim Fulton


On Sep 12, 2007, at 10:28 AM, Jim Fulton wrote:
...

  - checkbtrees.py
  - fstest.py


There's an fsrefs script that checks internal references I believe.


fsrefs.py shows loads of problems in both the data.fs and the  
resources.fs.

Probably 200 entries per database, e.g.:

oid 0xD87110L BTrees._OOBTree.OOBucket
last updated: 2007-09-04 14:43:37.687332, tid=0x37020D3A0CC9DCCL
refers to invalid objects:
oid ('\x00\x00\x00\x00\x00\xb0+f', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xb0N\xbc', None) missing:  
'unknown'
oid ('\x00\x00\x00\x00\x00\xb0N\xbd', None) missing:  
'unknown'
oid ('\x00\x00\x00\x00\x00\xd7\xb1\xa0', None) missing:  
'unknown'
oid ('\x00\x00\x00\x00\x00\xc5\xe8:', None) missing:  
'unknown'
oid ('\x00\x00\x00\x00\x00\xc3\xc6l', None) missing:  
'unknown'
oid ('\x00\x00\x00\x00\x00\xc3\xc6m', None) missing:  
'unknown'

oid ('\x00\x00\x00\x00\x00\xcahC', None) missing: 'unknown'
oid ('\x00\x00\x00\x00\x00\xaf\x07\xc1', None) missing:  
'unknown'

...


  - How do I tell if something is a reference to another database?


I don't know how to do this with fsrefs.  I'm not 100% sure that  
fsrefs recognizes cross-database references.


I did a little looking at fsrefs.  It doesn't analyze the types of  
references. It just tries to load objects.  This approach, aside from  
being less informative than it should be, totally fails with multiple  
databases. Cross-database references will always be reported as  
missing by fsrefs.
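
To illustrate what "analyzing the types of references" could look like,
here is a minimal sketch that sorts a persistent reference into local or
cross-database. It assumes the reference shapes described in the
ZODB.serialize documentation as I understand them: a bare oid, an
(oid, class) tuple, or a [type, args] list where 'm' and 'n' mark
cross-database references and 'w' marks weak references. The function name
classify_reference is hypothetical and is not part of fsrefs.

```python
def classify_reference(ref):
    """Return ('local', oid) or ('remote', database_name, oid).

    Assumed reference shapes (verify against your ZODB version):
      oid                                  bare oid
      (oid, class)                         local reference with class info
      [oid]                                legacy weak reference
      ['w', (oid,)]                        weak reference
      ['w', (oid, database_name)]          weak cross-database reference
      ['n', (database_name, oid)]          cross-database reference
      ['m', (database_name, oid, class)]   cross-database reference with class
    """
    if isinstance(ref, (bytes, str)):
        return ('local', ref)
    if isinstance(ref, tuple):
        return ('local', ref[0])
    if isinstance(ref, list):
        if len(ref) == 1:
            return ('local', ref[0])
        kind, args = ref
        if kind in ('m', 'n'):
            return ('remote', args[0], args[1])
        if kind == 'w':
            if len(args) == 1:
                return ('local', args[0])
            return ('remote', args[1], args[0])
    raise ValueError('unrecognized reference: %r' % (ref,))
```

The entries in the fsrefs output above, such as
('\x00\x00\x00\x00\x00\xb0+f', None), have the (oid, class) shape, so a
classifier like this would report them as local.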





I'll try to make some time in the next few days to look at this issue.


Man it's hard to make time ...



I'll look at fsrefs a bit more closely to:

  - make sure it understands cross-database references, and


It doesn't.

  - Make sure it reports whether missing references are local or  
remote.


Haha ;)

I'd like to decide what to do next based on this investigation.  In  
particular, I want to be sure if the problems you are having are  
actually due to cross-database reference issues.


I'll also look at writing a tool that might be able to recover lost  
objects from backup databases.  The idea is that a tool would scan  
a database for missing oids and save the list to files, separating  
references to different databases.  Then there'd be another tool  
that would read this list and a list of old database files and scan  
the files looking for oids in the list and extracting records if  
they are found.


I spent some time on an analysis tool. See:

  http://svn.zope.org/zc.fsutil/branches/dev/

and especially:

  http://svn.zope.org/zc.fsutil/branches/dev/src/zc/fsutil/references.txt?view=auto


It will help you figure out if you have holes and separate cross-database  
and local references.  You may have to work a little though.  
The data structures produced will allow you to analyze broken  
cross-database references in a way that should be fairly obvious. (Hint:  
you'll have to generate data for each database and make sure that all  
of the oids mentioned in the set of cross-database references are  
actually present in the named databases.)
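
The check described in the hint can be sketched as follows. The data
structures here are hypothetical stand-ins chosen for illustration (plain
dicts and sets keyed by database name); they are not the structures that
zc.fsutil actually produces.

```python
def find_broken_references(present, cross_refs):
    """Report cross-database references whose target is missing.

    present:    {database_name: set of oids stored in that database}
    cross_refs: {database_name: set of (target_database_name, target_oid)
                 pairs referenced from that database}

    Returns a sorted list of (source_db, target_db, oid) tuples.
    """
    broken = []
    for source_db, refs in cross_refs.items():
        for target_db, oid in refs:
            # An unknown target database counts as missing too.
            if oid not in present.get(target_db, set()):
                broken.append((source_db, target_db, oid))
    return sorted(broken)


if __name__ == '__main__':
    # Made-up oids, for illustration only.
    present = {'data': {b'a', b'b'}, 'resources': {b'x'}}
    cross_refs = {'data': {('resources', b'x'), ('resources', b'y')}}
    for source_db, target_db, oid in find_broken_references(present, cross_refs):
        print('%s refers to missing %r in %s' % (source_db, oid, target_db))
```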


A major challenge is handling large databases.  We have databases  
with millions of objects and I kept having to trim the amount of data  
analyzed to fit the data structures in memory.  It is interesting to  
look at the evolution of the data structures over the last couple of  
days as I tried to cope with scale.


The obvious next step is to store data in a database rather than  
memory.  This will slow things down, but will allow me to work with  
arbitrarily large databases and keep richer data structures.


Assuming that you still care about this (you've been quiet :), I  
suggest using this tool to find the holes. (You can also use it to  
find the objects that refer to the missing objects.)


Then, once you've found the missing oids, you should go to backups,  
open file storages on the backups and, if the oids are present, copy  
the pickles to the database under repair.  Something like:


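  import transaction

  # backup_storage: a file storage opened on a backup;
  # database_with_hole: the storage under repair; oids: the missing oids.
  pickles = [backup_storage.load(oid, '')[0] for oid in oids]
  t = transaction.begin()
  s = database_with_hole
  s.tpc_begin(t)
  for oid, p in zip(oids, pickles):
      # '\0'*8 is the serial used when no previous revision exists
      s.store(oid, '\0'*8, p, '', t)
  s.tpc_vote(t)
  s.tpc_finish(t)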

If you don't have the data in backups, then you might be able to use  
information about the objects referring to the missing objects to  
repair the referring objects by hand by deleting the references to  
missing objects.


Hope this helps.

Jim

--
Jim Fulton
Zope Corporation


___
For more information about ZODB, see the ZODB Wiki:
http://www.zope.org/Wikis/ZODB/

ZODB-Dev mailing list  -  ZODB-Dev@zope.org
http://mail.zope.org/mailman/listinfo/zodb-dev


Re: [ZODB-Dev] funky _p_mtime values

2007-09-27 Thread Dieter Maurer
Thomas Clement Mogensen wrote at 2007-9-27 12:43 +0200:
 ...
Within the last few days something very strange has happened: All  
newly created or modified objects get a _p_mtime that is clearly  
incorrect and too big for DateTime to consider it a valid timestamp.  
(i.e. int(obj._p_mtime) returns a long).

Values I get for _p_mtime on these newly altered objects are  
something like:
8078347503.108635
with only the last few decimals differing among all affected objects.
Objects changed at the same time appear to get the same stamp.

Looks interesting.

When I see such unexplainable things, I tend to speak of alpha rays.
Computers are quite reliable -- but not completely. Every now
and then a bit changes which should not change.
In my current life, I have seen things like this 3 times -- usually
in the form that the content of a file changed without
any application having touched it.

When you accept such a wild explanation, then, maybe, I can
give one:

  FileStorage must ensure that all transaction ids (they are
  essentially timestamps) are strictly increasing.

  To this end, it maintains a current transaction id.
  When a new transaction id is needed, it tries to construct
  one from the current time. But if this is smaller than
  the current transaction id, then it increments that a little
  and uses it as new transaction id.

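  Thus, if for some reason a bit once changed in the
  current transaction id (or in the file that maintains it
  persistently) and pushed it far into the future, you can no
  longer get rid of it: every later transaction id is derived
  from the corrupted value until the real clock catches up.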
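
The rule described above can be modelled in a few lines. This is a
simplified sketch that uses plain integers as timestamps; it is not the
actual FileStorage code, and the names next_tid and simulate are made up
for illustration.

```python
def next_tid(now, last_tid):
    """Return a transaction id strictly greater than last_tid.

    Prefer the current clock reading; if the clock is not ahead of the
    last id handed out, increment the last id a little instead.
    """
    if now > last_tid:
        return now
    return last_tid + 1


def simulate(clock_readings, last_tid):
    """Return the ids handed out for a sequence of clock readings."""
    tids = []
    for now in clock_readings:
        last_tid = next_tid(now, last_tid)
        tids.append(last_tid)
    return tids


if __name__ == '__main__':
    # Healthy case: the ids follow the clock.
    print(simulate([100, 101, 102], last_tid=99))
    # Corrupted case: the last id is far ahead of the clock, so every
    # new id is an increment of the bogus value.
    print(simulate([100, 101, 102], last_tid=5000))
```

In this model, once the stored id is ahead of the clock, all new ids stay
clustered just above the bogus value, which would match the report that all
affected objects get nearly identical timestamps.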

On Plone.org, someone asked today how to fix the
effects on the ZODB of an administrator changing the system time to 2008.
If he finds a solution, then your problem may be tackled the same way.



-- 
Dieter