Dmitry Minsky wrote on Tue, 28 Jun 2022 14:44 +00:00:
> Soo, here is an output of sql request:
>
> % sqlite3 rep-cache.db '.header on' 'SELECT * FROM rep_cache WHERE revision = 7449'
> hash|revision|offset|size|expanded_size
> a684c1201230ed000e8baf11fcd890efebb059db|7449|3|106064003|111204465
>

OK, so it would seem r7449 added one file and no directories.  That, or
every other added file/directory was a copy.

>
> And here is 7449 file size
>
> % ls -l revs/7/7449
> -r--r--r--. 1 apache apache 106067461 Feb  4  2021 revs/7/7449
>

So the rev file size is the sqlite3 SIZE plus 3458 bytes.  I guess those
could be the dir rep, node-rev header, and so on.

Also:

>>> '%x' % 106067461 
'6527605'
>>> '%x' % 106064003 
'6526883'

No [a-f] in either.  I guess that's just a coincidence.  The probability
of that (disregarding the high two bytes, which didn't change) is
(10/16)**8 ≈ 2.3% ≈ 1/43.
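
The arithmetic above can be double-checked in the same Python style.  This
only restates the numbers from the `ls -l` output and the rep_cache row,
padding to eight hex digits so the unchanged "high two bytes" line up, and
it assumes each remaining hex digit is uniform and independent:

```python
# Numbers copied from the `ls -l` output and the rep_cache row above.
rev_file_size = 106067461
rep_size = 106064003

# Bytes in the rev file that are not the rep recorded in rep-cache.db.
overhead = rev_file_size - rep_size
print(overhead)

# Pad to 8 hex digits (4 bytes) so the unchanged high two bytes line up.
hex_sizes = (f"{rev_file_size:08x}", f"{rep_size:08x}")
print(*hex_sizes)

# Chance that the 8 low hex digits (2 bytes of each number) are all 0-9,
# treating each hex digit as uniform and independent.
p = (10 / 16) ** 8
print(f"{p:.4f}", round(1 / p))
```

This prints 3458, then 06527605 06526883, then 0.0233 43.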

>
> Now, what about "links" to 7449 in revisions after 7449. There is 
> something in 7450:
>
> % strings revs/7/7450 | tail -7
> copyroot: 0 /
> minfo-cnt: 61
> _7.0.t7449-67p add-dir false false false 
> /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render
> _9.0.t7449-67p add-file true true false 
> /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render/01_render.spp

Those are node-rev id's from r7450's transaction.  It's using in-transaction
id's as opposed to final in-revision id's, but I get that on a new test
repo too, which suggests this is an unrelated issue and that nothing
depends on these two values.  Which is to say, "move along, nothing to
see here".

> L2P-INDEX
> P2L-INDEX
> 51995736 59c66b6d95365e6bdb4be4ec3b2d3a34 51995799 
> 72059006b7c456b03efb7f07e0557795S
>
>
> % strings revs/7/7450 | grep 7449
> text: 7450 3 51992815 54791207 b326aa3b7fd0ea02b8e75ac8a8dcc656 
> 1430895ca8250cfb117997d6ee543e7e2c06c265 7449-67p/_b
> props: 2 757 65 53 113136892f2137aa0116093a524ade0b - 7449-67p/_d

That's what the structure file terms "uniquifier".  I don't recall its
semantics off the top of my head.
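
For poking at these lines programmatically, here is a small parser for a
`text:` or `props:` line.  It is a sketch that assumes the seven-field
layout seen in the excerpts (revision, item index, on-disk size, expanded
size, MD5, SHA-1, uniquifier), with "-" meaning "not recorded"; check the
field order against the structure file before relying on it.  If that
reading is right, the "7449" in `7449-67p/_b` is part of the uniquifier
field rather than the revision field, which is why a plain grep for 7449
matches it.  `parse_rep_line` is a hypothetical name.

```python
def parse_rep_line(line: str) -> dict:
    """Split a 'text:' or 'props:' line from a node-rev header into fields.

    Assumed layout (check against the FSFS 'structure' file):
    <rev> <item-index> <size> <expanded-size> <md5> <sha1> <uniquifier>
    """
    kind, sep, rest = line.strip().partition(": ")
    if not sep or kind not in ("text", "props"):
        raise ValueError(f"not a rep line: {line!r}")
    names = ["revision", "item_index", "size", "expanded_size",
             "md5", "sha1", "uniquifier"]
    rep = {"kind": kind}
    rep.update(zip(names, rest.split()))  # missing trailing fields stay absent
    for key in ("revision", "item_index", "size", "expanded_size"):
        rep[key] = int(rep[key])
    return rep
```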

> DELTA 7449 11 138
> pred: 4-7052.0.r7449/12
> pred: 3-6161.0.r7449/14

Yeah, these matter.  The former is a non-self DELTA rep, i.e., a file
stored as a delta against another file; the latter indicates that a
node-revision ("a revision of something in the repository") is a newer
revision of an existing "something in the repository" (as opposed to a
historyless add).  When regenerating r7449's rev file you'll want to make
sure both of these pointers remain valid.

The pred: links are easier since you can probably just recommit r7449 to
a copy-up-to-r7448 of the repository and then change them.  Make sure
not to break offsets later in the file.

The delta bases will require more work; see below.

«svnfsfs load-index» might be helpful in regenerating the rev file.
I haven't tried it.  (Or you could use linear addressing for the
restore, if regenerating a linear-addressing file is easier.)

> DELTA 7449 15 24
> pred: 2-6160.0.r7449/16
> DELTA 7449 17 20
> pred: 1-6132.0.r7449/18
> DELTA 7449 19 25
> pred: 3-232.0.r7449/20
> DELTA 7449 21 25
> pred: 0.0.r7449/2

Ditto.
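
Finding which later rev files still point into r7449 can be scripted
instead of done with «strings | grep».  The sketch below assumes the
sharded layout with 1000 revisions per shard (as in revs/7/7449) and flags
three kinds of lines: `DELTA <rev> ...` delta-base headers, `text:`/`props:`
lines whose revision field is the target, and node-rev ids containing
`.r7449/` (as in the `pred:` lines).  Rev files mix text and binary data,
so expect occasional false positives.  `rev_path` and `find_refs` are
hypothetical names.

```python
import os
import re
import sys


def rev_path(repo: str, rev: int, shard_size: int = 1000) -> str:
    # Assumes a sharded FSFS repository; check db/format for the shard size.
    return os.path.join(repo, "db", "revs", str(rev // shard_size), str(rev))


def find_refs(data: bytes, rev: int) -> list[bytes]:
    """Return lines of a rev file that appear to reference revision `rev`."""
    r = str(rev).encode()
    patterns = [
        re.compile(rb"^DELTA %s \d+ \d+$" % r),    # delta base lives in `rev`
        re.compile(rb"^(?:text|props): %s " % r),  # rep stored in `rev`
        re.compile(rb"\.r%s/\d+" % r),             # node-rev id from `rev`
    ]
    return [line for line in data.split(b"\n")
            if any(p.search(line) for p in patterns)]


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage (hypothetical script name): find_refs.py REPO TARGET-REV LAST-REV
    repo, target, last = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
    for rev in range(target + 1, last + 1):
        with open(rev_path(repo, rev), "rb") as f:
            for line in find_refs(f.read(), target):
                print(rev, line.decode("utf-8", "replace"))
```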

> _7.0.t7449-67p add-dir false false false 
> /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render
> _9.0.t7449-67p add-file true true false 
> /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render/01_render.spp
>
>
> And there is no "links" to 7449 in 7451 revision and after it, BUT I 
> still can't dump these revisions. Maybe because of "chain" of "links". 
> Like 7449 <- 7450 <- 7451 etc.?
>

That, plus the fact that you didn't pass --deltas or --incremental, so it
tried to dump the entire contents of ^/@r7450 (what «svn co ^/@7450»
would get, as opposed to «svn diff -c 7450»).

> % svnadmin dump /var/repo_serpico -r7450 > ~/sdb/test.dump
> svnadmin: E160004: Corrupt representation '7449 21 25 159 
> 24ad3bd9d7945c1c7ca3f5e714ea868e - -'
> svnadmin: E160004: Invalid r7449 footer
>

OK, so the next step is to reconstruct the bases of the five non-self
DELTAs in r7450.

First, look in the truncated r7449 rev file.  There might be intact reps
in it.  A rep always ends with "ENDREP\n".  (Nothing prevents "ENDREP\n"
from occurring inside the rep itself; parsing a rep requires knowing its
length in advance.)
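
A rough first pass over the truncated r7449 file could list candidate
reps: every PLAIN/DELTA header that is followed somewhere by "ENDREP\n".
Because "ENDREP\n" may also occur inside a rep, each hit is only a
candidate; the authoritative length is the size field of the `text:` line
that points at the rep.  The sketch assumes rep headers look like `PLAIN`,
`DELTA`, or `DELTA <rev> <n> <n>` on their own line, as in the excerpts
above; `candidate_reps` is a hypothetical name.

```python
import re
import sys

# Rep headers as seen in the rev-file excerpts: "PLAIN", "DELTA" (self-delta),
# or "DELTA <rev> <n> <n>" (delta against a rep in another revision).
HEADER_RE = re.compile(rb"(PLAIN|DELTA(?: \d+ \d+ \d+)?)\n")


def candidate_reps(data: bytes) -> list[tuple[int, str, bytes]]:
    """Return (offset, header, body) for every header with a later ENDREP."""
    found = []
    for m in HEADER_RE.finditer(data):
        end = data.find(b"ENDREP\n", m.end())
        if end == -1:
            continue  # no terminator: this rep runs past the truncation point
        found.append((m.start(), m.group(1).decode("ascii"), data[m.end():end]))
    return found


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as f:
        for offset, header, body in candidate_reps(f.read()):
            print(offset, header, len(body))
```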

Second, try the "random files" you mentioned upthread.

Once you have all these candidate files — the reps extracted from the
truncated rev file and the "random files" — try applying each of the
deltas in r7450 to each of the candidate files, and figure out which
combinations produce the md5/sha1 checksums recorded in r7450.
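
The brute-force search over (delta, candidate base) pairs might look like
the sketch below.  It deliberately leaves delta application as a pluggable
`apply_delta(base, delta)` callable (hypothetical: anything that applies an
svndiff to a fulltext), and compares the result against the MD5 and SHA-1
from the corresponding `text:` line in r7450, treating "-" as "not
recorded".  `find_matching_bases` is likewise a hypothetical name.

```python
import hashlib
from typing import Callable


def find_matching_bases(
    candidates: dict[str, bytes],          # name -> candidate base fulltext
    deltas: dict[str, bytes],              # name -> svndiff taken from r7450
    expected: dict[str, tuple[str, str]],  # delta name -> (md5, sha1 or "-")
    apply_delta: Callable[[bytes, bytes], bytes],
) -> list[tuple[str, str]]:
    """Return (delta name, candidate name) pairs with matching checksums."""
    matches = []
    for dname, delta in deltas.items():
        want_md5, want_sha1 = expected[dname]
        for cname, base in candidates.items():
            try:
                result = apply_delta(base, delta)
            except Exception:
                continue  # wrong base: the delta may not even apply cleanly
            if hashlib.md5(result).hexdigest() != want_md5:
                continue
            if want_sha1 != "-" and hashlib.sha1(result).hexdigest() != want_sha1:
                continue
            matches.append((dname, cname))
    return matches
```

Comparing the length of `result` with the expanded size from the same
`text:` line before hashing would prune most wrong candidates cheaply.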

Presumably directory deltification is enabled, meaning those five deltas
comprise one file content delta (based on rep-cache.db) and four
directory deltas — one for each directory level between the modified
file and the repository root — which can be regenerated by hand.
(This is delicate in case the svndiff — meaning the contents of the
DELTA — has "copy" instructions that refer to the node-rev id inside
the serialized directory node-rev, but possible.)
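
Regenerating a directory rep by hand means producing the exact bytes whose
checksums the node-rev records.  The sketch below assumes FSFS stores a
committed directory's contents in svn's hash-dump format ("K <len>", name,
"V <len>", value, terminated by "END"), with each value being
"<kind> <node-rev-id>" and entries in byte-wise sorted order.  Those are
assumptions to verify against an intact directory rep from r7448 or r7450
before trusting the output.  `serialize_dir` and `checksums` are
hypothetical helpers.

```python
import hashlib


def serialize_dir(entries: dict[str, tuple[str, str]]) -> bytes:
    """Serialize {name: (kind, node_rev_id)} as a hash dump.

    Assumptions: values are "<kind> <node-rev-id>" with kind "file" or "dir",
    and entries appear in byte-wise sorted order of their names.
    """
    out = bytearray()
    for name in sorted(entries, key=lambda n: n.encode("utf-8")):
        kind, node_rev_id = entries[name]
        key = name.encode("utf-8")
        val = f"{kind} {node_rev_id}".encode("utf-8")
        out += b"K %d\n%s\nV %d\n%s\n" % (len(key), key, len(val), val)
    out += b"END\n"
    return bytes(out)


def checksums(data: bytes) -> tuple[str, str]:
    # Compare these with the md5/sha1 fields of the directory's text: line.
    return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()
```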

Devs: does anyone see a simpler solution?  If you've thought about
this and _don't_ see a simpler solution, please say so.

Assuming I haven't missed any simpler solution, you'll want:

1. To extract from the r7449 rev file what can be extracted from it.
The code for that exists in libsvn_fs_fs, but you'll need to jump
through hoops to arrange for it to be called even though r7449 is
truncated.  Basically, you need to either skip (in the debugger or with
a custom patch) or fabricate (by editing rev files manually) everything
that happens before libsvn_fs_fs seek()s to a particular offset in the
revision file.

2. A script that takes as input a file and a delta, applies the latter
to the former, and outputs the result.  We don't seem to have one of
those already.  If you write one, do consider contributing it for our
tools/ directory.

3. (possibly, depending on step #1) To regenerate the new dir reps of
the truncated r7449 based on r7450 and following revisions.
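
For step 2, a minimal sketch of such a script, assuming the svndiff layout
described in Subversion's notes/svndiff: "SVN" plus a version byte, then
windows made of five variable-length integers (source view offset and
length, target view length, instructions length, new-data length)
followed by the instructions and the new data.  It handles version 0
(uncompressed) and version 1 (zlib-compressed sections) only; version 2
(LZ4) is not handled.  The delta input is the raw svndiff, i.e. the bytes
between a "DELTA ..." header line and "ENDREP".  All names are
hypothetical.

```python
import sys
import zlib


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a big-endian base-128 integer (high bit = more bytes follow)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def decode_section(section: bytes, version: int) -> bytes:
    if version == 0:
        return section
    # svndiff1: original length, then zlib data (or raw data if not smaller).
    orig_len, pos = read_varint(section, 0)
    data = section[pos:]
    return data if len(data) == orig_len else zlib.decompress(data)


def apply_svndiff(source: bytes, delta: bytes) -> bytes:
    if len(delta) < 4 or delta[:3] != b"SVN" or delta[3] not in (0, 1):
        raise ValueError("not svndiff version 0 or 1")
    version, pos, out = delta[3], 4, bytearray()
    while pos < len(delta):
        sview_off, pos = read_varint(delta, pos)
        sview_len, pos = read_varint(delta, pos)
        tview_len, pos = read_varint(delta, pos)
        ins_len, pos = read_varint(delta, pos)
        new_len, pos = read_varint(delta, pos)
        ins = decode_section(delta[pos:pos + ins_len], version)
        pos += ins_len
        new = decode_section(delta[pos:pos + new_len], version)
        pos += new_len
        sview = source[sview_off:sview_off + sview_len]
        tview, ip, np_ = bytearray(), 0, 0
        while ip < len(ins):
            op, length = ins[ip] >> 6, ins[ip] & 0x3F
            ip += 1
            if length == 0:  # length did not fit in 6 bits; it follows
                length, ip = read_varint(ins, ip)
            if op == 0:    # copy from the source view
                off, ip = read_varint(ins, ip)
                tview += sview[off:off + length]
            elif op == 1:  # copy from the target view so far (may overlap)
                off, ip = read_varint(ins, ip)
                for i in range(length):
                    tview.append(tview[off + i])
            elif op == 2:  # copy from this window's new data
                tview += new[np_:np_ + length]
                np_ += length
            else:
                raise ValueError("invalid instruction")
        if len(tview) != tview_len:
            raise ValueError("window produced the wrong length")
        out += tview
    return bytes(out)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): apply_svndiff.py BASE SVNDIFF > RESULT
    with open(sys.argv[1], "rb") as f, open(sys.argv[2], "rb") as g:
        sys.stdout.buffer.write(apply_svndiff(f.read(), g.read()))
```

Running each candidate base through this and hashing the output is then
all that the checksum comparison needs.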

Daniel

> % svnadmin dump /var/repo_serpico -r7451 > ~/sdb/test.dump
> svnadmin: E160004: Corrupt representation '7449 21 25 159 
> 8f3d18747d3388ff2b35096cafbd57ab - -'
> svnadmin: E160004: Invalid r7449 footer
>
>
> --
> Dmitry Minsky
>
>> On 28.06.2022, at 15:50, Daniel Shahaf <d...@daniel.shahaf.name> wrote:
>> 
>> Dmitry Minsky wrote on Tue, 28 Jun 2022 13:18 +00:00:
>>>> What does the "folder with files" contain?
>>> 
>>> Just a random files on my computer ;) It’s not from working copy or 
>>> repository or anything else meaningful. Let’s assume that it’s just a 
>>> bunch of random files which I want to put in the middle of repo and 
>>> hope that it won’t blow up ;) Is that possible?
>> 
>> With enough effort, yes.
>> 
>> Devs: In attempting to recreate db/revs/7/7449, what needs to be
>> matched? Off the top of my head, it's rep-cache.db references, actual
>> rep-sharing references in future rev files, and possibly node-rev id's.
>> Anything else?
>> 
>> What's the output of «sqlite3 rep-cache.db '.header on' 'SELECT * FROM
>> rep_cache WHERE revision = 7449'»?
>> 
>> Does any rev file after 7449 contain " 7449 " on a "text:" or
>> "props:" line?
>> 
>> Does any rev file after 7449 contain ".7449/"?
>> 
>> Daniel
>> 
>>>> On 28.06.2022, at 15:14, Daniel Shahaf <d...@daniel.shahaf.name> wrote:
>>>> 
>>>> Dmitry Minsky wrote on Tue, 28 Jun 2022 11:01 +00:00:
>>>>> Ok. I’m pretty sure that db/revs/7/7449 is just truncated. Since there 
>>>>> aren’t any signs of any text readable data at the bottom of the file 
>>>>> and the top of file looks similar to 7448, 7450 and to any other 
>>>>> revision. 
>>>>> 
>>>>> So, let’s say I’m 85.23% sure about content of this particular 
>>>>> revision. How can I recreate revision from folder with files? This rev 
>>>>> contains only add-dir and add-file changes. 
>>>> 
>>>> What does the "folder with files" contain?
>>>> 
>>>> Is it a working copy?  A repository?  An export?  None of the above?
>>>> 
>>>> Does it contain exactly the files and directories added in r7449 *as
>>>> they were in that revision*, and nothing else?
