Dmitry Minsky wrote on Tue, 28 Jun 2022 14:44 +00:00:
> So, here is the output of the SQL query:
>
> % sqlite3 rep-cache.db '.header on' 'SELECT * FROM rep_cache WHERE revision = 7449'
> hash|revision|offset|size|expanded_size
> a684c1201230ed000e8baf11fcd890efebb059db|7449|3|106064003|111204465
>

OK, so it would seem r7449 added one file and no directories.  That, or
every other added file/directory was a copy.

> And here is 7449 file size
>
> % ls -l revs/7/7449
> -r--r--r--. 1 apache apache 106067461 Feb 4 2021 revs/7/7449
>

So the rev file size is the sqlite3 SIZE plus 3458 bytes.  I guess those
could be the dir rep, node-rev header, and so on.

Also:

    >>> '%x' % 106067461
    '6527605'
    >>> '%x' % 106064003
    '6526883'

No [a-f] in either.  I guess that's just a coincidence.  The probability
of that (disregarding the high two bytes which didn't change) is
(10/16)**8 ≈ 2.3% ≈ 1/43.

> Now, what about "links" to 7449 in revisions after 7449. There is
> something in 7450:
>
> % strings revs/7/7450 | tail -7
> copyroot: 0 /
> minfo-cnt: 61
> _7.0.t7449-67p add-dir false false false
> /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render
> _9.0.t7449-67p add-file true true false
> /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render/01_render.spp

Those are node-rev id's from r7450's transaction.  It's using
in-transaction id's as opposed to final in-revision id's, but I get that
on a new test repo too, which suggests this is an unrelated issue and
that nothing depends on these two values.  Which is to say, "move along,
nothing to see here".

> L2P-INDEX
> P2L-INDEX
> 51995736 59c66b6d95365e6bdb4be4ec3b2d3a34 51995799 72059006b7c456b03efb7f07e0557795S
>
>
> % strings revs/7/7450 | grep 7449
> text: 7450 3 51992815 54791207 b326aa3b7fd0ea02b8e75ac8a8dcc656 1430895ca8250cfb117997d6ee543e7e2c06c265 7449-67p/_b
> props: 2 757 65 53 113136892f2137aa0116093a524ade0b - 7449-67p/_d

That's what the structure file terms "uniquifier".  I don't recall its
semantics off the top of my head.

> DELTA 7449 11 138
> pred: 4-7052.0.r7449/12
> pred: 3-6161.0.r7449/14

Yeah, these matter.
The former is a non-self DELTA rep, i.e., a file stored as a delta
against another file; the latter indicates that a node-revision ("a
revision of something in the repository") is a newer revision of an
existing "something in the repository" (as opposed to a historyless add).

When regenerating r7449's rev file you'll want to make sure both of these
pointers remain valid.  The pred: links are easier since you can probably
just recommit r7449 to a copy-up-to-r7448 of the repository and then
change them.  Make sure not to break offsets later in the file.  The
delta bases will require more work; see below.

«svnfsfs load-index» might be helpful in regenerating the rev file.
I haven't tried it.  (Or you could use linear addressing for the restore,
if regenerating a linear-addressing file is easier.)

> DELTA 7449 15 24
> pred: 2-6160.0.r7449/16
> DELTA 7449 17 20
> pred: 1-6132.0.r7449/18
> DELTA 7449 19 25
> pred: 3-232.0.r7449/20
> DELTA 7449 21 25
> pred: 0.0.r7449/2

Ditto.

> _7.0.t7449-67p add-dir false false false
> /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render
> _9.0.t7449-67p add-file true true false
> /trunk/ProjectB/Art/Characters/Survivors/01/textures/substance_render/01_render.spp
>
>
> And there are no "links" to 7449 in revision 7451 and after it, BUT I
> still can't dump these revisions. Maybe because of a "chain" of "links",
> like 7449 <- 7450 <- 7451 etc.?
>

That, plus the fact that you didn't pass --deltas or --incremental, so it
tried to dump the entire contents of ^/@r7450 (what «svn co ^/@7450»
would get, as opposed to «svn diff -c 7450»).

> % svnadmin dump /var/repo_serpico -r7450 > ~/sdb/test.dump
> svnadmin: E160004: Corrupt representation '7449 21 25 159 24ad3bd9d7945c1c7ca3f5e714ea868e - -'
> svnadmin: E160004: Invalid r7449 footer
>

OK, so the next step is to reconstruct bases of the five non-self DELTAs
in r7450.

First, look in the truncated r7449 rev file.  There might be intact reps
in it.  A rep always ends with "ENDREP\n".
(Nothing prevents "ENDREP\n" from occurring inside the rep itself;
parsing a rep requires knowing its length in advance.)

Second, try the "random files" you mentioned upthread.

Once you have all these candidate files (the reps extracted from the
truncated rev file and the "random files"), try applying each of the
deltas in r7450 to each of the candidate files, and figure out which
combinations produce the md5/sha1 checksums recorded in r7450.

Presumably directory deltification is enabled, meaning those five deltas
comprise one file content delta (based on rep-cache.db) and four
directory deltas, one for each directory level between the modified file
and the repository root, which can be regenerated by hand.  (This is
delicate in case the svndiff, meaning the contents of the DELTA, has
"copy" instructions that refer to the node-rev id inside the serialized
directory node-rev, but possible.)

Devs: does anyone see a simpler solution?  If you've thought about this
and _don't_ see a simpler solution, please say so.

Assuming I haven't missed any simpler solution, you'll want:

1. To extract from the r7449 rev file what can be extracted from it.
   The code for that exists in libsvn_fs_fs, but you'll need to jump
   through hoops to arrange for it to be called even though r7449 is
   truncated.  Basically, you need to either skip (in the debugger or
   with a custom patch) or fabricate (by editing rev files manually)
   everything that happens before libsvn_fs_fs seek()s to a particular
   offset in the revision file.

2. A script that takes as input a file and a delta, applies the latter
   to the former, and outputs the result.  We don't seem to have one of
   those already.  If you write one, do consider contributing it for our
   tools/ directory.

3. (possibly, depending on step #1) To regenerate the new dir reps of
   the truncated r7449 based on r7450 and following revisions.
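
For step 2, here is a minimal sketch of such a script in Python.  It is
untested against real FSFS data and rests on several assumptions: that
the delta is an svndiff version 0 or 1 stream as described in the
Subversion source tree's notes/svndiff (version 2, the LZ4 variant, is
not handled), and that the "DELTA ...\n" header line and the "ENDREP\n"
trailer have already been stripped from the rep.  The names
read_varint, decode_section, apply_svndiff and matches_md5 are made up
for this example and are not part of any Subversion API.

```python
import hashlib
import zlib


def read_varint(buf, pos):
    """Decode one svndiff integer: 7 data bits per byte, high bit = more follows."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


def decode_section(raw, version):
    """Return a window section's payload, undoing svndiff1's optional zlib layer."""
    if version == 0 or not raw:
        return raw
    orig_len, pos = read_varint(raw, 0)
    body = raw[pos:]
    # Assumption: svndiff1 stores a section raw when its length equals orig_len.
    return body if len(body) == orig_len else zlib.decompress(body)


def apply_svndiff(source, delta):
    """Apply an svndiff0/svndiff1 stream to `source` and return the target bytes."""
    if delta[:3] != b"SVN" or len(delta) < 4 or delta[3] not in (0, 1):
        raise ValueError("not an svndiff0/svndiff1 stream")
    version, pos, target = delta[3], 4, bytearray()
    while pos < len(delta):
        # Window header: five variable-length integers.
        sview_off, pos = read_varint(delta, pos)
        sview_len, pos = read_varint(delta, pos)
        tview_len, pos = read_varint(delta, pos)
        ins_len, pos = read_varint(delta, pos)
        new_len, pos = read_varint(delta, pos)
        ins = decode_section(delta[pos:pos + ins_len], version)
        pos += ins_len
        new = decode_section(delta[pos:pos + new_len], version)
        pos += new_len
        sview = source[sview_off:sview_off + sview_len]
        out, ip, new_pos = bytearray(), 0, 0
        while ip < len(ins):
            action, length = ins[ip] >> 6, ins[ip] & 0x3F
            ip += 1
            if length == 0:          # length did not fit in six bits
                length, ip = read_varint(ins, ip)
            if action == 0:          # copy from the source view
                off, ip = read_varint(ins, ip)
                out += sview[off:off + length]
            elif action == 1:        # copy from the target view; may overlap
                off, ip = read_varint(ins, ip)
                for i in range(length):
                    out.append(out[off + i])
            elif action == 2:        # copy from the new-data section
                out += new[new_pos:new_pos + length]
                new_pos += length
            else:
                raise ValueError("invalid instruction")
        if len(out) != tview_len:
            raise ValueError("window length mismatch")
        target += out
    return bytes(target)


def matches_md5(source, delta, expected_md5):
    """True if applying `delta` to `source` yields the given MD5 (hex string)."""
    return hashlib.md5(apply_svndiff(source, delta)).hexdigest() == expected_md5
```

One way to use it would be to call matches_md5() for every (candidate
file, delta) pair and keep the pairs that return True.  A wrong
candidate may also raise an exception (ValueError, IndexError or
zlib.error) instead of returning False, so the calling loop should catch
those.  If the rep in question sits at the end of a longer delta chain,
the checksum recorded in the node-rev is that of the final fulltext, so
intermediate results would need to be chained before comparing.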

Daniel

> % svnadmin dump /var/repo_serpico -r7451 > ~/sdb/test.dump
> svnadmin: E160004: Corrupt representation '7449 21 25 159 8f3d18747d3388ff2b35096cafbd57ab - -'
> svnadmin: E160004: Invalid r7449 footer
>
>
> --
> Dmitry Minsky
>
>> On 28.06.2022, at 15:50, Daniel Shahaf <d...@daniel.shahaf.name> wrote:
>>
>> Dmitry Minsky wrote on Tue, 28 Jun 2022 13:18 +00:00:
>>>> What does the "folder with files" contain?
>>>
>>> Just a random files on my computer ;) It’s not from working copy or
>>> repository or anything else meaningful. Let’s assume that it’s just a
>>> bunch of random files which I want to put in the middle of repo and
>>> hope that it won’t blow up ;) Is that possible?
>>
>> With enough effort, yes.
>>
>> Devs: In attempting to recreate db/revs/7/7449, what needs to be
>> matched? Off the top of my head, it's rep-cache.db references, actual
>> rep-sharing references in future rev files, and possibly node-rev id's.
>> Anything else?
>>
>> What's the output of «sqlite3 rep-cache.db '.header on' 'SELECT * FROM
>> rep_cache WHERE revision = 7449'»?
>>
>> Does any rev file after 7449 contain " 7449 " on a "text:" or
>> "props:" line?
>>
>> Does any rev file after 7449 contain ".7449/"?
>>
>> Daniel
>>
>>>> On 28.06.2022, at 15:14, Daniel Shahaf <d...@daniel.shahaf.name> wrote:
>>>>
>>>> Dmitry Minsky wrote on Tue, 28 Jun 2022 11:01 +00:00:
>>>>> Ok. I’m pretty sure that db/revs/7/7449 is just truncated. Since there
>>>>> aren’t any signs of any text readable data at the bottom of the file
>>>>> and the top of file looks similar to 7448, 7450 and to any other
>>>>> revision.
>>>>>
>>>>> So, let’s say I’m 85.23% sure about content of this particular
>>>>> revision. How can I recreate revision from folder with files? This rev
>>>>> contains only add-dir and add-file changes.
>>>>
>>>> What does the "folder with files" contain?
>>>>
>>>> Is it a working copy? A repository? An export? None of the above?
>>>>
>>>> Does it contain exactly the files and directories added in r7449 *as
>>>> they were in that revision*, and nothing else?