[ 
https://issues.apache.org/jira/browse/OAK-8170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801691#comment-16801691
 ] 

Amit Jain edited comment on OAK-8170 at 3/26/19 1:27 PM:
---------------------------------------------------------

[~wim.symons] That's how it's supposed to work. Revision garbage collection, 
or segment store compaction (online or offline), is a prerequisite for running 
DataStore maintenance operations such as DSGC and the consistency check. This is 
also mentioned in the Blob Garbage Collection section of the doc [1]:
{quote}Mark NodeStore - GC logic would make a record of all the blob references 
which are referred by any node present in NodeStore. Note that any blob 
references from old revisions of node would also be considered as a valid 
references.
{quote}
What's causing the confusion is the use of the --verbose option. This option is 
generally meant for identifying the nodes that reference blobs, specifically 
when blobs that are still needed have been deleted for some reason and must be 
restored. Because it iterates only over nodes reachable from the head state, it 
does not report references held by as-yet-uncollected nodes.
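
To illustrate the difference, here is a toy model in plain Java. It is only a sketch and not Oak's API: the class BlobReferenceSketch and the methods markAllRevisions, markHeadOnly and missing are hypothetical names, and each revision is simplified to a map from node path to blob id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: every name here is hypothetical and none of it is Oak code.
public class BlobReferenceSketch {

    // Default mark behaviour in this model: collect blob ids from every
    // retained revision, including old ones not yet garbage collected.
    static Set<String> markAllRevisions(List<Map<String, String>> revisions) {
        Set<String> refs = new TreeSet<>();
        for (Map<String, String> revision : revisions) {
            refs.addAll(revision.values());
        }
        return refs;
    }

    // Verbose-style behaviour in this model: collect blob ids only from the
    // head state, assumed to be the last element of the list.
    static Set<String> markHeadOnly(List<Map<String, String>> revisions) {
        if (revisions.isEmpty()) {
            return new TreeSet<>();
        }
        return new TreeSet<>(revisions.get(revisions.size() - 1).values());
    }

    // Consistency check: referenced blob ids that are absent from the DataStore.
    static Set<String> missing(Set<String> references, Set<String> dataStore) {
        Set<String> result = new TreeSet<>(references);
        result.removeAll(dataStore);
        return result;
    }

    public static void main(String[] args) {
        // Old revision: two nodes, each referencing a blob.
        Map<String, String> oldRevision = new HashMap<>();
        oldRevision.put("/content/a", "blob-1");
        oldRevision.put("/content/b", "blob-2");

        // Head revision: /content/b has been deleted.
        Map<String, String> head = new HashMap<>();
        head.put("/content/a", "blob-1");

        List<Map<String, String>> retained = new ArrayList<>();
        retained.add(oldRevision);
        retained.add(head);

        // blob-2 is no longer present in the DataStore.
        Set<String> dataStore = new TreeSet<>(Collections.singleton("blob-1"));

        // Before compaction, all revisions: prints [blob-2]
        System.out.println(missing(markAllRevisions(retained), dataStore));
        // Before compaction, head only: prints []
        System.out.println(missing(markHeadOnly(retained), dataStore));
        // After compaction (only the head is retained): prints []
        System.out.println(missing(
                markAllRevisions(Collections.singletonList(head)), dataStore));
    }
}
```

In this model the check over all retained revisions flags blob-2 until the old revision is dropped, which mirrors why compaction has to run before the DataStore checks.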

I definitely think there's room for improvement in the documentation to clear 
up this confusion. Could you please create a doc issue where any gaps can be 
addressed?

 

[1] [https://jackrabbit.apache.org/oak/docs/plugins/blobstore.html]



> oak-run datastorecheck and online consistency check falsely report missing 
> blobs
> --------------------------------------------------------------------------------
>
>                 Key: OAK-8170
>                 URL: https://issues.apache.org/jira/browse/OAK-8170
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>    Affects Versions: 1.8.9
>            Reporter: Wim Symons
>            Priority: Major
>         Attachments: output.txt
>
>
> Hi,
> We found that oak-run datastorecheck falsely reports missing blobs when 
> running datastorecheck without the --verbose option.
> Even the online datastore consistency check falsely reports the same missing 
> blobs.
> This is due to the fact that the standard blob reference collector in 
> oak-run datastorecheck looks at *all* compaction generations in the segment 
> store instead of only the last one.
> After running an offline compaction, and thus keeping only 1 generation, the 
> correct number of blob references and missing blobs is reported by oak-run 
> datastorecheck.
> The bug on the 1.8 branch comes from 
> org.apache.jackrabbit.oak.plugins.blob.BlobReferenceRetriever#collectReferences
>  (line 429) and by following that you arrive at 
> org.apache.jackrabbit.oak.segment.file.FileStore#tarFiles (line 1013) stating:
> tarFiles.collectBlobReferences(collector,
>  newOldReclaimer(lastCompactionType, getGcGeneration(), 
> gcOptions.getRetainedGenerations()));
> I'm not familiar enough with this source code, so I won't attempt adding a 
> patch.
> I did double-check trunk and saw the same line of code there: 
> org.apache.jackrabbit.oak.segment.file.GarbageCollector#collectBlobReferences 
> (line 324).
> I attached a text file with the outputs of the commands I ran.
> We currently use Oak 1.8.9 with AEM 6.4.3.0 and oak-blob-cloud 1.8.9 from 
> the 1.8.3 AEM S3 connector.
> Regards
> Wim



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
