thomasmueller commented on code in PR #771: URL: https://github.com/apache/jackrabbit-oak/pull/771#discussion_r1035637354
########## oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java: ########## @@ -405,13 +405,35 @@ protected void mark(GarbageCollectorFileState fs) throws IOException, DataStoreE // Mark all used references iterateNodeTree(fs, false); - + + // Get size + sizeBlobStoreReferences(fs, stats); + // Move the marked references file to the data store meta area if applicable GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix); LOG.debug("Ending mark phase of the garbage collector"); } + private static void sizeBlobStoreReferences(GarbageCollectorFileState fs, GarbageCollectionOperationStats stats) + throws IOException { + try (LineIterator lineIterator = new LineIterator(new FileReader(fs.getMarkedRefs()))) { + lineIterator.forEachRemaining(line -> { + String id = line.split(DELIM)[0]; Review Comment: I'm not sure about exception handling. If the file is corrupt (eg. empty line, corruption length,...), what would happen? It looks like it would throw an exception. Instead, a warning should be logged, and then it should continue (I assume). ########## oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java: ########## @@ -689,6 +711,9 @@ public long checkConsistency() throws Exception { // Mark all used blob references iterateNodeTree(fs, true); + // Move the marked references file to the data store meta area if applicable + String uniqueSuffix = UUID.randomUUID().toString(); + GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix); Review Comment: I'm not sure... it doesn't seem to be related to reporting the size? Not saying it is wrong... Just that I would expect it in a different PR. ########## oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java: ########## @@ -703,6 +728,21 @@ public long checkConsistency() throws Exception { // Retrieve all other marked present in the datastore List<DataRecord> refFiles = ((SharedDataStore) blobStore).getAllMetadataRecords(SharedStoreRecordType.REFERENCES.getType()); + + // Get all the repositories registered + List<DataRecord> repoFiles = + ((SharedDataStore) blobStore).getAllMetadataRecords(SharedStoreRecordType.REPOSITORY.getType()); + LOG.info("Repositories registered {}", repoFiles); + + // Retrieve repos for which reference files have not been created + Set<String> unAvailRepos = + SharedDataStoreUtils.refsNotAvailableFromRepos(repoFiles, refFiles); + LOG.info("Repositories with unavailable references {}", unAvailRepos); + + if (!unAvailRepos.isEmpty()) { + throw new NotAllRepositoryMarkedException("Not all repositories have marked references available"); Review Comment: Same as above: it doesn't seem to be related to reporting the size? ########## oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java: ########## @@ -1156,6 +1202,10 @@ class GarbageCollectionOperationStats implements OperationsStatsMBean { static final String NUM_BLOBS_DELETED = "NUM_BLOBS_DELETED"; static final String TOTAL_SIZE_DELETED = "TOTAL_SIZE_DELETED"; static final String NUM_CANDIDATES = "NUM_CANDIDATES"; + + static final String NUM_BLOB_REFERENCES = "NUM_BLOB_REFERENCES"; Review Comment: I would rename the fields... ########## oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java: ########## @@ -405,13 +405,35 @@ protected void mark(GarbageCollectorFileState fs) throws IOException, DataStoreE // Mark all used references iterateNodeTree(fs, false); - + + // Get size + sizeBlobStoreReferences(fs, stats); + // Move the marked references file to the data store meta area if applicable GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix); LOG.debug("Ending mark phase of the garbage collector"); } + private static void sizeBlobStoreReferences(GarbageCollectorFileState fs, GarbageCollectionOperationStats stats) + throws IOException { + try (LineIterator lineIterator = new LineIterator(new FileReader(fs.getMarkedRefs()))) { + lineIterator.forEachRemaining(line -> { + String id = line.split(DELIM)[0]; + long length = DataStoreBlobStore.BlobId.of(id).getLength(); + LOG.info("Blob {} has size {}", id, length); + + stats.getCollector().updateNumBlobReferences(1); + + if (length != -1) { + stats.getCollector().updateSizeBlobReferences(length); Review Comment: what about "addTotalBlobSize"? ########## oak-blob-plugins/src/main/java/org/apache/jackrabbit/oak/plugins/blob/MarkSweepGarbageCollector.java: ########## @@ -405,13 +405,35 @@ protected void mark(GarbageCollectorFileState fs) throws IOException, DataStoreE // Mark all used references iterateNodeTree(fs, false); - + + // Get size + sizeBlobStoreReferences(fs, stats); + // Move the marked references file to the data store meta area if applicable GarbageCollectionType.get(blobStore).addMarked(blobStore, fs, repoId, uniqueSuffix); LOG.debug("Ending mark phase of the garbage collector"); } + private static void sizeBlobStoreReferences(GarbageCollectorFileState fs, GarbageCollectionOperationStats stats) + throws IOException { + try (LineIterator lineIterator = new LineIterator(new FileReader(fs.getMarkedRefs()))) { + lineIterator.forEachRemaining(line -> { + String id = line.split(DELIM)[0]; + long length = DataStoreBlobStore.BlobId.of(id).getLength(); + LOG.info("Blob {} has size {}", id, length); + + stats.getCollector().updateNumBlobReferences(1); Review Comment: what about "addTotalBlobCount" instead of "updateNumBlobReferences"? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@jackrabbit.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org