[ https://issues.apache.org/jira/browse/OAK-7105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Davide Giannella updated OAK-7105:
----------------------------------
    Fix Version/s:     (was: 1.9.0)

> Implement a traverse with sort strategy for DocumentStoreIndexer
> ----------------------------------------------------------------
>
>                 Key: OAK-7105
>                 URL: https://issues.apache.org/jira/browse/OAK-7105
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: run
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8.0
>
>
> Currently the DocumentStoreIndexer logic uses a StoreAndSortStrategy in which
> it first dumps all node states to a JSON file, then sorts them in batches, and
> finally merges the sorted files. Over a whole indexing run the sorting phase
> takes a significant share of the time (40 mins out of a 3 hr run).
> Further, this approach is prone to OOM when ExternalSort creates in-memory
> batches whose actual size considerably exceeds the estimated size. So we need
> to constantly tweak "oak.indexer.maxSortMemoryInGB" (currently set to 2 GB).
> As an improvement we can make the following changes:
> # Implement a traverse with sort strategy - Instead of first dumping all
> node states into a single big JSON file, add them to an in-memory buffer and
> then at some stage sort the batch and save it to a file
> # Use better memory checks - Use the approach as implemented in GCBarrier,
> i.e. monitor the current memory usage and, if available memory drops below a
> certain threshold, trigger the batch sort
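
The two proposed changes can be illustrated with a minimal Java sketch. This is not the Oak implementation: the class TraverseWithSortSketch, its methods, and both thresholds are hypothetical names chosen for illustration. It assumes each node state is serialized as a single line of text and that plain String ordering stands in for the real path comparator. Entries are buffered during traversal, a batch is sorted and written to a temp file once an entry cap is hit or estimated free heap falls below a threshold, and the sorted batch files are combined with a k-way merge at the end.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of a traverse-with-sort strategy; not Oak code. */
final class TraverseWithSortSketch {
    private final int maxBufferedEntries; // assumed hard cap on entries per batch
    private final long minFreeBytes;      // assumed threshold: flush when free heap drops below this
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> sortedFiles = new ArrayList<>();

    TraverseWithSortSketch(int maxBufferedEntries, long minFreeBytes) {
        this.maxBufferedEntries = maxBufferedEntries;
        this.minFreeBytes = minFreeBytes;
    }

    /** Called once per traversed node; the entry must be a single line. */
    void add(String entry) {
        buffer.add(entry);
        if (buffer.size() >= maxBufferedEntries || freeHeapBytes() < minFreeBytes) {
            flush();
        }
    }

    /** Estimate of heap still available to the JVM (max heap minus used heap). */
    static long freeHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
    }

    int batchFileCount() {
        return sortedFiles.size();
    }

    /** Sorts the current in-memory batch and writes it to its own temp file. */
    private void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        Collections.sort(buffer);
        try {
            Path file = Files.createTempFile("sorted-batch-", ".txt");
            Files.write(file, buffer, StandardCharsets.UTF_8);
            sortedFiles.add(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.clear();
    }

    /**
     * Flushes the last batch, then k-way merges all batch files.
     * Returns a list for brevity; a real indexer would stream to a file.
     */
    List<String> finishAndMerge() {
        flush();
        List<String> merged = new ArrayList<>();
        List<BufferedReader> readers = new ArrayList<>();
        // Heap entries are {line, index of the reader the line came from}.
        PriorityQueue<String[]> heap = new PriorityQueue<>((a, b) -> a[0].compareTo(b[0]));
        try {
            for (Path file : sortedFiles) {
                BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
                readers.add(reader);
                String line = reader.readLine();
                if (line != null) {
                    heap.add(new String[] {line, String.valueOf(readers.size() - 1)});
                }
            }
            while (!heap.isEmpty()) {
                String[] head = heap.poll();
                merged.add(head[0]);
                String next = readers.get(Integer.parseInt(head[1])).readLine();
                if (next != null) {
                    heap.add(new String[] {next, head[1]});
                }
            }
            for (BufferedReader reader : readers) {
                reader.close();
            }
            for (Path file : sortedFiles) {
                Files.delete(file);
            }
            sortedFiles.clear();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return merged;
    }
}
```

Because the flush is driven by observed free heap rather than by an estimated batch size, a batch that turns out larger than expected is written out before it can exhaust memory, which is the motivation given in the issue. The entry cap is only a secondary guard in this sketch.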