[ 
https://issues.apache.org/jira/browse/OAK-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jukka Zitting reopened OAK-1804:
--------------------------------


There are two more problems:

* On a really large repository with hundreds of millions of nodes, the 
uncompressed compaction map inside the Compactor class can become huge, up to a 
few gigabytes. It would be better if we could use the far more memory-efficient 
CompactionMap data structure instead, and perhaps further limit the number of 
entries we store in the map in the first place.
* The compaction checks in fastEquals() add measurable performance overhead, 
since they are executed for all record comparisons, not just for nodes and 
blobs. It would be better to perform the compaction checks only for those 
higher-level comparisons.
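To illustrate the first point, here is a rough, hypothetical sketch of what a more memory-efficient compaction map could look like: two sorted parallel long[] arrays with binary-search lookup, instead of boxed map entries. The class name, the long-valued record ids, and the constructor contract are all illustrative assumptions, not Oak's actual CompactionMap design:

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Oak's actual CompactionMap API): a map from
 * pre-compaction record offsets to post-compaction offsets, stored as two
 * sorted parallel long[] arrays. Compared to a HashMap<RecordId, RecordId>,
 * this avoids per-entry object headers and boxing, which matters when the
 * map holds hundreds of millions of entries.
 */
public class CompactionMapSketch {
    private final long[] before; // sorted pre-compaction offsets
    private final long[] after;  // matching post-compaction offsets, by index

    public CompactionMapSketch(long[] before, long[] after) {
        // assumes 'before' is already sorted and pairs with 'after' by index
        this.before = before;
        this.after = after;
    }

    /** Returns the compacted offset, or -1 if the record was not remapped. */
    public long get(long offset) {
        int i = Arrays.binarySearch(before, offset);
        return i >= 0 ? after[i] : -1;
    }

    public static void main(String[] args) {
        CompactionMapSketch map = new CompactionMapSketch(
                new long[] {100, 200, 300},
                new long[] {10, 20, 30});
        System.out.println(map.get(200)); // prints 20
        System.out.println(map.get(150)); // prints -1
    }
}
```

Two primitive arrays cost 16 bytes per entry plus constant overhead, so even a few hundred million entries stay in the low single-digit gigabytes; capping the number of entries stored would shrink this further at the cost of some lookup misses.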

I'll take a look at fixing the above issues.
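The second point could be sketched roughly as follows: consult the compaction map only when comparing node or blob records, so that cheap comparisons for other record types skip the extra lookup entirely. The class, the RecordType enum, and the long-valued ids are illustrative assumptions, not Oak's real segment internals:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the fast-equals idea: only node and blob records are
 * remapped by compaction, so fastEquals() checks the compaction map for those
 * record types alone and answers all other comparisons by plain id equality.
 */
public class FastEqualsSketch {
    enum RecordType { NODE, BLOB, VALUE, TEMPLATE }

    // maps a pre-compaction record id to its post-compaction id
    private final Map<Long, Long> compacted = new HashMap<>();

    void recordCompaction(long before, long after) {
        compacted.put(before, after);
    }

    boolean fastEquals(long idA, RecordType typeA, long idB) {
        if (idA == idB) {
            return true;
        }
        // only node and blob records can have been remapped by compaction,
        // so every other record type skips the map lookup entirely
        if (typeA == RecordType.NODE || typeA == RecordType.BLOB) {
            return Long.valueOf(idB).equals(compacted.get(idA));
        }
        return false;
    }
}
```

Hoisting the lookup out of the common comparison path keeps fastEquals() a plain id check for the vast majority of records, which is where the overhead accumulates.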

> TarMK compaction
> ----------------
>
>                 Key: OAK-1804
>                 URL: https://issues.apache.org/jira/browse/OAK-1804
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segmentmk
>            Reporter: Jukka Zitting
>            Assignee: Alex Parvulescu
>              Labels: production, tools
>             Fix For: 1.0.1, 1.1
>
>         Attachments: SegmentNodeStore.java.patch, compact-on-flush.patch, 
> compaction-map-as-bytebuffer.patch, compaction.patch, fast-equals.patch
>
>
> The TarMK would benefit from periodic "compact" operations that would 
> traverse and recreate (parts of) the content tree in order to optimize the 
> storage layout. More specifically, such compaction would:
> * Optimize performance by increasing locality and reducing duplication, both 
> of which improve the effectiveness of caching.
> * Allow the garbage collector to release more unused disk space by removing 
> references to segments where only a subset of content is reachable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
