[ 
https://issues.apache.org/jira/browse/OAK-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Sedding updated OAK-2619:
--------------------------------
    Attachment: OAK-2619-patch

Patch that provides merge support. To do so efficiently, the copy
algorithm is changed to merge leaf nodes first and then work its way up to the
root. This makes it possible to determine, in a single traversal, whether a
node has changes.
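
To illustrate the bottom-up idea, here is a minimal sketch. It is not the code from the attached patch and does not use Oak's API: the `Node` class and the `merge` method are hypothetical, in-memory stand-ins. Children are merged before their parent, and each call reports whether anything below it changed, so a single traversal is enough to tell whether a node has changes.

```java
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical in-memory model for illustration only; not an Oak class.
final class MergeSketch {

    static final class Node {
        final Map<String, String> properties = new TreeMap<>();
        final Map<String, Node> children = new TreeMap<>();

        // Returns the named child, creating it if it does not exist yet.
        Node child(String name) {
            return children.computeIfAbsent(name, n -> new Node());
        }
    }

    /**
     * Merges source into target, leaves first, and returns true if the
     * target subtree was modified. Content that exists only in the target
     * is left untouched (merge rather than overwrite).
     */
    static boolean merge(Node source, Node target) {
        boolean changed = false;

        // Descend first, so leaf nodes are merged before their parents.
        for (Map.Entry<String, Node> entry : source.children.entrySet()) {
            Node targetChild = target.children.get(entry.getKey());
            boolean isNew = targetChild == null;
            if (isNew) {
                targetChild = new Node();
            }
            boolean childChanged = merge(entry.getValue(), targetChild);
            if (isNew || childChanged) {
                target.children.put(entry.getKey(), targetChild);
                changed = true;
            }
        }

        // On the way back up, copy only properties that actually differ.
        for (Map.Entry<String, String> prop : source.properties.entrySet()) {
            String existing = target.properties.get(prop.getKey());
            if (!Objects.equals(prop.getValue(), existing)) {
                target.properties.put(prop.getKey(), prop.getValue());
                changed = true;
            }
        }
        return changed;
    }
}
```

In this sketch, a second merge of an unchanged source returns false at every level and writes nothing to the target.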

I ran some tests locally with the following results:

*Source Repository*
CRX2 repository with TarPM + FileDataStore, containing ~5 million nodes of
production content.

*Target Repository*
Oak with TarMK + FileDataStore

The upgrade was run with 1 GB of heap space, merge enabled, and binaries copied
by reference. I also restricted the copied paths, using the feature from
OAK-2573, to only ~500k nodes, of which ~70% are binaries. No versions were
copied. For the merge runs, the source repository had no content changes.
                           
{noformat}
Results       Run 1       Run 2 (merge)
Without patch 1.008 min   1.037 min    
With patch    1.146 min   40.50 s
{noformat}

In separate test runs, I also logged the diff seen by the commit hooks. Without
the patch, the merge run shows many changes in the copied content; with the
patch applied, it shows none.

> Support merging content during upgrade
> --------------------------------------
>
>                 Key: OAK-2619
>                 URL: https://issues.apache.org/jira/browse/OAK-2619
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: upgrade
>    Affects Versions: 1.1.7
>            Reporter: Julian Sedding
>            Priority: Minor
>         Attachments: OAK-2619-patch
>
>
> When upgrading from Jackrabbit 2 to Oak there are several scenarios that 
> could benefit from the ability to merge content rather than overwrite it. 
> Especially in combination with OAK-2586, i.e. support to include/exclude 
> selected paths from the copy operation, merging can become very useful.
> # Start vanilla product with an empty repo that writes some initial content, 
> then copy content from a Jackrabbit 2 repo into this instance
> # Unify content from several Jackrabbit 2 repositories into a single Oak repo
> # Copy all content 1 week before the actual migration, then merge in the diff 
> on migration day



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
