Michael Dürig created OAK-2896:
----------------------------------

             Summary: Putting many elements into a map results in many small 
segments. 
                 Key: OAK-2896
                 URL: https://issues.apache.org/jira/browse/OAK-2896
             Project: Jackrabbit Oak
          Issue Type: Bug
          Components: segmentmk
            Reporter: Michael Dürig
             Fix For: 1.3.0


There is an issue with how the HAMT implementation 
({{SegmentWriter.writeMap()}}) interacts with the 256 segment references limit 
when putting many entries into the map: this limit is regularly reached once 
the map contains about 200k entries. At that point segments get prematurely 
flushed, resulting in more segments, thus more references, and thus even smaller 
segments. It is common for segments to be as small as 7k, with a tar file 
containing up to 35k segments. This is problematic because at that point handling 
of the segment graph becomes expensive, both memory and CPU wise. I have seen 
persisted segment graphs as big as 35M where the usual size is a couple of KB. 

As the HAMT map is used for storing the children of a node, this might have an 
adverse effect on nodes with many child nodes. 
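For background, a HAMT splits a map into fixed-size buckets by a slice of the key hash, and each non-empty bucket becomes a child record that the parent must reference; this is where the references accumulate. The following is a simplified, self-contained sketch of that bucketing, not Oak's actual implementation (the fan-out of 32 and the hash-slicing scheme are illustrative assumptions):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HamtSketch {
    // Illustrative fan-out: one 5-bit slice of the hash per level.
    static final int BUCKETS = 32;

    // Splits entries into BUCKETS sub-maps by a 5-bit slice of the key hash.
    // In a real HAMT each non-empty bucket is a child record, and the parent
    // record holds one reference per bucket.
    static List<Map<String, String>> split(Map<String, String> map, int level) {
        List<Map<String, String>> buckets = new ArrayList<>();
        for (int i = 0; i < BUCKETS; i++) {
            buckets.add(new HashMap<>());
        }
        for (Map.Entry<String, String> e : map.entrySet()) {
            int slice = (e.getKey().hashCode() >>> (level * 5)) & (BUCKETS - 1);
            buckets.get(slice).put(e.getKey(), e.getValue());
        }
        return buckets;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i < 200_000; i++) {
            map.put("key-" + i, "value-" + i);
        }
        List<Map<String, String>> level0 = split(map, 0);
        long nonEmpty = level0.stream().filter(b -> !b.isEmpty()).count();
        System.out.println(nonEmpty + " non-empty buckets at level 0");
        // With 200k entries every bucket at the root is populated, and each
        // bucket recursively splits again, so the number of inter-record
        // references grows quickly toward the per-segment limit.
    }
}
```

In this toy model, a map with 200k entries already fans out into 32 fully populated root buckets, each of which splits further, which is consistent with the observation above that the 256-reference limit is hit around that size.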

The following code can be used to reproduce the issue: 

{code}
SegmentWriter writer = new SegmentWriter(segmentStore, getTracker(), V_11);
MapRecord baseMap = null;
Random rnd = new Random();

for (;;) {
    Map<String, RecordId> map = newHashMap();
    for (int k = 0; k < 1000; k++) {
        RecordId stringId = writer.writeString(String.valueOf(rnd.nextLong()));
        map.put(String.valueOf(rnd.nextLong()), stringId);
    }

    Stopwatch w = Stopwatch.createStarted();
    baseMap = writer.writeMap(baseMap, map);
    System.out.println(baseMap.size() + " " + w.elapsed());
}
{code}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)