We were having performance issues during the commit of a transaction that contained a huge number of node inserts. We found that a big chunk of the commit time was spent in the method org.apache.jackrabbit.core.journal.AbstractRecord.getOrCreateIndex(). In this class, the uuidIndex instance variable is an ArrayList, and uuidIndex.indexOf(nodeId) is called to check whether a node is already in the list. ArrayList.indexOf() scans the list from the beginning, calling NodeId.equals() on each entry until it finds a match or reaches the end. This becomes very inefficient as the list grows.

In our case, we had *42K* entries in this list and about *400K* calls to getOrCreateIndex(), resulting in over *9 billion* calls to NodeId.equals(). By pairing a HashMap with the ArrayList and checking the HashMap instead of the ArrayList to see whether the node is already in the cache, we traded the 400K calls to ArrayList.indexOf() (and the resulting 9 billion calls to NodeId.equals()) for 400K calls to HashMap.get() (and its subsequent calls). Our commit time went from about 30 minutes to about 3 minutes.
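
To illustrate why the indexOf() approach grows so quickly, here is a minimal sketch that counts equals() calls when distinct ids are registered through an indexOf()-based lookup. CountingId and LinearScanCost are hypothetical names made up for this example; CountingId is only a stand-in for NodeId and is not Jackrabbit code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for NodeId that counts how often equals() is called.
final class CountingId {
    static long equalsCalls = 0;
    private final int value;

    CountingId(int value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object other) {
        equalsCalls++;
        return other instanceof CountingId && ((CountingId) other).value == value;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(value);
    }
}

final class LinearScanCost {
    // Registers n distinct ids through an indexOf()-based lookup and
    // returns the number of equals() calls that this required.
    static long equalsCallsForDistinctInserts(int n) {
        CountingId.equalsCalls = 0;
        List<CountingId> index = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            CountingId id = new CountingId(i);
            // On a miss, indexOf() compares against every existing entry.
            if (index.indexOf(id) == -1) {
                index.add(id);
            }
        }
        return CountingId.equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println(equalsCallsForDistinctInserts(1_000)); // 499500
        System.out.println(equalsCallsForDistinctInserts(2_000)); // 1999000
    }
}
```

Doubling the number of entries roughly quadruples the number of comparisons (n * (n - 1) / 2 for n distinct inserts), which is the kind of growth that can lead to billions of equals() calls with tens of thousands of entries.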

There may be more elegant ways to fix this problem, but our changes to AbstractRecord are listed below. Maybe some changes along these lines could be incorporated into this class to improve performance. Thanks.

-------------------- old code --------------------

private int getOrCreateIndex(NodeId nodeId) {
   int index = uuidIndex.indexOf(nodeId);

   if (index == -1) {
       uuidIndex.add(nodeId);
   }
   return index;
}

-------------------- new code --------------------

private final HashMap<String, Integer> uuidMap = new HashMap<String, Integer>();

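private int getOrCreateIndex(NodeId nodeId) {
   // Look up the position in the map instead of scanning the list.
   String key = nodeId.toString();
   Integer index = uuidMap.get(key);

   if (index == null) {
       // Not seen before: append to the list and remember its position.
       uuidIndex.add(nodeId);
       uuidMap.put(key, uuidIndex.size() - 1);
       // As in the old code, -1 tells the caller the id was not yet indexed.
       index = -1;
   }
   return index;
}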
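
For comparison, here is a self-contained sketch of the same list-plus-map idea as a small generic helper. IndexedSet is a hypothetical name and not part of Jackrabbit. It uses the key object itself as the map key, which assumes the key type implements equals() and hashCode() consistently; whether that holds well enough for NodeId is an assumption, and keying on toString() as in the code above avoids relying on it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a list for position -> key lookups plus a map for
// key -> position lookups, kept in sync on every insert.
final class IndexedSet<K> {
    private final List<K> byPosition = new ArrayList<>();
    private final Map<K, Integer> positionOf = new HashMap<>();

    // Same contract as getOrCreateIndex() above: returns the existing
    // position, or -1 if the key was not present and has just been added.
    int getOrCreateIndex(K key) {
        Integer position = positionOf.get(key);
        if (position != null) {
            return position;
        }
        positionOf.put(key, byPosition.size());
        byPosition.add(key);
        return -1;
    }

    K get(int position) {
        return byPosition.get(position);
    }

    int size() {
        return byPosition.size();
    }

    public static void main(String[] args) {
        IndexedSet<String> index = new IndexedSet<>();
        System.out.println(index.getOrCreateIndex("a")); // -1 (newly added)
        System.out.println(index.getOrCreateIndex("b")); // -1 (newly added)
        System.out.println(index.getOrCreateIndex("a")); // 0
        System.out.println(index.getOrCreateIndex("b")); // 1
    }
}
```

Each lookup is then a single hash probe on average rather than a scan of the whole list, at the cost of keeping a second structure in memory.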
