We were having performance issues during the commit of a transaction
that contained a very large number of node inserts. We found that a large
share of the commit time was spent in the method
org.apache.jackrabbit.core.journal.AbstractRecord.getOrCreateIndex().
In this class, the uuidIndex instance variable is an ArrayList and
uuidIndex.indexOf(nodeId) is called to search through this ArrayList to
see if a node is already in the list. ArrayList.indexOf() searches the
list from beginning to end calling NodeId.equals() until the entry is
located and then returns. This can be quite inefficient if the list
becomes large.
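To illustrate the cost, here is a minimal sketch that counts equals() calls
when every entry of an ArrayList is looked up once with indexOf().
CountingKey and IndexOfCost are hypothetical names standing in for NodeId
and the journal record; they are not Jackrabbit classes. For n entries the
count comes out to n * (n + 1) / 2, which is why the cost grows so quickly.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfCost {
    /** Hypothetical stand-in for NodeId that counts equals() calls. */
    static final class CountingKey {
        static long equalsCalls = 0;
        final int id;

        CountingKey(int id) {
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof CountingKey && ((CountingKey) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    /** Fills a list with `entries` keys, then looks each one up once via indexOf(). */
    static long countEqualsCalls(int entries) {
        CountingKey.equalsCalls = 0;
        List<CountingKey> list = new ArrayList<>();
        for (int i = 0; i < entries; i++) {
            list.add(new CountingKey(i));
        }
        for (int i = 0; i < entries; i++) {
            // Scans from the start; the key at position i costs i + 1 comparisons.
            list.indexOf(new CountingKey(i));
        }
        return CountingKey.equalsCalls;
    }

    public static void main(String[] args) {
        System.out.println(countEqualsCalls(1_000)); // prints 500500
    }
}
```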
In our case, we had *42K* entries in this list with about *400K* calls
to getOrCreateIndex(), resulting in over *9 billion* calls to
NodeId.equals(). By combining a HashMap with the ArrayList and checking
the HashMap instead of the ArrayList to see if the node is in the cache,
we traded the 400K calls to ArrayList.indexOf() (and the resulting 9
billion calls to NodeId.equals()) for 400K calls to HashMap.get() (and
its subsequent calls). Our commit time went from about 30 minutes to
about 3 minutes.
There may be more elegant ways to fix this problem, but our changes to
AbstractRecord are listed below. Maybe some changes along these lines
could be incorporated into this class to improve performance. Thanks.
-------------------- old code --------------------
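private int getOrCreateIndex(NodeId nodeId) {
    // Linear scan: NodeId.equals() is called per entry until a match is found.
    int index = uuidIndex.indexOf(nodeId);
    if (index == -1) {
        uuidIndex.add(nodeId);
    }
    // -1 means the entry did not exist and was just added.
    return index;
}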
-------------------- new code --------------------
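// Requires: import java.util.HashMap;
// Maps NodeId.toString() to the entry's position in uuidIndex.
private final HashMap<String, Integer> uuidMap = new HashMap<String, Integer>();

private int getOrCreateIndex(NodeId nodeId) {
    String key = nodeId.toString();
    Integer index = uuidMap.get(key);
    if (index == null) {
        // First occurrence: append to the list and record its position.
        uuidIndex.add(nodeId);
        uuidMap.put(key, uuidIndex.size() - 1);
        // Keep the old contract: -1 means the entry was just created.
        index = -1;
    }
    return index;
}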
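
For reference, the same idea can be written as a small self-contained
class. This is only a sketch under assumptions: PositionIndex is a
hypothetical helper, not part of Jackrabbit, and it keys the map by the
object itself rather than by toString(), which only works if the key type
implements hashCode() consistently with equals(). It keeps the contract of
the original method: -1 is returned when the entry had to be created.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: an append-only list with a hash-based position lookup. */
final class PositionIndex<K> {
    private final List<K> entries = new ArrayList<>();
    private final Map<K, Integer> positions = new HashMap<>();

    /**
     * Returns the position of an existing key, or -1 if the key was not
     * present and has just been appended.
     */
    int getOrCreateIndex(K key) {
        Integer index = positions.get(key);
        if (index == null) {
            // The new entry lands at the current end of the list.
            positions.put(key, entries.size());
            entries.add(key);
            return -1;
        }
        return index;
    }

    /** Returns the key stored at the given position. */
    K get(int index) {
        return entries.get(index);
    }

    int size() {
        return entries.size();
    }
}
```

Each lookup is then a single hash probe on average instead of a scan of
the whole list, at the cost of keeping a second structure in memory.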