Hello guys!

We have implemented a JCR facade for our portal system, based on JackRabbit. 
The facade delegates its calls to the JackRabbit repository; if the data is not 
available there, it requests the data from a legacy CMS and inserts it into 
JackRabbit.

The structure of the repository is similar to:

/root/<level1>/<level2>/<level3>/<level4>/<level5>/<id>/<sub_id>/<DocumentNode1>/<DocumentNode2>/textProperty

The number of possible path values at each level is between 2 and 20. 
id and sub_id are unique identifiers of the document; the document structure is 
stored under them, with a maximum depth of 5. Many documents have the same type 
of structure and the same property names.

We ran into some performance bottlenecks, so I profiled our application with 
YourKit Java Profiler. I noticed that there are many duplicate strings on the 
heap, and that almost all of those duplicates are held by instances of the 
org.apache.jackrabbit.spi.commons.name.NameFactoryImpl$NameImpl class. 

After making several portal page requests (which means about 1000 JR requests) 
and taking a memory snapshot, I noticed that the string "root" is held by 
NameImpl instances about 12000 times, which is about 2 MB of waste. Other 
strings, holding the repository level names and property names, had between 
3000 and 11000 duplicates each. The total calculated waste is about 50 MB, and 
that is after a relatively small number of requests.

This is probably not the only memory/performance bottleneck, and it could also 
be that our app is doing something wrong, but it would be good to get some 
ideas on this from you guys.

After leaving the server idle for a while (6-8 hours), I took a memory snapshot 
again. The number of duplicates had reduced slightly, but I would not say that 
it changed a lot or that many of the duplicate strings had been 
garbage-collected.

I have also looked at the source of NameFactoryImpl$NameImpl and found that it 
uses String.intern() for the namespace part, but not for the local name part. 
That is wise in general, but it may not work well when JackRabbit is under a 
heavy request load.
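
To illustrate the kind of sharing I have in mind, here is a minimal sketch. It 
is not JackRabbit code: the LocalNamePool class and its canonical() method are 
hypothetical names I made up, and I am assuming that a weakly referenced pool 
would be acceptable in place of String.intern(). It only shows that equal 
strings can be collapsed to one shared instance while still allowing unused 
entries to be garbage-collected.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical helper, not part of JackRabbit: returns one shared instance
// for all equal strings passed to canonical().
public class LocalNamePool {

    // Keys are held weakly by WeakHashMap. The value wraps the same string in
    // a WeakReference so that the entry itself does not keep the key alive.
    private final Map<String, WeakReference<String>> pool =
            new WeakHashMap<String, WeakReference<String>>();

    public synchronized String canonical(String s) {
        WeakReference<String> ref = pool.get(s);
        String cached = (ref == null) ? null : ref.get();
        if (cached != null) {
            return cached;              // reuse the instance already pooled
        }
        pool.put(s, new WeakReference<String>(s));
        return s;                       // first occurrence becomes canonical
    }

    public static void main(String[] args) {
        LocalNamePool names = new LocalNamePool();
        String a = new String("root");  // two distinct heap objects,
        String b = new String("root");  // as seen in the profiler
        System.out.println(a == b);                                    // false
        System.out.println(names.canonical(a) == names.canonical(b));  // true
    }
}
```

Whether something like this could be hooked into name creation is exactly what 
I do not know.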

Therefore I have a couple of questions that some of you may be able to help me 
with:

1) Is there a way to implement a different name creation strategy? I see that 
NameFactory is an interface, but how would I plug in a different implementation 
adapted to my repository structure, so that the "root" string would not be 
stored 12000 times or more?

2) Can someone explain to me how the JR cache manager works, and whether this 
leak could be caused by the cache manager storing too many states? Does the 
size of the JR cache depend on the number of live sessions? Would it be wise to 
disable it, or at least limit it? 

Best regards, 

Andrey 


