Hello guys!
We have implemented a JCR facade for our portal system, based on JackRabbit.
The facade delegates its calls to the JackRabbit repository; if the data is not
available there, a request is made to a legacy CMS and the result is inserted
into JackRabbit.
The structure of the repository is similar to:
/root/<level1>/<level2>/<level3>/<level4>/<level5>/<id>/<sub_id>/<DocumentNode1>/<DocumentNode2>/textProperty
The number of possible path values at each level ranges from 2 to 20.
id and sub_id are unique identifiers of the document; the document structure
is stored under them, with a maximum depth of 5. Many documents share the same
type of structure and the same property names.
We ran into some performance bottlenecks, so I profiled our application (with
YourKit Java Profiler). I noticed that there are many duplicate strings on the
heap, and almost all of those duplicates are held by
org.apache.jackrabbit.spi.commons.name.NameFactoryImpl$NameImpl class
instances.
After several portal page requests (which would mean about 1000 JR requests) I
took a memory snapshot and noticed that the string "root" is held by NameImpl
instances about 12000 times, which is about 2 MB of waste. Other strings
holding repository level names and property names had between 3000 and 11000
duplicates each. The total calculated waste is about 50 MB, and that is after
relatively few requests.
This is probably not the only memory/performance bottleneck, and it could also
be that our app is doing something wrong, but it would be good to get some
ideas on this from you guys.
After leaving the server idle for a while (6-8 hours), I took another memory
snapshot. The number of duplicates had dropped slightly, but not by much; most
of the duplicate strings had not been garbage-collected.
I have also looked at the source of NameFactoryImpl$NameImpl and found that it
uses String.intern() for the namespace part, but not for the local name part.
That is wise in general, but may not work well when JackRabbit is under a heavy
load of requests.
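To illustrate the kind of duplication I mean, here is a minimal, self-contained
sketch. It is not JackRabbit code: the class LocalNamePoolSketch and its
canonical() method are hypothetical names I made up for this example. It only
shows that equal strings created separately are distinct heap objects, and
that either a small canonicalizing pool or String.intern() collapses them to
one instance.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration only; none of these names exist in JackRabbit.
public class LocalNamePoolSketch {

    // Maps each distinct local name to a single canonical String instance.
    // Assumption: the set of distinct names is small (level names, property
    // names), because this pool holds strong references and never shrinks.
    private static final ConcurrentMap<String, String> POOL =
            new ConcurrentHashMap<String, String>();

    // Returns the canonical instance for the given local name.
    static String canonical(String localName) {
        String existing = POOL.putIfAbsent(localName, localName);
        return existing != null ? existing : localName;
    }

    // Number of distinct names currently pooled.
    static int poolSize() {
        return POOL.size();
    }

    public static void main(String[] args) {
        // Two equal strings built separately, as happens when paths are parsed.
        String a = new String("root");
        String b = new String("root");

        System.out.println(a == b);                       // false: two heap objects
        System.out.println(canonical(a) == canonical(b)); // true: one pooled instance
        System.out.println(a.intern() == b.intern());     // true: JVM string pool
    }
}
```

With something like this, 12000 names equal to "root" would share one String
object instead of 12000. Whether such a pool, or intern(), is appropriate
inside JackRabbit itself is exactly what I am unsure about.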
Therefore I have several questions that some of you may be able to help me
with:
1) Is there a way to implement a different name creation strategy? I see that
NameFactory is an interface, but how would I plug in a different implementation
adapted to my repository structure, so that the string "root" is not stored
12000 times or more?
2) Can someone explain to me how the JR cache manager works, and could this
leak be caused by the cache manager storing too many states? Does the size of
the JR cache depend on the number of live sessions? Would it be wise to disable
it, or at least to limit it?
Best regards,
Andrey