Hi,

Would it be possible for you to create a simple, standalone test case? That means no external dependencies except Jackrabbit core, and a single class with a main method, similar to the FirstHop examples. Ideally the memory problem can be reproduced using the standard configuration; if not, could you also send the configuration you use?

Getting rid of the duplicate Strings would be fairly easy (using a simple string cache), but we need a test case first so we know we are solving the right problem.

Thanks,
Thomas

On Jan 23, 2008 5:51 PM, Andrey Adamovich <[EMAIL PROTECTED]> wrote:
> Hello guys!
>
> We have implemented a JCR facade for our portal system based on
> Jackrabbit. The facade delegates its calls to the Jackrabbit repository,
> and if data is not available, a request to a legacy CMS is performed and
> the data is inserted into Jackrabbit.
>
> The structure of the repository is similar to:
>
> /root/<level1>/<level2>/<level3>/<level4>/<level5>/<id>/<sub_id>/<DocumentNode1>/<DocumentNode2>/textProperty
>
> The number of possible path values at each level is from 2 to 20.
> id and sub_id are unique identifiers of the document; under them the
> document structure is stored with a maximum depth of 5. Many documents
> have the same type of structure and property names.
>
> We faced some performance bottlenecks, so I profiled our application
> (with YourKit Java Profiler) and noticed that there are many duplicate
> strings stored on the heap, and almost all of those duplicates are
> held by org.apache.jackrabbit.spi.commons.name.NameFactoryImpl$NameImpl
> instances.
>
> After doing several portal page requests (which would mean about 1000 JR
> requests) and taking a memory snapshot, I noticed that the string "root"
> is stored and held by NameImpl about 12000 times, which is about 2 MB of
> waste. Other strings with the values of the repository level names and
> property names had between 3000 and 11000 duplicates. The total
> calculated waste is about 50 MB, and that is after relatively few
> requests.
>
> It is probably not the only memory/performance bottleneck, and it could
> also be that our app is doing something wrong, but it would be good to
> get some ideas on that from you guys.
>
> After leaving the server idle for a while (6-8 hours), I took a memory
> snapshot again. The number of duplicates had reduced slightly, but I
> would not say that it changed a lot or that many of the duplicate
> strings had been garbage-collected.
>
> I have also looked at the source of NameFactoryImpl$NameImpl and found
> that it uses String.intern() for the namespace, but not for the local
> name part, which is wise in general, but may not work well if Jackrabbit
> is stressed with too many requests.
>
> Therefore I have several questions that some of you may be able to help
> me with:
>
> 1) Is there a way to implement a different name creation strategy? I see
> that NameFactory is an interface, but how would I plug in a different
> implementation to adapt to my repository structure, so that the "root"
> string would not be stored 12000 times or even more?
>
> 2) Can someone explain to me how the JR cache manager works, and could
> this leak happen because the cache manager stores too many states? Does
> the size of the JR cache depend on the number of live sessions? Would it
> be wise to disable it, or at least limit it?
>
> Best regards,
>
> Andrey
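
For illustration, the "simple string cache" mentioned in the reply could look roughly like the sketch below. This is only an assumption about the general idea, not Jackrabbit code: the class name StringCache and its get method are hypothetical, and it uses nothing beyond the JDK. Equal strings are mapped to one shared instance, and entries are held weakly so that names no longer referenced elsewhere can still be garbage-collected.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/**
 * Hypothetical sketch of a canonicalizing string cache.
 * Not part of Jackrabbit; names are made up for illustration.
 */
public class StringCache {

    // Keys are held weakly by WeakHashMap; values wrap the same string in a
    // WeakReference so that the map itself never keeps an entry alive.
    private final Map<String, WeakReference<String>> cache =
            new WeakHashMap<String, WeakReference<String>>();

    /** Returns a shared instance that is equal to the given string. */
    public synchronized String get(String s) {
        if (s == null) {
            return null;
        }
        WeakReference<String> ref = cache.get(s);
        String cached = (ref == null) ? null : ref.get();
        if (cached != null) {
            return cached;
        }
        // First time we see this value: remember this instance.
        cache.put(s, new WeakReference<String>(s));
        return s;
    }

    public static void main(String[] args) {
        StringCache cache = new StringCache();
        // new String(...) forces two distinct objects with equal content.
        String a = cache.get(new String("root"));
        String b = cache.get(new String("root"));
        System.out.println(a == b); // prints "true": both refer to one instance
    }
}
```

A name factory could pass each local name through such a cache before storing it, in the same way the namespace part already goes through String.intern(). Whether this actually addresses the reported memory use would still need to be confirmed with the standalone test case requested above.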
