Hi,

Would it be possible for you to create a simple, standalone test case
(that is, no external dependencies except Jackrabbit core, and a single
class with a main method, similar to the FirstHop examples)? Ideally,
the memory problem can be reproduced using the standard configuration;
if not, could you also send the configuration you use?

Getting rid of the duplicate Strings would be fairly easy (using a
simple string cache), but we need a test case first so we know we are
solving the right problem.
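A "simple string cache" as mentioned above could be as small as the following minimal sketch. It assumes a single process-wide canonicalizing map; `StringCache` is a hypothetical name, not an existing Jackrabbit class. Each distinct value is stored once, and later equal strings are swapped for the cached instance.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical canonicalizing string cache; not part of Jackrabbit.
final class StringCache {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    // Returns a canonical instance equal to s; the first instance stored wins.
    String get(String s) {
        if (s == null) {
            return null;
        }
        String existing = cache.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return cache.size();
    }
}

class StringCacheDemo {
    public static void main(String[] args) {
        StringCache cache = new StringCache();
        // new String(...) forces distinct objects with equal contents,
        // similar to names parsed separately from different paths.
        String a = cache.get(new String("root"));
        String b = cache.get(new String("root"));
        System.out.println(a == b);       // true: both refer to one instance
        System.out.println(cache.size()); // 1
    }
}
```

The map is unbounded, so this only pays off when the set of distinct values is small, such as the 2 to 20 names per level described in the message below.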

Thanks,
Thomas


On Jan 23, 2008 5:51 PM, Andrey Adamovich <[EMAIL PROTECTED]> wrote:
> Hello guys!
>
> We have implemented a JCR facade for our portal system based on JackRabbit.
> The facade delegates its calls to the JackRabbit repository; if the data is
> not available, it requests the data from a legacy CMS and inserts it into
> JackRabbit.
>
> The repository structure is similar to:
>
> /root/<level1>/<level2>/<level3>/<level4>/<level5>/<id>/<sub_id>/<DocumentNode1>/<DocumentNode2>/textProperty
>
> The number of possible names at each level ranges from 2 to 20.
> id and sub_id are unique identifiers of the document; beneath them the
> document structure is stored with a maximum depth of 5. Many documents share
> the same structure and property names.
>
> We ran into some performance bottlenecks, so I profiled our application
> (with YourKit Java Profiler) and noticed that there are many duplicate
> strings on the heap, and almost all of those duplicates are held by
> org.apache.jackrabbit.spi.commons.name.NameFactoryImpl$NameImpl instances.
>
> After several portal page requests (roughly 1000 JR requests) I took a
> memory snapshot and noticed that the string "root" is held by NameImpl
> instances about 12000 times, which wastes about 2 MB. Other strings holding
> repository level names and property names had between 3000 and 11000
> duplicates each. The total calculated waste is about 50 MB, and that is after
> relatively few requests.
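What a heap profiler reports as "duplicate strings" is many distinct String objects that hold equal contents. The stdlib-only sketch below counts exactly that, using an identity-based set. The workload (splitting the same kind of path repeatedly with `String.split`) is a hypothetical stand-in for illustration, not how Jackrabbit actually parses names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Counts, per string value, how many distinct String objects carry that value.
class DuplicateStringCounter {
    static Map<String, Integer> countCopies(List<String> strings) {
        // Identity-based set: two equal strings are still two separate entries.
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        Map<String, Integer> copies = new HashMap<>();
        for (String s : strings) {
            if (seen.add(s)) {
                // First time this particular object is seen: one more copy of its value.
                copies.merge(s, 1, Integer::sum);
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        // Hypothetical workload: parse similar paths 1000 times.
        // Each split() call returns new String objects for the segments.
        for (int i = 0; i < 1000; i++) {
            String path = "/root/level1/level2/doc" + (i % 10);
            for (String segment : path.split("/")) {
                if (!segment.isEmpty()) {
                    names.add(segment);
                }
            }
        }
        Map<String, Integer> copies = countCopies(names);
        System.out.println(copies.get("root")); // 1000
        System.out.println(copies.get("doc3")); // 100
    }
}
```

Every parse of the same path yields another copy of "root", which is the pattern the snapshot shows when names are created per request without canonicalization.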
>
> It is probably not the only memory/performance bottleneck, and it could also
> be that our app is doing something wrong, but it would be good to get some
> ideas on that from you guys.
>
> After leaving the server idle for a while (6-8 hours), I took another memory
> snapshot. The number of duplicates had dropped slightly, but not by much;
> few of the duplicate strings seem to have been garbage-collected.
>
> I have also looked at the source of NameFactoryImpl$NameImpl and found that
> it uses String.intern() for the namespace URI but not for the local name
> part. That is sensible in general, but it may not work well when JackRabbit
> has to handle a large number of requests.
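An alternative to interning only the namespace part is to canonicalize whole names at creation time with a caching decorator around the factory. The sketch below is stdlib-only and uses hypothetical `SimpleName`/`SimpleNameFactory`/`CachingNameFactory` types standing in for the real SPI `Name`/`NameFactory`; it does not show how a custom factory would be plugged into Jackrabbit, which is the open question in the message.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the SPI Name type.
final class SimpleName {
    final String namespaceURI;
    final String localName;

    SimpleName(String namespaceURI, String localName) {
        this.namespaceURI = namespaceURI;
        this.localName = localName;
    }
}

// Hypothetical stand-in for the SPI NameFactory interface.
interface SimpleNameFactory {
    SimpleName create(String namespaceURI, String localName);
}

// Decorator: delegates creation, but returns one shared instance per (namespace, local name).
final class CachingNameFactory implements SimpleNameFactory {
    private final SimpleNameFactory delegate;
    private final Map<Map.Entry<String, String>, SimpleName> cache = new ConcurrentHashMap<>();

    CachingNameFactory(SimpleNameFactory delegate) {
        this.delegate = delegate;
    }

    @Override
    public SimpleName create(String namespaceURI, String localName) {
        // SimpleImmutableEntry implements equals/hashCode over both parts.
        Map.Entry<String, String> key = new SimpleImmutableEntry<>(namespaceURI, localName);
        return cache.computeIfAbsent(key, k -> delegate.create(k.getKey(), k.getValue()));
    }

    int size() {
        return cache.size();
    }
}

class CachingNameFactoryDemo {
    public static void main(String[] args) {
        AtomicInteger created = new AtomicInteger();
        SimpleNameFactory plain = (ns, local) -> {
            created.incrementAndGet();
            return new SimpleName(ns, local);
        };
        CachingNameFactory factory = new CachingNameFactory(plain);
        for (int i = 0; i < 12000; i++) {
            factory.create("", new String("root"));
        }
        System.out.println(created.get()); // 1
    }
}
```

Because id and sub_id are unique per document, caching every name would grow with the number of documents; a real version would probably need a size bound or weak references.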
>
> Therefore I have several questions that some of you may be able to help me with:
>
> 1) Is there a way to implement a different name creation strategy? I see that
> NameFactory is an interface, but how would I plug in a different
> implementation adapted to my repository structure, so that the "root" string
> is not stored 12000 times or even more?
>
> 2) Can someone explain to me how the JR cache manager works, and can this leak
> happen because the cache manager stores too many states? Does the size of the
> JR cache depend on the number of live sessions? Would it be wise to disable
> it, or at least limit it?
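On limiting a cache in general: the sketch below is a generic size-bounded LRU map built on `LinkedHashMap`'s access-order mode and its `removeEldestEntry` hook. It only illustrates the idea of a bound; `BoundedCache` and the capacity of 2 are hypothetical and say nothing about how Jackrabbit's cache manager actually sizes or evicts item states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic size-bounded LRU cache; a hypothetical illustration,
// not Jackrabbit's cache manager or its configuration.
final class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true: get() moves an entry to the most-recently-used end.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; evicts the least recently used entry once over the limit.
        return size() > maxEntries;
    }
}

class BoundedCacheDemo {
    public static void main(String[] args) {
        BoundedCache<String, String> cache = new BoundedCache<>(2);
        cache.put("/root/a", "state a");
        cache.put("/root/b", "state b");
        cache.get("/root/a");             // touch a, so b is now least recently used
        cache.put("/root/c", "state c");  // evicts b
        System.out.println(cache.keySet()); // [/root/a, /root/c]
    }
}
```

`LinkedHashMap` is not thread-safe, so a cache shared between sessions would need external synchronization, for example via `Collections.synchronizedMap`.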
>
> Best regards,
>
> Andrey
>