Nikolaos Giannadakis wrote:
You could analyze your sample documents and try to get a measure of the
average 'distance' between two identical tags - d, let's say. You could then
start flushing the cache when it holds d entries.

Sorry. That is fundamentally broken and can't be fixed. a) At least one string's distance is the length of the document (e.g., the root element's name, which appears in both its start tag and its end tag). b) This is not a statistical problem.


The only thing you could do is figure out a more compact way to store the strings (e.g., in a char array or even a compressed char array) and return something other than String from your intern method, like an integer. IIRC Saxon does something like this internally. So there's even an open source implementation.

But the API would need to allow returning non-Objects to let you do this. A long should do it.
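
To make the idea concrete, here is a rough sketch of what I mean. It is not Saxon's code and not any existing API: the class name TagCodePool and its methods intern, nameOf and size are made up for illustration, and it hands back an int rather than a long just to keep it short. All names live back to back in one char array, and a small open-addressing table maps a name to its code.

```java
import java.util.Arrays;

/** Hypothetical sketch: maps each distinct name to a small int code. */
final class TagCodePool {
    private char[] chars = new char[1024]; // every name, stored back to back
    private int charCount = 0;
    private int[] ends = new int[16];      // ends[code] = end offset of that name in chars
    private int[] hashes = new int[16];    // cached String.hashCode() of each name
    private int count = 0;
    private int[] table = new int[64];     // open addressing; holds code + 1, 0 means empty

    /** Returns the code for name, assigning the next free code on first sight. */
    int intern(String name) {
        int hash = name.hashCode();
        int mask = table.length - 1;       // table length is always a power of two
        int slot = spread(hash) & mask;
        while (table[slot] != 0) {
            int code = table[slot] - 1;
            if (hashes[code] == hash && matches(code, name)) {
                return code;
            }
            slot = (slot + 1) & mask;      // linear probing
        }
        int code = add(name, hash);
        table[slot] = code + 1;
        if (count * 2 > table.length) {    // keep the load factor at or below 0.5
            grow();
        }
        return code;
    }

    /** Rebuilds the String for a code; only needed when a caller wants the text. */
    String nameOf(int code) {
        if (code < 0 || code >= count) {
            throw new IndexOutOfBoundsException("unknown code " + code);
        }
        int start = start(code);
        return new String(chars, start, ends[code] - start);
    }

    int size() {
        return count;
    }

    private int start(int code) {
        return code == 0 ? 0 : ends[code - 1];
    }

    private static int spread(int h) {
        return h ^ (h >>> 16);             // mix high bits into the low bits used for the slot
    }

    private boolean matches(int code, String name) {
        int start = start(code);
        int len = ends[code] - start;
        if (len != name.length()) {
            return false;
        }
        for (int i = 0; i < len; i++) {
            if (chars[start + i] != name.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    private int add(String name, int hash) {
        int len = name.length();
        if (charCount + len > chars.length) {
            chars = Arrays.copyOf(chars, Math.max(chars.length * 2, charCount + len));
        }
        name.getChars(0, len, chars, charCount); // copy the characters into the shared array
        charCount += len;
        if (count == ends.length) {
            ends = Arrays.copyOf(ends, count * 2);
            hashes = Arrays.copyOf(hashes, count * 2);
        }
        ends[count] = charCount;
        hashes[count] = hash;
        return count++;
    }

    private void grow() {
        int[] bigger = new int[table.length * 2];
        int mask = bigger.length - 1;
        for (int code = 0; code < count; code++) {
            int slot = spread(hashes[code]) & mask;
            while (bigger[slot] != 0) {
                slot = (slot + 1) & mask;
            }
            bigger[slot] = code + 1;
        }
        table = bigger;
    }
}
```

Calling intern("chapter") twice gives the same int both times, so callers can compare names with == on ints instead of holding String references. Nothing is ever flushed, which is the point: the saving comes from the storage layout, not from guessing when a name is no longer needed.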

Bob Foster

