Re: OutOfMemory due to SymbolTable caching!

Michael Glavassevich 17 Mar 2005 20:46:42 -0000

When addSymbol is called with the same characters, you'll get the same 
string object, since:


new String(buffer1, offset1, length1).intern() == new String(buffer2, 
offset2, length2).intern()
if length1 == length2 and buffer1[offset1 + n] == buffer2[offset2 + n] for 
all n < length1.

My intention was to show that it's possible to replace the SymbolTable 
with one that consumes less memory (though this particular example will 
perform quite poorly as it churns the heap on every invocation).
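
To make the identity property concrete, here is a minimal, self-contained 
sketch. The class name InternOnlySymbolTable is hypothetical and, to keep 
the example runnable without Xerces on the classpath, it does not extend 
org.apache.xerces.util.SymbolTable; in a real replacement the same method 
body would go into an override of addSymbol.

```java
// Hypothetical stand-in for a SymbolTable subclass (assumption: not the
// actual Xerces class). It caches nothing itself and relies entirely on
// String.intern() to hand back one canonical String per character sequence.
class InternOnlySymbolTable {

    public String addSymbol(char[] buffer, int offset, int length) {
        // Allocates a temporary String on every call, then returns the
        // canonical interned instance for those characters.
        return new String(buffer, offset, length).intern();
    }

    public static void main(String[] args) {
        InternOnlySymbolTable table = new InternOnlySymbolTable();

        // Two different buffers holding the same characters at different offsets.
        char[] buffer1 = "<item>".toCharArray();
        char[] buffer2 = "xx item yy".toCharArray();

        String a = table.addSymbol(buffer1, 1, 4);
        String b = table.addSymbol(buffer2, 3, 4);

        // Reference comparison: both calls yield the same object.
        System.out.println(a == b);
    }
}
```

The temporary String created on each call is the heap churn mentioned 
above; only the interned copy stays reachable once the call returns.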

Bob Foster <[EMAIL PROTECTED]> wrote on 03/17/2005 01:10:26 AM:

> I'm really missing something here. What do you do when addSymbol is 
> called with the same char argument as before?
> 
> Bob Foster
> http://xmlbuddy.com/
> 
> Michael Glavassevich wrote:
> > String.intern() always returns the same canonical string object for 
> > equal contents, so you could override addSymbol like this:
> > 
> > public String addSymbol(char[] buffer, int offset, int length) {
> >    return new String(buffer, offset, length).intern();
> > }
> > 
> > and cache nothing in the table. This would be slower but you'd save 
> > memory which would have otherwise been used to create a new Entry and a 
> > new character array (that has the same contents as the string). The 
> > memory consumption of String.intern() is another story, but for Xerces 
> > to work correctly it can't be avoided.
> > 
> > Bob Foster <[EMAIL PROTECTED]> wrote on 03/16/2005 10:43:51 PM:
> > 
> > 
> >>But it isn't clear what he could do to reduce the memory used by the 
> >>SymbolTable, given that every element name will be referenced at least 
> >>twice. Since the table guarantees that an identical String be returned, 
> >>how do you keep from needing all those Strings in memory? Maybe you 
> >>could expand on your suggestion?
> >>
> >>Bob Foster
> >>http://xmlbuddy.com/
> >>
> >>Michael Glavassevich wrote:
> >>
> >>>You could extend the SymbolTable class and then replace the default by 
> >>>setting the http://apache.org/xml/properties/internal/symbol-table 
> >>>property with your own SymbolTable. The parser components assume that 
> >>>symbols returned from the SymbolTable have been internalized with 
> >>>String.intern() so all SymbolTable implementations must return 
> >>>internalized strings in order for the parser to function properly.
> >>>
> >>>[EMAIL PROTECTED] wrote on 03/16/2005 11:43:52 AM:
> >>>
> >>>
> >>>
> >>>>Hello,
> >>>>
> >>>>Our application is running out of memory on moderately sized XML files 
> >>>>(2-4MB) containing random XML tags due to the caching nature of the 
> >>>>SymbolTable. The random tags come from our customers and are embedded 
> >>>>as a subtree in our own XML document. This is a historical decision and 
> >>>>cannot be reverted. The result is that we get XML documents where we 
> >>>>end up with 80000+ different XML tags. Profiling showed me that parsing 
> >>>>such a file leads to up to 20MB of memory referred to by the 
> >>>>SymbolTable. With the possibility of parsing multiple such documents in 
> >>>>parallel, we can go OOM very easily.
> >>>>
> >>>>Is there anything I can do to circumvent/optimize the internal usage of 
> >>>>the SymbolTable?
> >>>>
> >>>>Ringo
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

