Preventing OutOfMemory exception

2016-02-08 Thread Steven White
Hi everyone, I'm integrating Tika with my application and need your help to figure out whether the OOM I'm getting is due to the way I'm using Tika or is an issue with parsing XML files. The following example code causes an OOM on the 7th iteration with -Xmx2g. The test passes with -Xmx4g. The

RE: Preventing OutOfMemory exception

2016-02-08 Thread Allison, Timothy B.
I’m not sure why you’d want to append document contents across documents into one handler. Typically, you’d use a new ContentHandler and new Metadata object for each parse. Calling “toString()” does not clear the content handler, and you should have 20 copies of the extracted content on your f
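To illustrate the point about handler reuse, here is a minimal sketch that uses only the JDK's SAX classes rather than Tika itself. `TextCollector` is a hypothetical stand-in for a text-accumulating handler such as BodyContentHandler, and the class and method names are assumptions made for this example, not part of the original code. It shows that a handler shared across parses keeps growing, while a fresh handler per parse holds only the current document's text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class HandlerReuseDemo {

    /** Hypothetical stand-in for a handler that accumulates extracted text. */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public String toString() {
            // Reading the text does not clear the buffer.
            return text.toString();
        }
    }

    /** Parses one XML document into the given handler, using a new parser each time. */
    static void parse(String xml, DefaultHandler handler) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(bytes), handler);
    }

    /** One handler for every parse: the buffered text grows with each iteration. */
    public static int lengthWithSharedHandler(String xml, int iterations) throws Exception {
        TextCollector shared = new TextCollector();
        for (int i = 0; i < iterations; i++) {
            parse(xml, shared);
        }
        return shared.toString().length();
    }

    /** A new handler per parse: each buffer holds one document and can be collected. */
    public static int lengthWithFreshHandler(String xml, int iterations) throws Exception {
        int last = 0;
        for (int i = 0; i < iterations; i++) {
            TextCollector fresh = new TextCollector();
            parse(xml, fresh);
            last = fresh.toString().length();
        }
        return last;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc>hello</doc>";
        System.out.println(lengthWithSharedHandler(xml, 20)); // prints 100
        System.out.println(lengthWithFreshHandler(xml, 20));  // prints 5
    }
}
```

With real documents the shared buffer grows by the full extracted text of every file, which is one plausible way for heap use to climb steadily across iterations.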

Re: Preventing OutOfMemory exception

2016-02-08 Thread Steven White
Hi Tim, The code I showed is a minimal example of the issue I'm running into, which is that memory keeps growing. In production, the loop that you see will read files off a file system and parse them using logic close to what I showed. I use contentHandler.toString() to get back the

RE: Preventing OutOfMemory exception

2016-02-08 Thread Allison, Timothy B.
In your actual code, are you using one BodyContentHandler for all of your files? Or are you creating a new BodyContentHandler for each file? If the former, then, yes, there’s a problem with your code; if the latter, that’s not something I’ve seen before. From: Steven White [mailto:swhite4...@gm