I'm working on adding a daemon to Tika Server so that it will restart when it 
hits an OOM or another serious problem, such as an infinite hang.  That won't 
be available until Tika 1.7.  
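
Until then, the restart idea can be approximated from the outside. The following is only a rough sketch of a restart loop, not the daemon being added to Tika: the class name, the jar path, and the restart limit are hypothetical, and it only notices a child JVM that actually exits. A hang would still need a separate timeout or health check.

```java
import java.io.IOException;
import java.util.List;
import java.util.function.IntSupplier;

public class RestartingWatchdog {

    /**
     * Runs the child until it exits with code 0 or the restart budget is
     * used up. Returns the number of restarts that were performed.
     */
    static int supervise(IntSupplier runChild, int maxRestarts) {
        int restarts = 0;
        while (runChild.getAsInt() != 0 && restarts < maxRestarts) {
            restarts++;
            System.err.println("Child exited abnormally, restart #" + restarts);
        }
        return restarts;
    }

    /** Starts the command as a child process and waits for its exit code. */
    static int runOnce(List<String> command) {
        Process child = null;
        try {
            child = new ProcessBuilder(command).inheritIO().start();
            return child.waitFor();
        } catch (IOException e) {
            return -1; // could not start the child; counts as a failure
        } catch (InterruptedException e) {
            // Treat an interrupt as a request to stop supervising.
            if (child != null) {
                child.destroy();
            }
            Thread.currentThread().interrupt();
            return 0;
        }
    }

    public static void main(String[] args) {
        // Hypothetical command line; adjust the jar path and heap size.
        // -XX:+ExitOnOutOfMemoryError (newer HotSpot JVMs) makes the child
        // exit on OOM instead of limping along.
        List<String> command = List.of(
                "java", "-Xmx1g", "-XX:+ExitOnOutOfMemoryError",
                "-jar", "tika-server.jar");
        supervise(() -> runOnce(command), 10);
    }
}
```

The loop takes the child as an IntSupplier so that the restart policy can be exercised without starting a real JVM.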

To amplify Nick's recommendations:

ForkParser or Server are your best options for now.
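
For the Server route, here is a minimal client sketch. It assumes a Tika Server is already running on its default port (9998) and uses the PUT /tika endpoint to request plain text. The class name, the 60-second timeout, and the choice to return an empty Optional on failure are my own assumptions, and it needs Java 11+ for java.net.http. The point is that if the server dies on a bad file, the calling application only sees an IOException.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Optional;

public class TikaServerClient {

    // Assumed default address of a locally running Tika Server.
    static final URI DEFAULT_SERVER = URI.create("http://localhost:9998");

    /** Builds a PUT request to /tika that asks for plain-text output. */
    static HttpRequest buildRequest(URI server, HttpRequest.BodyPublisher body) {
        return HttpRequest.newBuilder(server.resolve("/tika"))
                .timeout(Duration.ofSeconds(60)) // guards against parses that hang
                .header("Accept", "text/plain")
                .PUT(body)
                .build();
    }

    /** Returns the extracted text, or empty if the server failed or was unreachable. */
    static Optional<String> extractText(HttpClient client, URI server, Path file) {
        try {
            // ofFile streams the document instead of loading it into our heap.
            HttpRequest request =
                    buildRequest(server, HttpRequest.BodyPublishers.ofFile(file));
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200
                    ? Optional.of(response.body())
                    : Optional.empty();
        } catch (IOException e) {
            // Server down (e.g. after an OOM), timed out, or file unreadable.
            System.err.println("Parse failed for " + file + ": " + e);
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        Optional<String> text = extractText(client, DEFAULT_SERVER, Path.of(args[0]));
        System.out.println(text.orElse("(no text extracted; see log)"));
    }
}
```

Files that come back empty are good candidates to log and, if the failure is surprising, to attach to an issue.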

Are there specific files or file types that are causing the OOM?  Given the 
size of the files, is the OOM surprising?  

On TIKA-1294, we found that a specific 4MB PDF would cause an OOM with -Xmx1g.  
That was surprising and was very quickly addressed by the PDFBox developers.  
If you have specific files that are surprising, please file an issue.

Thank you!


________________________________________
From: Nick Burch [apa...@gagravarr.org]
Sent: Friday, July 18, 2014 4:32 AM
To: user@tika.apache.org
Subject: Re: Avoiding Out of Memory Errors

On Thu, 17 Jul 2014, Shannon Brown wrote:
> Problem:
> How to avoid Out of Memory errors during Tika parsing.

Typical approaches are either to use the ForkParser, or the Tika Server.
Both ensure that if there's a fatal problem with parsing (e.g. OOM) then
the JVM with your main application in it doesn't die too.

For cases where it does die, log it, and if possible report a bug with the
file in question, so we can hopefully fix it for the next release!

Nick
