I'm working on adding a daemon to Tika Server so that it will restart when it hits an OOM or another serious problem (e.g., an infinite hang). That won't be available until Tika 1.7.
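To illustrate the general idea only, here is a minimal JDK-only sketch of a watchdog. It is not how the Tika Server daemon is implemented and it uses no Tika API. The class `ParseWatchdog` and its methods are hypothetical names. It runs the work in a separate OS process, kills the child if it is still running at a deadline (the "infinite hang" case), treats a non-zero exit as a crash, and restarts the child a bounded number of times. It assumes Java 8+ (`Process.waitFor(long, TimeUnit)` and `destroyForcibly()`).

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical watchdog sketch (not part of Tika): runs work in a separate
 * OS process so an OOM or an infinite hang takes down only the child,
 * never the calling JVM, and restarts the child when it fails.
 */
final class ParseWatchdog {

    /** Result of one supervised run of the child process. */
    public enum Outcome { COMPLETED, FAILED, TIMED_OUT }

    private ParseWatchdog() {}

    /**
     * Starts the command and waits at most timeoutMillis for it.
     * Exit code 0 -> COMPLETED; any other exit code (e.g. a child JVM that
     * died on OOM) -> FAILED; still running at the deadline -> killed, TIMED_OUT.
     */
    public static Outcome runOnce(List<String> command, long timeoutMillis)
            throws IOException, InterruptedException {
        Process child = new ProcessBuilder(command)
                // Inherit stdout/stderr so the child never blocks on a full pipe.
                .redirectOutput(ProcessBuilder.Redirect.INHERIT)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        if (!child.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            child.destroyForcibly(); // treat as a hang: kill the child only
            child.waitFor();         // reap it before returning
            return Outcome.TIMED_OUT;
        }
        return child.exitValue() == 0 ? Outcome.COMPLETED : Outcome.FAILED;
    }

    /**
     * Restart loop: reruns the command after a crash or hang, logging each
     * failure, for at most maxAttempts runs. Returns the last outcome.
     */
    public static Outcome runWithRestarts(List<String> command, long timeoutMillis,
                                          int maxAttempts)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Outcome outcome = Outcome.FAILED;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            outcome = runOnce(command, timeoutMillis);
            if (outcome == Outcome.COMPLETED) {
                break;
            }
            // Log every failure so the offending input can be reported upstream.
            System.err.println("attempt " + attempt + "/" + maxAttempts
                    + " ended with " + outcome + ": " + command);
        }
        return outcome;
    }
}
```

In practice the command would launch a child JVM that parses one file, for instance something like `java -Xmx1g -XX:+ExitOnOutOfMemoryError -cp <classpath> com.example.ParseOneFile <file>`, where `com.example.ParseOneFile` is a hypothetical main class. The HotSpot flag `-XX:+ExitOnOutOfMemoryError` turns an OOM into a prompt non-zero exit instead of leaving a half-broken JVM running. Giving each child a single file also makes it easy to log exactly which file crashed or hung. ForkParser and Tika Server give you this kind of isolation without writing your own process management; the sketch only shows the underlying idea.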
To amplify Nick's recommendations: the ForkParser or Tika Server are your best options for now. Are there specific files/file types that are causing the OOM? Given the size of the files, is the OOM surprising? On TIKA-1294, we found that a specific 4MB PDF would cause an OOM with -Xmx1g. That was surprising, and the PDFBox developers addressed it very quickly. If you have specific files that cause surprising OOMs, please file an issue.

Thank you!

________________________________________
From: Nick Burch [apa...@gagravarr.org]
Sent: Friday, July 18, 2014 4:32 AM
To: user@tika.apache.org
Subject: Re: Avoiding Out of Memory Errors

On Thu, 17 Jul 2014, Shannon Brown wrote:
> Problem:
> How to avoid Out of Memory errors during Tika parsing.

Typical approaches are to use either the ForkParser or the Tika Server. Both ensure that if there's a fatal problem with parsing (e.g. an OOM), the JVM with your main application in it doesn't die too.

For cases where it does die, log it, and if possible report a bug with the file in question, so we can hopefully fix it for the next release!

Nick