Re: pdfs

Jack Krupansky Wed, 21 May 2014 21:47:37 -0700

Yeah, PDF extraction has always been at least somewhat problematic. It has improved over the years, but it is still not likely to be perfect.

That said, I'm not aware of any specific PDF extraction issue that would bring down Solr, as opposed to causing a 500 status with an exception in PDF extraction. The one exception is memory usage. Some PDF documents, especially those that are graphics-intensive, can require a lot of memory. The rest of Solr could be adversely affected if all available JVM heap is consumed. The solution is to give the JVM more heap space.
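
To check how much heap a JVM actually has, here is a minimal sketch. The class name HeapReport and its helper are hypothetical, made up for illustration; it reports the running JVM's own limits, so run it with the same -Xmx value you give Solr. If you start Solr 4.x from the bundled Jetty example, the heap is normally set with the standard -Xmx option, something like "java -Xmx2g -jar start.jar". The 2g figure is only an assumption, so size it to your documents and hardware.

```java
public class HeapReport {
    // Hypothetical helper: the maximum heap this JVM will attempt to use, in MiB.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Heap currently in use = heap allocated so far minus the free part of it.
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        System.out.println("max heap (MiB):  " + maxHeapMb());
        System.out.println("used heap (MiB): " + usedMb);
    }
}
```

If the reported maximum is small compared with what a large, image-heavy PDF needs during extraction, that would be consistent with the whole server suffering rather than just one request failing.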


So, what is your specific symptom?

-- Jack Krupansky

-----Original Message-----
From: Brian McDowell
Sent: Thursday, May 22, 2014 12:24 AM
To: solr-user@lucene.apache.org
Subject: pdfs

Has anyone had issues with indexing pdf files? Some pdfs are bringing down
Solr completely so that it actually needs to be manually restarted. We are
using Solr 4.4 and thought that upgrading to Solr 4.8 would solve the
problem because the release notes associated with the new tika version and
also the new pdfbox indicate fixes for pdf issues. It didn't work and now
this issue is causing us to reevaluate using Solr. Any help on this matter
would be greatly appreciated. Thank you!
