Jack Krupansky <jack <at> basetechnology.com> writes:

> I vaguely recall some thread blocking issue with trying to parse too many
> PDF files at one time in the same JVM.
>
> Occasionally Tika (actually PDFBox) has been known to hang for some PDF
> docs.
>
> Do you have enough memory in the JVM? When the CPU is busy, is there much
> memory available in the JVM? Maybe garbage collection is taking too much of
> the CPU.
>
Hi Jack,

Thanks for your quick response. Yes, I think I have enough JVM memory. Here are the memory settings:

-Xms11g -Xmx11g -XX:MaxPermSize=2g

Is this a common issue with PDF extraction and indexing? Why am I not able to process more than 1k documents per hour?

Thanks,
Surendra.
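
One way to check the two questions raised above (how much heap is free while the CPU is busy, and whether garbage collection is eating the CPU) is to sample the JVM's own counters. The following is only a minimal sketch using the standard java.lang.management API. The class name JvmHealth and its helper methods are hypothetical, and the code has to run inside the JVM that does the indexing (for example from a scheduled thread) to say anything useful about it.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of Solr or Tika: reports heap usage and
// the share of JVM uptime spent in garbage collection.
class JvmHealth {

    // Heap currently in use, in bytes (allocated heap minus free space in it).
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Upper bound the heap may grow to (roughly the -Xmx setting).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Approximate accumulated GC time across all collectors, in milliseconds.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Accumulated GC time divided by JVM uptime; 0.0 if uptime is not yet measurable.
    static double gcFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime > 0 ? (double) totalGcMillis() / uptime : 0.0;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("used heap MB: " + usedHeapBytes() / mb);
        System.out.println("max heap MB:  " + maxHeapBytes() / mb);
        System.out.println("GC time ms:   " + totalGcMillis());
        System.out.println("GC fraction:  " + gcFraction());
    }
}
```

If the GC fraction stays high while indexing, or used heap sits close to the maximum, that would point toward memory pressure rather than the PDF parsing itself. Low values on both would suggest looking elsewhere, such as the parser hanging on particular documents.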