On 16.11.2016 at 16:46, Tjard Kopka wrote:
Hello,

We are using the pdfbox-app-2.0.3.jar library in an application that
crawls a huge intranet, reading PDF documents among other content and
extracting their text.

Over the last month we have been facing out-of-memory crashes of the
JVM. We are running Java 1.8.0_65 under Linux with -Xms512 -Xmx1024.

The heap-dump analysis reports: The class "java.lang.ref.Finalizer",
loaded by "<system class loader>", occupies 470.713.224 (70,17%) bytes.

Among other things, the Memory Analyzer shows the following:

Class Name                                                                              | Shallow Heap | Retained Heap | Percentage
-----------------------------------------------------------------------------------------------------------------------------------
class java.lang.ref.Finalizer @ 0xc0005768 System Class                                 |           16 |   470.713.224 |     70,17%
|- java.lang.ref.Finalizer @ 0xed44ed10                                                 |           40 |   470.713.192 |     70,17%
|  |- java.lang.ref.Finalizer @ 0xed43c9b0                                              |           40 |   470.713.088 |     70,17%
|  |  |- java.lang.ref.Finalizer @ 0xed42b040                                           |           40 |   470.712.984 |     70,17%
|  |  |  |- java.lang.ref.Finalizer @ 0xed419588                                        |           40 |   470.712.880 |     70,17%
|  |  |  |  |- java.lang.ref.Finalizer @ 0xed407b10                                     |           40 |   470.712.776 |     70,17%
|  |  |  |  |  |- java.lang.ref.Finalizer @ 0xed3f6098                                  |           40 |   470.712.672 |     70,17%
|  |  |  |  |  |  |- java.lang.ref.Finalizer @ 0xed3e4620                               |           40 |   470.712.568 |     70,17%
|  |  |  |  |  |  |  |- java.lang.ref.Finalizer @ 0xed3d2b18                            |           40 |   470.712.464 |     70,17%
|  |  |  |  |  |  |  |  |- java.lang.ref.Finalizer @ 0xed3bda48                         |           40 |   470.712.360 |     70,17%
|  |  |  |  |  |  |  |  |  |- java.lang.ref.Finalizer @ 0xed3abe48                      |           40 |   470.712.256 |     70,17%
|  |  |  |  |  |  |  |  |  |  |- java.lang.ref.Finalizer @ 0xed39a3d0                   |           40 |   470.712.152 |     70,17%
|  |  |  |  |  |  |  |  |  |  |  |- java.lang.ref.Finalizer @ 0xed388798                |           40 |   470.712.048 |     70,17%
|  |  |  |  |  |  |  |  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed39a390 |           64 |            64 |      0,00%
|  |  |  |  |  |  |  |  |  |  |  '- Total: 2 entries
|  |  |  |  |  |  |  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3abe08    |           64 |            64 |      0,00%
|  |  |  |  |  |  |  |  |  |  '- Total: 2 entries
|  |  |  |  |  |  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3bda08       |           64 |            64 |      0,00%
|  |  |  |  |  |  |  |  |  '- Total: 2 entries
|  |  |  |  |  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3d2ad8          |           64 |            64 |      0,00%
|  |  |  |  |  |  |  |  '- Total: 2 entries
|  |  |  |  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3e45e0             |           64 |            64 |      0,00%
|  |  |  |  |  |  |  '- Total: 2 entries
|  |  |  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3f6058                |           64 |            64 |      0,00%
|  |  |  |  |  |  '- Total: 2 entries
|  |  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed407ad0                   |           64 |            64 |      0,00%
|  |  |  |  |  '- Total: 2 entries
|  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed419548                      |           64 |            64 |      0,00%
|  |  |  |  '- Total: 2 entries
|  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed42b000                         |           64 |            64 |      0,00%
|  |  |  '- Total: 2 entries
|  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed43c970                            |           64 |            64 |      0,00%
|  |  '- Total: 2 entries
|  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed44ecd0                               |           64 |            64 |      0,00%
|  '- Total: 2 entries
|- java.lang.Object @ 0xc0005758                                                        |           16 |            16 |      0,00%
'- Total: 2 entries
-----------------------------------------------------------------------------------------------------------------------------------
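The tree above is a long chain of java.lang.ref.Finalizer objects, each one registered for a ScratchFileBuffer. One way to check at runtime whether finalization is falling behind is the JDK's own MemoryMXBean, which reports an approximate count of objects pending finalization next to the heap usage. This is a minimal sketch, assuming the crawler logs it periodically; FinalizerMonitor and snapshot() are hypothetical names, and how often to call it is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class FinalizerMonitor {

    private static final long MB = 1024L * 1024L;

    // Approximate number of objects whose finalize() is pending but has not run yet.
    static int pendingFinalization() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getObjectPendingFinalizationCount();
    }

    // One log line with heap usage and the pending-finalization count.
    static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 if no maximum is defined; -Xmx normally sets one.
        return String.format("heap used=%d MB, max=%d MB, pending finalization=%d",
                heap.getUsed() / MB, heap.getMax() / MB, pendingFinalization());
    }

    public static void main(String[] args) {
        // Hypothetical usage: call snapshot() every N documents in the crawl loop.
        System.out.println(snapshot());
    }
}
```

If the pending count keeps climbing while documents are processed, the finalizer thread is not keeping up; if it stays near zero, the retained memory is more likely held by the not-yet-finalized objects themselves.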

Excerpt from our code:

try {
    PDDocument doc = PDDocument.load(file);
    PDFTextStripper stripper = new PDFTextStripper();
    // ...
    textContent = stripper.getText(doc);
    doc.close();
    // ...
}
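In this excerpt, doc.close() is only reached when getText() returns normally. If extraction throws, the document is never closed, and objects that rely on finalize(), such as the ScratchFileBuffer instances visible in the dump, are only released once the finalizer thread gets to them. Whether this contributes to the reported leak is an assumption, not something the heap dump confirms. In PDFBox 2.0.x, PDDocument implements java.io.Closeable, so the crawler could use try (PDDocument doc = PDDocument.load(file)) { ... }. The sketch below shows the same try-with-resources pattern; to stay runnable without the PDFBox jar it uses a hypothetical FakeDocument stand-in and a hypothetical extractText() helper instead of the real classes.

```java
import java.io.IOException;

public class CloseOnFailure {

    // Hypothetical stand-in for PDDocument: it only records whether close() ran.
    static final class FakeDocument implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    // Hypothetical stand-in for stripper.getText(doc); throws when asked to.
    static String extractText(FakeDocument doc, boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated extraction error");
        }
        return "extracted text";
    }

    // Mirrors the crawler excerpt, but close() is guaranteed by try-with-resources.
    // Returns the document so callers can check that it was closed.
    static FakeDocument process(boolean fail) {
        FakeDocument seen = null;
        try (FakeDocument doc = new FakeDocument()) {
            seen = doc;
            String textContent = extractText(doc, fail);
            // ... index textContent ...
        } catch (IOException e) {
            // Log and move on to the next file; doc.close() has already run.
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(process(false).closed); // true
        System.out.println(process(true).closed);  // true
    }
}
```

The resource is closed before the catch block runs, so a single malformed PDF no longer leaves its buffers waiting for finalization.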

I have seen that some similar bugs have been reported:
https://issues.apache.org/jira/browse/PDFBOX-3253
https://issues.apache.org/jira/browse/PDFBOX-3388

Nevertheless, do you have a quick fix or workaround for us?

No... From what I can see, the 70,1% is not PDFBox but Java itself?!

Consider updating to the current JDK 1.8 release. If the problem doesn't go away, try to reproduce it with files you can share.

Tilman
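To confirm which JDK update the crawler actually runs on after upgrading (the report mentions 1.8.0_65), the java.version system property can be logged at startup. A small sketch; updateNumber() is a hypothetical helper that only understands Java 8 style version strings such as 1.8.0_65.

```java
public class JdkUpdateCheck {

    // Parses the update number from a Java 8 style version string such as
    // "1.8.0_65"; returns -1 if there is no "_NN" suffix.
    static int updateNumber(String version) {
        int underscore = version.indexOf('_');
        if (underscore < 0) {
            return -1;
        }
        String rest = version.substring(underscore + 1);
        // Keep only the leading digits, e.g. "112" from "112-b15".
        int end = 0;
        while (end < rest.length() && Character.isDigit(rest.charAt(end))) {
            end++;
        }
        return end == 0 ? -1 : Integer.parseInt(rest.substring(0, end));
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("Running Java " + version
                + " (update " + updateNumber(version) + ")");
    }
}
```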


