On 16.11.2016 at 16:46, Tjard Kopka wrote:
Hello,
We are using the pdfbox-app-2.0.3.jar library in an application that
crawls a large intranet, reading PDF documents along the way and
extracting their text content.
For the past month we have been facing problems caused by out-of-memory
crashes of the JVM. We are running Java 1.8.0_65 under Linux with
-Xms512 -Xmx1024.
The heap-dump analysis reports: The class "java.lang.ref.Finalizer",
loaded by "<system class loader>", occupies 470.713.224 (70,17%) bytes.
Memory Analyzer shows, among other things, the following:
Class Name                                                                   | Shallow Heap | Retained Heap | Percentage
------------------------------------------------------------------------------------------------------------------------
class java.lang.ref.Finalizer @ 0xc0005768 System Class                      |           16 |   470.713.224 |     70,17%
|- java.lang.ref.Finalizer @ 0xed44ed10                                      |           40 |   470.713.192 |     70,17%
| |- java.lang.ref.Finalizer @ 0xed43c9b0                                    |           40 |   470.713.088 |     70,17%
| | |- java.lang.ref.Finalizer @ 0xed42b040                                  |           40 |   470.712.984 |     70,17%
| | | |- java.lang.ref.Finalizer @ 0xed419588                                |           40 |   470.712.880 |     70,17%
| | | | |- java.lang.ref.Finalizer @ 0xed407b10                              |           40 |   470.712.776 |     70,17%
| | | | | |- java.lang.ref.Finalizer @ 0xed3f6098                            |           40 |   470.712.672 |     70,17%
| | | | | | |- java.lang.ref.Finalizer @ 0xed3e4620                          |           40 |   470.712.568 |     70,17%
| | | | | | | |- java.lang.ref.Finalizer @ 0xed3d2b18                        |           40 |   470.712.464 |     70,17%
| | | | | | | | |- java.lang.ref.Finalizer @ 0xed3bda48                      |           40 |   470.712.360 |     70,17%
| | | | | | | | | |- java.lang.ref.Finalizer @ 0xed3abe48                    |           40 |   470.712.256 |     70,17%
| | | | | | | | | | |- java.lang.ref.Finalizer @ 0xed39a3d0                  |           40 |   470.712.152 |     70,17%
| | | | | | | | | | | |- java.lang.ref.Finalizer @ 0xed388798                |           40 |   470.712.048 |     70,17%
| | | | | | | | | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed39a390 |           64 |            64 |      0,00%
| | | | | | | | | | | '- Total: 2 entries                                    |              |               |
| | | | | | | | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3abe08   |           64 |            64 |      0,00%
| | | | | | | | | | '- Total: 2 entries                                      |              |               |
| | | | | | | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3bda08     |           64 |            64 |      0,00%
| | | | | | | | | '- Total: 2 entries                                        |              |               |
| | | | | | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3d2ad8       |           64 |            64 |      0,00%
| | | | | | | | '- Total: 2 entries                                          |              |               |
| | | | | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3e45e0         |           64 |            64 |      0,00%
| | | | | | | '- Total: 2 entries                                            |              |               |
| | | | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3f6058           |           64 |            64 |      0,00%
| | | | | | '- Total: 2 entries                                              |              |               |
| | | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed407ad0             |           64 |            64 |      0,00%
| | | | | '- Total: 2 entries                                                |              |               |
| | | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed419548               |           64 |            64 |      0,00%
| | | | '- Total: 2 entries                                                  |              |               |
| | | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed42b000                 |           64 |            64 |      0,00%
| | | '- Total: 2 entries                                                    |              |               |
| | |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed43c970                   |           64 |            64 |      0,00%
| | '- Total: 2 entries                                                      |              |               |
| |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed44ecd0                     |           64 |            64 |      0,00%
| '- Total: 2 entries                                                        |              |               |
|- java.lang.Object @ 0xc0005758                                             |           16 |            16 |      0,00%
'- Total: 2 entries                                                          |              |               |
------------------------------------------------------------------------------------------------------------------------
Excerpt from our code:
try {
    PDDocument doc = PDDocument.load(file);
    PDFTextStripper stripper = new PDFTextStripper();
    ...
    textContent = stripper.getText(doc);
    doc.close();
    ...
}
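One thing worth ruling out in an excerpt like this: doc.close() is only
reached when getText() returns normally, so a document that fails while
being read is never closed explicitly. The sketch below contrasts that
pattern with try-with-resources. It uses a hypothetical stand-in class,
TrackedDocument, instead of PDDocument so that it runs without PDFBox;
PDDocument implements Closeable, so the same pattern should carry over.
Whether unclosed documents explain the heap dump above is an assumption,
not something the dump proves.

```java
// Hypothetical stand-in for a closeable document type; NOT a PDFBox class.
final class TrackedDocument implements AutoCloseable {
    private final boolean failOnRead;
    private boolean closed = false;

    TrackedDocument(boolean failOnRead) {
        this.failOnRead = failOnRead;
    }

    String getText() {
        if (failOnRead) {
            throw new IllegalStateException("simulated failure while reading");
        }
        return "text";
    }

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

final class CloseDemo {
    // Same shape as the excerpt: close() sits on the success path only.
    static boolean closedAfterPlainTry(boolean failOnRead) {
        TrackedDocument doc = new TrackedDocument(failOnRead);
        try {
            doc.getText();
            doc.close();
        } catch (IllegalStateException e) {
            // Swallowed for the demo; close() was skipped on this path.
        }
        return doc.isClosed();
    }

    // try-with-resources: close() runs whether or not getText() throws.
    static boolean closedAfterTryWithResources(boolean failOnRead) {
        TrackedDocument doc = new TrackedDocument(failOnRead);
        try (TrackedDocument d = doc) {
            d.getText();
        } catch (IllegalStateException e) {
            // The resource has already been closed when we get here.
        }
        return doc.isClosed();
    }
}
```

With a simulated failure, closedAfterPlainTry(true) reports the document
as still open, while closedAfterTryWithResources(true) reports it closed.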
I have seen that some similar bugs have been reported:
https://issues.apache.org/jira/browse/PDFBOX-3253
https://issues.apache.org/jira/browse/PDFBOX-3388
Nevertheless, do you have a quick fix or workaround for us?
No... from what I can see, the 70,17% is retained by Java itself
(java.lang.ref.Finalizer), not by PDFBox?!
Consider updating to the current JDK 1.8 release. If the problem
doesn't go away, try reproducing it with files you can share.
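
While narrowing it down, it may help to log the JVM's own counters
between documents to see whether the finalization backlog really grows.
A minimal sketch using only the standard java.lang.management API; the
class name FinalizerBacklogProbe and the log format are made up for
illustration, and the pending count is only an approximation reported by
the JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper; call snapshot() after each processed file and log the result.
final class FinalizerBacklogProbe {
    static String snapshot() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        // Approximate number of objects waiting for their finalizer to run.
        int pending = memory.getObjectPendingFinalizationCount();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return "pendingFinalization=" + pending
                + " heapUsedBytes=" + heap.getUsed()
                + " heapMaxBytes=" + heap.getMax();
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```

A pending count that keeps climbing from file to file would point at
finalizable objects piling up faster than the finalizer thread can
process them.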
Tilman