Hello,

we are using the pdfbox-app-2.0.3.jar library in an application that
crawls a large intranet, reading (among other things) PDF documents and
extracting their text content.

Over the last month we have been facing problems caused by out-of-memory
crashes of the JVM. We are running Java 1.8.0_65 on Linux with -Xms512 -Xmx1024.

The heap-dump analysis reports: the class "java.lang.ref.Finalizer",
loaded by "<system class loader>", occupies 470,713,224 bytes (70.17%).

Among other things, the Memory Analyzer shows the following:

Shallow Heap | Retained Heap | Percentage | Class Name
-----------------------------------------------------------------------------------------------------------------------------------
          16 |   470,713,224 |     70.17% | class java.lang.ref.Finalizer @ 0xc0005768 System Class
          40 |   470,713,192 |     70.17% | |- java.lang.ref.Finalizer @ 0xed44ed10
          40 |   470,713,088 |     70.17% | |  |- java.lang.ref.Finalizer @ 0xed43c9b0
          40 |   470,712,984 |     70.17% | |  |  |- java.lang.ref.Finalizer @ 0xed42b040
          40 |   470,712,880 |     70.17% | |  |  |  |- java.lang.ref.Finalizer @ 0xed419588
          40 |   470,712,776 |     70.17% | |  |  |  |  |- java.lang.ref.Finalizer @ 0xed407b10
          40 |   470,712,672 |     70.17% | |  |  |  |  |  |- java.lang.ref.Finalizer @ 0xed3f6098
          40 |   470,712,568 |     70.17% | |  |  |  |  |  |  |- java.lang.ref.Finalizer @ 0xed3e4620
          40 |   470,712,464 |     70.17% | |  |  |  |  |  |  |  |- java.lang.ref.Finalizer @ 0xed3d2b18
          40 |   470,712,360 |     70.17% | |  |  |  |  |  |  |  |  |- java.lang.ref.Finalizer @ 0xed3bda48
          40 |   470,712,256 |     70.17% | |  |  |  |  |  |  |  |  |  |- java.lang.ref.Finalizer @ 0xed3abe48
          40 |   470,712,152 |     70.17% | |  |  |  |  |  |  |  |  |  |  |- java.lang.ref.Finalizer @ 0xed39a3d0
          40 |   470,712,048 |     70.17% | |  |  |  |  |  |  |  |  |  |  |  |- java.lang.ref.Finalizer @ 0xed388798
          64 |            64 |      0.00% | |  |  |  |  |  |  |  |  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed39a390
             |               |            | |  |  |  |  |  |  |  |  |  |  |  '- Total: 2 entries
          64 |            64 |      0.00% | |  |  |  |  |  |  |  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3abe08
             |               |            | |  |  |  |  |  |  |  |  |  |  '- Total: 2 entries
          64 |            64 |      0.00% | |  |  |  |  |  |  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3bda08
             |               |            | |  |  |  |  |  |  |  |  |  '- Total: 2 entries
          64 |            64 |      0.00% | |  |  |  |  |  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3d2ad8
             |               |            | |  |  |  |  |  |  |  |  '- Total: 2 entries
          64 |            64 |      0.00% | |  |  |  |  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3e45e0
             |               |            | |  |  |  |  |  |  |  '- Total: 2 entries
          64 |            64 |      0.00% | |  |  |  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed3f6058
             |               |            | |  |  |  |  |  |  '- Total: 2 entries
          64 |            64 |      0.00% | |  |  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed407ad0
             |               |            | |  |  |  |  |  '- Total: 2 entries
          64 |            64 |      0.00% | |  |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed419548
             |               |            | |  |  |  |  '- Total: 2 entries
          64 |            64 |      0.00% | |  |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed42b000
             |               |            | |  |  |  '- Total: 2 entries
          64 |            64 |      0.00% | |  |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed43c970
             |               |            | |  |  '- Total: 2 entries
          64 |            64 |      0.00% | |  |- org.apache.pdfbox.io.ScratchFileBuffer @ 0xed44ecd0
             |               |            | |  '- Total: 2 entries
          16 |            16 |      0.00% | |- java.lang.Object @ 0xc0005758
             |               |            | '- Total: 2 entries
-----------------------------------------------------------------------------------------------------------------------------------

Excerpt from our code:

try {
    PDDocument doc = PDDocument.load(file);
    PDFTextStripper stripper = new PDFTextStripper();
    ...
    textContent = stripper.getText(doc);
    doc.close();
    ...
}
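Separately from the Finalizer backlog itself, the excerpt above only reaches doc.close() when every earlier call succeeds. Assuming the elided part ends in a catch block that logs the error and moves on to the next file (the excerpt does not show this), a document whose text extraction throws is never closed. The minimal Java sketch below contrasts the two patterns using only the JDK; FakeDocument and extractText are hypothetical stand-ins for PDDocument and PDFTextStripper.getText(). Because PDDocument in PDFBox 2.0 implements java.io.Closeable, the same try-with-resources form should carry over, but the sketch does not show that this alone reduces the Finalizer growth in the heap dump.

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseOnFailureDemo {

    // Hypothetical stand-in for PDDocument; counts instances not yet closed.
    static final class FakeDocument implements Closeable {
        static int openCount = 0;
        private boolean closed = false;

        FakeDocument() {
            openCount++;
        }

        @Override
        public void close() {
            if (!closed) { // idempotent close, as Closeable recommends
                closed = true;
                openCount--;
            }
        }
    }

    // Hypothetical stand-in for PDFTextStripper.getText(doc); throws for "broken" input.
    static String extractText(FakeDocument doc, boolean broken) throws IOException {
        if (broken) {
            throw new IOException("simulated extraction failure");
        }
        return "text content";
    }

    // Same shape as the excerpt: close() is only reached if extraction succeeds.
    // The catch block is an assumption about the elided part of the original code.
    static void closeInsideTry(boolean broken) {
        try {
            FakeDocument doc = new FakeDocument();
            extractText(doc, broken);
            doc.close();
        } catch (IOException e) {
            // logged and skipped in a crawler; doc stays open
        }
    }

    // try-with-resources: close() runs on both the normal and the exceptional path.
    static void tryWithResources(boolean broken) {
        try (FakeDocument doc = new FakeDocument()) {
            extractText(doc, broken);
        } catch (IOException e) {
            // doc has already been closed at this point
        }
    }

    public static void main(String[] args) {
        boolean[] inputs = {false, true, true};

        for (boolean broken : inputs) {
            closeInsideTry(broken);
        }
        System.out.println("still open, close() inside try: " + FakeDocument.openCount);

        FakeDocument.openCount = 0;
        for (boolean broken : inputs) {
            tryWithResources(broken);
        }
        System.out.println("still open, try-with-resources: " + FakeDocument.openCount);
    }
}
```

With the three sample inputs, main prints 2 for the first pattern and 0 for try-with-resources. In the crawler itself, the corresponding header would be try (PDDocument doc = PDDocument.load(file)) { ... }.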

I have seen that some similar bugs have been reported:
https://issues.apache.org/jira/browse/PDFBOX-3253
https://issues.apache.org/jira/browse/PDFBOX-3388

Nevertheless, do you have a quick fix or workaround for us?

Thanks
Tjard


---------------------------------------------------------------------
Deutsche Vermögensberatung Aktiengesellschaft DVAG
Münchener Straße 1
60329 Frankfurt am Main
Chairman of the Executive Board: Andreas Pohl
Members of the Executive Board: Dr. h.c. /HLU Udo Corts, Hans-Theo Franken,
Christian Glanz, Lars Knackstedt, Dr. Helge Lach, Robert Peil, Dr. Dirk Reiffenrath
Chairman of the Supervisory Board: Friedrich Bohl
Registered office: Frankfurt am Main
Commercial register: Frankfurt HRB 15511
VAT ID no.: DE 114 139 839
Licensing and supervisory authority under § 34c GewO: City of Frankfurt am Main,
Ordnungsamt (public order office), Kleyerstraße 86, 60326 Frankfurt am Main
Licensing and supervisory authority under § 34f GewO: IHK Frankfurt am Main,
Börsenplatz 4, 60313 Frankfurt am Main
Joint registry for § 34d GewO and § 34f GewO:
Deutscher Industrie- und Handelskammertag (DIHK) e.V.
Breite Straße 29, 10178 Berlin, phone 0180 600585-0
(20 cents per call from the German landline network, at most 60 cents per call
from mobile networks)
www.vermittlerregister.info or www.vermittlerregister.org
Register number under § 34d GewO: D-LYYB-BSPX5-17
Register number under § 34f GewO: D-F-125-93J4-60
--------------------------------------------------------------------- 
