Re[8]: PDFRenderer, PDDocument memory issue

Alex Sviridov Wed, 01 Jul 2015 05:00:12 -0700

 OK. Thank you very much for the explanation. Could you tell me where this
scratch file is located on Linux/Windows?



Wednesday, 1 July 2015, 13:54 +02:00, from Andreas Lehmkühler <andr...@lehmi.de>:
>> Alex Sviridov < ooo_satu...@mail.ru > wrote on 1 July 2015 at 13:38:
>> 
>> 
>>  The file is here  https://yadi.sk/i/Y0fTuvHmhbZiE
>Ah, that explains a lot. The pdf is a scanned document; every page holds a
>color image, consuming a lot of memory when processed
>
>> I tried load(fileName, true). The result: I no longer have memory
>> problems. However, I now have 2 problems:
>>
>> 1) All the thumbnail images are loaded. However, it is VERY SLOW. One
>> thumbnail image takes about 4 seconds to load!
>When it comes to huge pdfs, you have to pick your poison. Either you provide
>enough memory to do all the stuff in memory (fast) or you use a scratch file
>to save memory (slow)
>
>And yes, there is room for improvement in the memory handling (read on
>demand, remove after usage) in PDFBox, but that is some future feature. Patches
>are welcome.
>
>> 2) Besides, as you see, the thumbnail images are loaded in a separate thread.
>> While this thread is running, I try to get a big image for the main content
>> using BufferedImage bi=pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
>> and I get the following exception:
>> 
>> java.io.IOException: java.util.zip.DataFormatException: unknown compression method
>>     at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
>>     at org.apache.pdfbox.cos.COSStream.attemptDecode(COSStream.java:422)
>>     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:398)
>>     at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:335)
>>     at org.apache.pdfbox.cos.COSStream.checkUnfilteredBuffer(COSStream.java:265)
>>     at org.apache.pdfbox.cos.COSStream.getUnfilteredRandomAccess(COSStream.java:239)
>>     at org.apache.pdfbox.pdfparser.BaseParser.<init>(BaseParser.java:146)
>>     at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:78)
>>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:451)
>>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:438)
>>     at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
>>     at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:180)
>>     at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:205)
>>     at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:136)
>>     at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:95)
>>   ....
>>     at javafx.concurrent.Task$TaskCallable.call(Task.java:1423)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.util.zip.DataFormatException: unknown compression method
>>     at java.util.zip.Inflater.inflateBytes(Native Method)
>>     at java.util.zip.Inflater.inflate(Inflater.java:259)
>>     at java.util.zip.Inflater.inflate(Inflater.java:280)
>>     at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:101)
>>     at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:74)
>>     ... 20 more
>> 
>> How to solve these problems?
>PDFBox isn't designed to be thread-safe.
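
One common way to live with a library that is not thread-safe is to funnel every call through a single worker thread, so the thumbnail loop and the full-size render never overlap. The sketch below is only an illustration of that idea: `SerializedRendering` and `PageRenderer` are hypothetical names that stand in for whatever wraps the real renderer, and nothing here is PDFBox API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Runs all render requests on one thread so they can never overlap. */
final class SerializedRendering<T> {

    /** Hypothetical stand-in for the real renderer; not a PDFBox type. */
    interface PageRenderer<T> {
        T render(int page, float dpi) throws Exception;
    }

    // A single worker thread: tasks execute one at a time, in submission order.
    private final ExecutorService renderThread = Executors.newSingleThreadExecutor();
    private final PageRenderer<T> renderer;

    SerializedRendering(PageRenderer<T> renderer) {
        this.renderer = renderer;
    }

    /** Queues a render request and returns a handle to its eventual result. */
    Future<T> submit(final int page, final float dpi) {
        Callable<T> job = () -> renderer.render(page, dpi);
        return renderThread.submit(job);
    }

    /** Stops accepting new requests; already queued ones still run. */
    void shutdown() {
        renderThread.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SerializedRendering<String> rendering =
                new SerializedRendering<>((page, dpi) -> "page " + page + " at " + dpi + " dpi");
        Future<String> thumb = rendering.submit(0, 12f);   // thumbnail request
        Future<String> big = rendering.submit(0, 300f);    // main-view request
        System.out.println(thumb.get());
        System.out.println(big.get());
        rendering.shutdown();
    }
}
```

With this arrangement a full-size request submitted while thumbnails are still queued simply waits its turn; whether that latency is acceptable depends on the application.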
>
>> 
>> 
>> Wednesday, 1 July 2015, 13:17 +02:00, from Andreas Lehmkühler < andr...@lehmi.de >:
>> >
>> >
>> >> Alex Sviridov <  ooo_satu...@mail.ru > wrote on 1 July 2015 at 13:09:
>> >> 
>> >> 
>> >>  I decided to show all the code. I am also sending the pdf file, a file
>> >> from the internet that I use for testing.
>> >The attachment didn't make it due to some restrictions of the mailing list.
>> >Please post a link to the original source or another place where we can
>> >download the pdf in question.
>> >
>> >> 
>> >> // Background task: renders every thumbnail off the JavaFX application thread
>> >> Task<Void> task = new Task<Void>() {
>> >>     @Override protected Void call() throws Exception {
>> >>         for (int i=0;i<model.getTotalPages();i++){
>> >>             System.out.println("Point a:"+i);
>> >>             WritableImage writableImage=model.getPageThumbImage(i);
>> >>             System.out.println("Point b:"+i);
>> >>             ImageView imageView=new ImageView(writableImage);
>> >>             System.out.println("Point c:"+i);
>> >>             Label label=new Label(Integer.toString(i+1));
>> >>             System.out.println("Point d:"+i);
>> >>             VBox vBox=new VBox(imageView,label);
>> >>             System.out.println("Point e:"+i);
>> >>             vBox.setAlignment(Pos.CENTER);
>> >>             vBox.setStyle("-fx-padding:5px 5px 5px 5px;-fx-background-color:red");
>> >>             System.out.println("Point f:"+i);
>> >>             // Scene graph changes must happen on the JavaFX application thread
>> >>             Platform.runLater(new Runnable() {
>> >>                 @Override
>> >>                 public void run() {
>> >>                      thumbFlowPane.getChildren().add(vBox);
>> >>                 }
>> >>             });
>> >>         }
>> >>         return null;
>> >>     }
>> >> };
>> >> new Thread(task).start();
>> >> 
>> >> And here is the tail of the output
>> >> ....
>> >> Point a:30
>> >> Point b:30
>> >> Point c:30
>> >> Point d:30
>> >> Point e:30
>> >> Point f:30
>> >> Point a:31
>> >> 
>> >> What is a scratch file? Sorry, I don't understand you.
>> >
>> >PDFBox holds a lot of temporary data in memory. To reduce the memory
>> >footprint one can choose to use a scratch file instead, so that some/most of
>> >that data will be held in a file.
>> >
>> >To do so, simply use another load method, e.g.
>> >
>> >load(File file, boolean useScratchFiles)
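
To make the idea of a scratch file concrete, here is a minimal sketch of a disk-backed buffer: data is appended to a temporary file and read back by offset, so it never has to stay on the heap. `ScratchBuffer` is a hypothetical class written for illustration and is not how PDFBox implements its scratch file. It creates the file with `File.createTempFile`, which uses the directory named by the `java.io.tmpdir` system property; that PDFBox places its own scratch file in the same directory is an assumption, not something stated in this thread.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Illustrative disk-backed buffer; NOT PDFBox's actual scratch file. */
final class ScratchBuffer implements AutoCloseable {
    private final File file;
    private final RandomAccessFile raf;

    ScratchBuffer() throws IOException {
        // createTempFile places the file in the directory given by java.io.tmpdir
        file = File.createTempFile("scratch", ".tmp");
        file.deleteOnExit();
        raf = new RandomAccessFile(file, "rw");
    }

    /** Appends data to the file and returns the offset where it was stored. */
    long write(byte[] data) throws IOException {
        long offset = raf.length();
        raf.seek(offset);
        raf.write(data);
        return offset;
    }

    /** Reads length bytes back from the given offset. */
    byte[] read(long offset, int length) throws IOException {
        byte[] out = new byte[length];
        raf.seek(offset);
        raf.readFully(out);
        return out;
    }

    /** Where the temporary file lives on disk. */
    File location() {
        return file;
    }

    @Override
    public void close() throws IOException {
        raf.close();
        file.delete();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("java.io.tmpdir = " + System.getProperty("java.io.tmpdir"));
        try (ScratchBuffer scratch = new ScratchBuffer()) {
            long at = scratch.write(new byte[] {1, 2, 3});
            System.out.println("scratch file: " + scratch.location());
            System.out.println("bytes read back: " + scratch.read(at, 3).length);
        }
    }
}
```

Every read goes through the file system instead of the heap, which is the same memory-for-speed trade the scratch file option makes.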
>> >> 
>> >> Wednesday, 1 July 2015, 13:04 +02:00, from Andreas Lehmkühler < andr...@lehmi.de >:
>> >> >
>> >> >
>> >> >> Alex Sviridov <  ooo_satu...@mail.ru > wrote on 1 July 2015 at 12:58:
>> >> >> 
>> >> >> 
>> >> >> Thank you for the answer. I tried
>> >> >> pdfbox-app-2.0.0-20150630.220424-1464.jar; the result is the same.
>> >> >> 
>> >> >> When I create the images I add them to a JavaFX FlowPane. However, the
>> >> >> problem is not in the images because, I repeat, I get 400 MB back when I
>> >> >> set pdfDocument=null, pdfRenderer=null.
>> >> >> 
>> >> >> Besides, when I do pdfDocument = PDDocument.load(new File(fileName)) I
>> >> >> don't have any problems with memory.
>> >> >> 
>> >> >> I run into the memory problem when I call getPageThumbImage in a for
>> >> >> loop.
>> >> >> 
>> >> >> I am sure that the problem is in PDFBox. Please help me.
>> >> >Maybe, but I'm not sure at all.
>> >> >
>> >> >Try to use the scratch file.
>> >> >
>> >> >> Wednesday, 1 July 2015, 12:48 +02:00, from Andreas Lehmkühler < andr...@lehmi.de >:
>> >> >> >
>> >> >> >
>> >> >> >> Alex Sviridov <  ooo_satu...@mail.ru > wrote on 1 July 2015 at 10:16:
>> >> >> >> 
>> >> >> >> 
>> >> >> >> I want to display thumbnails of all pages. However, I ran into a
>> >> >> >> memory problem with PDFRenderer or PDDocument; I don't know which one.
>> >> >> >> 
>> >> >> >> I have the following code:
>> >> >> >>    ....
>> >> >> >>     private PDDocument pdfDocument;
>> >> >> >>     
>> >> >> >>     private PDFRenderer pdfRenderer;
>> >> >> >> 
>> >> >> >>     public WritableImage getPageThumbImage(int page){
>> >> >> >>         WritableImage result=null;
>> >> >> >>         try {
>> >> >> >>             BufferedImage bi=pdfRenderer.renderImageWithDPI(page, 12, ImageType.RGB);
>> >> >> >>             result=SwingFXUtils.toFXImage(bi, null);
>> >> >> >>         } catch (IOException ex) {
>> >> >> >>              ....
>> >> >> >>         }
>> >> >> >>         return result;
>> >> >> >>     }
>> >> >> >>  .....
>> >> >> >> I call getPageThumbImage in a for loop for every page. I set the Java
>> >> >> >> heap to 500 MB, and I can get about 30 images from getPageThumbImage
>> >> >> >> (if I allow more memory I get more).
>> >> >> >> In my application I have real-time memory graphs, and they show that
>> >> >> >> memory fills up very quickly.
>> >> >> >> When there is no free memory left, getPageThumbImage hangs: no
>> >> >> >> exception, nothing, but the code stops.
>> >> >> >> When I set pdfDocument=null, pdfRenderer=null I get about 400 MB of
>> >> >> >> memory back. How can I solve this problem?
>> >> >> >There are 2 possible issues and maybe both are relevant.
>> >> >> >
>> >> >> >1. PDFBox consumes more or less memory to load a pdf depending on the
>> >> >> >size and the content of the pdf.
>> >> >> >
>> >> >> >- Are you using the latest 2.0.0-SNAPSHOT? There were some improvements
>> >> >> >concerning the memory footprint lately
>> >> >> >- Try using a scratch file (there are load methods including a boolean
>> >> >> >switch to activate that)
>> >> >> >
>> >> >> >2. Your own implementation consumes more or less memory to process
>> >> >> >those thumbnails
>> >> >> >
>> >> >> >- Check whether you are releasing all the resources you use during your
>> >> >> >process (especially the images you're creating)
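
As an illustration of the second point, the sketch below keeps only a small thumbnail per page and releases the larger intermediate image and the graphics context inside the loop. It is a hedged example, not the code from this thread: `ThumbnailLoop` and `PageSource` are hypothetical names, and `renderFullSize` stands in for whatever produces the page image.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

final class ThumbnailLoop {

    /** Hypothetical page source standing in for the renderer; not a PDFBox type. */
    interface PageSource {
        int pageCount();
        BufferedImage renderFullSize(int page);
    }

    /** Builds one small thumbnail per page and keeps nothing else alive. */
    static List<BufferedImage> thumbnails(PageSource source, int thumbWidth) {
        List<BufferedImage> result = new ArrayList<>();
        for (int i = 0; i < source.pageCount(); i++) {
            BufferedImage full = source.renderFullSize(i);
            int thumbHeight = Math.max(1, full.getHeight() * thumbWidth / full.getWidth());
            BufferedImage thumb =
                    new BufferedImage(thumbWidth, thumbHeight, BufferedImage.TYPE_INT_RGB);
            Graphics2D g = thumb.createGraphics();
            try {
                g.drawImage(full, 0, 0, thumbWidth, thumbHeight, null);
            } finally {
                g.dispose(); // release the graphics context right away
            }
            full.flush();    // drop cached resources held by the large image
            result.add(thumb); // only the small image is retained
        }
        return result;
    }

    public static void main(String[] args) {
        System.setProperty("java.awt.headless", "true");
        List<BufferedImage> thumbs = thumbnails(new PageSource() {
            @Override public int pageCount() { return 3; }
            @Override public BufferedImage renderFullSize(int page) {
                return new BufferedImage(200, 400, BufferedImage.TYPE_INT_RGB);
            }
        }, 50);
        System.out.println(thumbs.size() + " thumbnails of "
                + thumbs.get(0).getWidth() + "x" + thumbs.get(0).getHeight());
    }
}
```

Because `full` is a loop-local variable, each large image becomes unreachable as soon as its iteration ends; only the list of thumbnails outlives the loop.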
>> >> >> >
>> >> >> >HTH,
>> >> >> >Andreas
>> >> >> >
>> >> >> >---------------------------------------------------------------------
>> >> >> >To unsubscribe, e-mail:  users-unsubscr...@pdfbox.apache.org
>> >> >> >For additional commands, e-mail:  users-h...@pdfbox.apache.org
>> >> >> >
>> >> >> 
>> >> >> 
>> >> >> -- 
>> >> >> Alex Sviridov
>> >> >
>> >> >BR
>> >> >Andreas
>> >> >
>> >> >
>> >> 
>> >> 
>> >> -- 
>> >> Alex Sviridov
>> >> 
>> >
>> >
>> >BR
>> >Andreas
>> >
>> >
>> 
>> 
>> -- 
>> Alex Sviridov
>
>


-- 
Alex Sviridov
