OK. Thank you very much for the explanation. Could you tell me where this scratch file is located on Linux/Windows?
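
A note on the question above: nothing in this thread says where PDFBox puts its scratch file. Below is a minimal sketch, assuming (not confirmed here) that the file is created through the JVM's standard temporary-file mechanism. If so, it would land in the directory named by the java.io.tmpdir system property, which is typically /tmp on Linux and the user's Temp folder on Windows. The class name ScratchDirProbe and the "scratch-probe" prefix are hypothetical; only standard JDK calls are used.

```java
import java.io.File;
import java.io.IOException;

public class ScratchDirProbe {

    /** Directory the JVM uses by default for File.createTempFile. */
    static File defaultTempDir() {
        return new File(System.getProperty("java.io.tmpdir"));
    }

    public static void main(String[] args) throws IOException {
        File dir = defaultTempDir();
        System.out.println("java.io.tmpdir = " + dir.getAbsolutePath());

        // Hypothetical prefix: the name PDFBox really uses is not stated in this thread.
        File probe = File.createTempFile("scratch-probe", ".tmp");
        try {
            System.out.println("temp files land in: " + probe.getParentFile());
        } finally {
            // Remove the probe file so nothing is left behind.
            probe.delete();
        }
    }
}
```

If that assumption holds, starting the JVM with -Djava.io.tmpdir=/some/dir would move the scratch file to another disk.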
Wednesday, 1 July 2015, 13:54 +02:00 from Andreas Lehmkühler <andr...@lehmi.de>:
>> Alex Sviridov < ooo_satu...@mail.ru > wrote on 1 July 2015 at 13:38:
>>
>> The file is here https://yadi.sk/i/Y0fTuvHmhbZiE
>
>Ah, that explains a lot. The PDF is a scanned document; every page holds a
>color image, which consumes a lot of memory when processed.
>
>> I tried load(fileName, true). The result: I no longer have memory
>> problems. However, now I have 2 problems:
>>
>> 1) All the thumbnail images are loaded. However, the speed is VERY SLOW.
>> One thumbnail image takes about 4 seconds to load!
>
>When it comes to huge PDFs, you have to accept one trade-off. Either you
>provide enough memory to do everything in memory (fast) or you use a scratch
>file to save memory (slow).
>
>And yes, there is room for improvement in the memory handling (read on
>demand, remove after usage) in PDFBox, but that is a future feature. Patches
>are welcome.
>
>> 2) Besides, as you see, the thumbnail images are loaded in a separate
>> thread. While this thread is running and I try to get a big image for the
>> main content using
>> BufferedImage bi=pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
>> I get the following exception:
>>
>> java.io.IOException: java.util.zip.DataFormatException: unknown compression method
>> at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:83)
>> at org.apache.pdfbox.cos.COSStream.attemptDecode(COSStream.java:422)
>> at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:398)
>> at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:335)
>> at org.apache.pdfbox.cos.COSStream.checkUnfilteredBuffer(COSStream.java:265)
>> at org.apache.pdfbox.cos.COSStream.getUnfilteredRandomAccess(COSStream.java:239)
>> at org.apache.pdfbox.pdfparser.BaseParser.<init>(BaseParser.java:146)
>> at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:78)
>> at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:451)
>> at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:438)
>> at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
>> at org.apache.pdfbox.rendering.PageDrawer.drawPage(PageDrawer.java:180)
>> at org.apache.pdfbox.rendering.PDFRenderer.renderPage(PDFRenderer.java:205)
>> at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:136)
>> at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:95)
>> ....
>> at javafx.concurrent.Task$TaskCallable.call(Task.java:1423)
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.util.zip.DataFormatException: unknown compression method
>> at java.util.zip.Inflater.inflateBytes(Native Method)
>> at java.util.zip.Inflater.inflate(Inflater.java:259)
>> at java.util.zip.Inflater.inflate(Inflater.java:280)
>> at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:101)
>> at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:74)
>> ... 20 more
>>
>> How can I solve these problems?
>
>PDFBox isn't supposed to be thread safe.
>
>> Wednesday, 1 July 2015, 13:17 +02:00 from Andreas Lehmkühler < andr...@lehmi.de >:
>> >> Alex Sviridov < ooo_satu...@mail.ru > wrote on 1 July 2015 at 13:09:
>> >>
>> >> I decided to show all the code. I am also sending the PDF file, a file
>> >> from the internet that I use for testing.
>> >
>> >The attachment didn't make it due to some restrictions of the mailing
>> >list. Please post a link to the original source or another place where we
>> >can download the PDF in question.
>> >
>> >> Task task = new Task() {
>> >>     @Override protected Integer call() throws Exception {
>> >>         for (int i=0;i<model.getTotalPages();i++){
>> >>             System.out.println("Point a:"+i);
>> >>             WritableImage writableImage=model.getPageThumbImage(i);
>> >>             System.out.println("Point b:"+i);
>> >>             ImageView imageView=new ImageView(writableImage);
>> >>             System.out.println("Point c:"+i);
>> >>             Label label=new Label(Integer.toString(i+1));
>> >>             System.out.println("Point d:"+i);
>> >>             VBox vBox=new VBox(imageView,label);
>> >>             System.out.println("Point e:"+i);
>> >>             vBox.setAlignment(Pos.CENTER);
>> >>             vBox.setStyle("-fx-padding:5px 5px 5px 5px;-fx-background-color:red");
>> >>             System.out.println("Point f:"+i);
>> >>             Platform.runLater(new Runnable() {
>> >>                 @Override
>> >>                 public void run() {
>> >>                     thumbFlowPane.getChildren().add(vBox);
>> >>                 }
>> >>             });
>> >>         }
>> >>         return null;
>> >>     }
>> >> };
>> >> new Thread(task).start();
>> >>
>> >> And here is the tail of the output:
>> >> ....
>> >> Point a:30
>> >> Point b:30
>> >> Point c:30
>> >> Point d:30
>> >> Point e:30
>> >> Point f:30
>> >> Point a:31
>> >>
>> >> What is a scratch file? Sorry, I don't understand you.
>> >
>> >PDFBox holds a lot of temporary data in memory. To reduce the memory
>> >footprint one can choose to use a scratch file instead, so that some/most
>> >of that data will be held in a file.
>> >
>> >To do so, simply use another load method, e.g.
>> >
>> >load(File file, boolean useScratchFiles)
>> >
>> >> Wednesday, 1 July 2015, 13:04 +02:00 from Andreas Lehmkühler < andr...@lehmi.de >:
>> >> >> Alex Sviridov < ooo_satu...@mail.ru > wrote on 1 July 2015 at 12:58:
>> >> >>
>> >> >> Thank you for the answer. I tried
>> >> >> pdfbox-app-2.0.0-20150630.220424-1464.jar and the result is the same.
>> >> >>
>> >> >> When I create images I add them to a JavaFX FlowPane. However, the
>> >> >> problem is not in the images because, I repeat, I get 400 MB back
>> >> >> when I do pdfDocument=null,pdfRenderer=null.
>> >> >>
>> >> >> Besides, when I do pdfDocument = PDDocument.load(new File(fileName))
>> >> >> I don't have any problems with memory.
>> >> >>
>> >> >> I get the memory problem when I run getPageThumbImage in a for loop.
>> >> >>
>> >> >> I am sure that the problem is in PDFBox. Please help me.
>> >> >
>> >> >Maybe, but I'm not sure at all.
>> >> >
>> >> >Try to use the scratch file.
>> >> >
>> >> >> Wednesday, 1 July 2015, 12:48 +02:00 from Andreas Lehmkühler < andr...@lehmi.de >:
>> >> >> >> Alex Sviridov < ooo_satu...@mail.ru > wrote on 1 July 2015 at 10:16:
>> >> >> >>
>> >> >> >> I want to display all page thumbnails. However, I came across a
>> >> >> >> memory size problem with PDFRenderer or PDDocument - I don't know
>> >> >> >> which one.
>> >> >> >>
>> >> >> >> I have the following code:
>> >> >> >> ....
>> >> >> >> private PDDocument pdfDocument;
>> >> >> >>
>> >> >> >> private PDFRenderer pdfRenderer;
>> >> >> >>
>> >> >> >> public WritableImage getPageThumbImage(int page){
>> >> >> >>     WritableImage result=null;
>> >> >> >>     try {
>> >> >> >>         BufferedImage bi=pdfRenderer.renderImageWithDPI(page, 12, ImageType.RGB);
>> >> >> >>         result=SwingFXUtils.toFXImage(bi, null);
>> >> >> >>     } catch (IOException ex) {
>> >> >> >>         ....
>> >> >> >>     }
>> >> >> >>     return result;
>> >> >> >> }
>> >> >> >> .....
>> >> >> >> I run the method getPageThumbImage in a for loop for every page.
>> >> >> >> I set the Java heap to 500 MB, and I can get about 30 images using
>> >> >> >> getPageThumbImage (if I set more memory I get more).
>> >> >> >> In my application I have real-time memory graphs and they show
>> >> >> >> that memory fills up very fast.
>> >> >> >> When there is no more free memory getPageThumbImage hangs - no
>> >> >> >> exception, nothing. But the code stops.
>> >> >> >> When I do pdfDocument=null,pdfRenderer=null I get about 400 MB of
>> >> >> >> free memory. How can I solve this problem?
>> >> >> >
>> >> >> >There are 2 possible issues and maybe both are relevant.
>> >> >> >
>> >> >> >1. PDFBox consumes more or less memory to load a PDF depending on
>> >> >> >the size and the content of the PDF.
>> >> >> >
>> >> >> >- Are you using the latest 2.0.0-SNAPSHOT? There were some
>> >> >> >improvements concerning the memory footprint lately
>> >> >> >- Try to use a scratch file (there are load methods including a
>> >> >> >boolean switch to activate that)
>> >> >> >
>> >> >> >2. Your own implementation consumes more or less memory to process
>> >> >> >those thumbnails
>> >> >> >
>> >> >> >- check if you are releasing all resources (especially those images
>> >> >> >you're creating) you are using during your process
>> >> >> >
>> >> >> >HTH,
>> >> >> >Andreas
>> >> >> >
>> >> >> >---------------------------------------------------------------------
>> >> >> >To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
>> >> >> >For additional commands, e-mail: users-h...@pdfbox.apache.org
>> >> >>
>> >> >> --
>> >> >> Alex Sviridov
>> >> >
>> >> >BR
>> >> >Andreas
>> >>
>> >> --
>> >> Alex Sviridov
>> >
>> >BR
>> >Andreas
>>
>> --
>> Alex Sviridov

--
Alex Sviridov
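
On the exception quoted above: the reply says PDFBox isn't supposed to be thread safe, and in this thread the thumbnail task and the full-size render both call the same pdfRenderer from different threads. One common way around that is to route every call that touches the shared document through a single worker thread. Below is a minimal sketch of that idea using only java.util.concurrent; the class name SerializedRenderer is hypothetical, and the lambdas stand in for the real pdfRenderer.renderImageWithDPI(...) calls, which are not included here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SerializedRenderer {

    // One worker thread: jobs submitted here run strictly one after another.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Queues a job that touches the shared document and returns its future result. */
    public <T> Future<T> submit(Callable<T> job) {
        return worker.submit(job);
    }

    /** Stops accepting new jobs; already queued jobs still finish. */
    public void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SerializedRenderer renderer = new SerializedRenderer();

        // Placeholders for the thumbnail (12 DPI) and full-size (300 DPI) renders.
        Future<String> thumb = renderer.submit(
                () -> "thumbnail rendered on " + Thread.currentThread().getName());
        Future<String> full = renderer.submit(
                () -> "full page rendered on " + Thread.currentThread().getName());

        System.out.println(thumb.get());
        System.out.println(full.get());
        renderer.shutdown();
    }
}
```

In a JavaFX application the results would still be handed to the UI with Platform.runLater, as the thumbnail task in this thread already does. The full-size render then waits behind any thumbnail already in progress, which trades some latency for never rendering concurrently.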