This was already answered at
https://stackoverflow.com/questions/31097691/extra-symbols-when-converting-pdf-to-image-with-pdfbox
Feel free to ask additional questions about the 2.0 version. See also
https://pdfbox.apache.org/downloads.html#scm
and
https://pdfbox.apache.org/2.0/getting-started.html
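
For anyone trying the 2.0 version, here is a minimal sketch of the same page-to-PNG conversion using the 2.0 rendering API. It assumes the pdfbox and pdfbox-tools 2.0.x jars are on the classpath. The class name PdfToPng and the helper outputName are hypothetical, chosen only for illustration. It shows how the calls change between versions and is not claimed to remove the extra symbols by itself.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.tools.imageio.ImageIOUtil;

public class PdfToPng {

    // Hypothetical helper: builds "<prefix>-<1-based page number>.png".
    static String outputName(String prefix, int pageIndex) {
        return prefix + "-" + (pageIndex + 1) + ".png";
    }

    // Renders every page of the PDF to an RGB PNG at the given resolution.
    static void render(String pdfPath, String prefix, int dpi) throws IOException {
        // try-with-resources closes the document even if rendering fails
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            PDFRenderer renderer = new PDFRenderer(document);
            for (int i = 0; i < document.getNumberOfPages(); i++) {
                BufferedImage bim = renderer.renderImageWithDPI(i, dpi, ImageType.RGB);
                // The image format is derived from the file extension.
                ImageIOUtil.writeImage(bim, outputName(prefix, i), dpi);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (assumed arguments): java PdfToPng input.pdf /home/file
        render(args[0], args[1], 300);
    }
}
```

In 2.0, rendering moved from PDPage.convertToImage to the PDFRenderer class, and pages are addressed by zero-based index, which is why the file name adds 1.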
Tilman
On 28.06.2015 at 10:49, Александр Свиридов wrote:
I use Apache PDFBox 1.8.9. I have a one-page PDF file that contains text, and I
want to convert this page to an image. I created the PDF with LibreOffice. I use
the following code:
PDDocument document = PDDocument.loadNonSeq(new File(filename), null);
List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
int page = 0;
for (PDPage pdPage : pdPages) {
    ++page;
    BufferedImage bim = pdPage.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
    ImageIOUtil.writeImage(bim, "png", "/home/file" + "-" + page, 300);
}
document.close();
The code works and I get a PNG image. The problem is that the image contains many
strange extra symbols that make the text very difficult to read. How can I fix this?
The image is here: http://i.stack.imgur.com/OUyLO.png