Hi Nicklas,

I've been working on this issue for a while. Currently, PDFBox cannot correctly convert PDF files created by OpenOffice or LibreOffice to images. In my tests, PDF files created by Microsoft Word don't have this problem with the latest trunk code.
This happens because PDFBox renders the image from the extracted text rather than from the code points. Andreas used to reply to my emails so we could collaborate and resolve issues like this faster, but I haven't received a reply lately. I'm also not sure whether I'm posting in the right place, though.

Anyway, to fix this for TrueType fonts (which are typically used in your case), PDFBox should do the following:

- Use the code points, instead of the extracted text, for all TrueType fonts.
- Map the code points to glyph codes using the font's cmap.
- Use the glyph codes to draw the text on the image.

Yesterday I managed to fix this for my sample PDF files by modifying the trunk code. However, I'm waiting for the developer team to collaborate so that I can make sure my approach is right and doesn't break other parts of PDFBox.

-Hamed

On Wed, Mar 28, 2012 at 11:15 AM, Nicklas Karlsson wrote:
> Hi,
>
> I'm using the latest LibreOffice to produce a PDF and the latest PDFBox
> to extract the pages as images but I'm having some problems with the fonts.
> If I use Times New Roman I get a
>
> org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
> Changing font on <test> from <Times New Roman> to the default font
>
> If I embed some more exotic fonts in the PDF, I get a
>
> org.apache.pdfbox.util.PDFStreamEngine processOperator
> unsupported/disabled operation: BMC
> org.apache.pdfbox.util.PDFStreamEngine processOperator
> unsupported/disabled operation: EMC
> org.apache.pdfbox.util.PDFStreamEngine processOperator
> unsupported/disabled operation: BDC
> org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
> Changing font on <test> from <Algerian> to the default font
>
> This is all on the same machine. Is there a special trick in getting the
> fonts working?
>
> The extraction is done with something like
>
> PDDocument doc = PDDocument.load(pdf);
> List pages = doc.getDocumentCatalog().getAllPages();
> for (int i = 0; i < pages.size(); i++)
> {
>     PDPage page = (PDPage) pages.get(i);
>     pics.add(page.convertToImage());
> }
>
> Thanks in advance,
> Nik
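The three steps Hamed lists (code points, then cmap lookup, then drawing by glyph code) can be sketched in plain Java. This is a minimal, hypothetical illustration, not PDFBox code: it assumes the embedded TrueType font's Unicode cmap uses subtable format 4 and that its segment arrays have already been parsed from the font file. The class `Format4CmapSketch` and its methods are invented for this example.

```java
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.font.GlyphVector;

/**
 * Hypothetical sketch (not PDFBox API): maps character codes to glyph IDs with a
 * TrueType cmap format 4 subtable, then draws by glyph ID instead of by text.
 * Assumes the segment arrays were already read from the embedded font.
 */
final class Format4CmapSketch {
    private final int[] endCode;
    private final int[] startCode;
    private final int[] idDelta;       // stored as signed 16-bit values
    private final int[] idRangeOffset; // byte offsets, 0 means "use idDelta only"
    private final int[] glyphIdArray;

    Format4CmapSketch(int[] endCode, int[] startCode, int[] idDelta,
                      int[] idRangeOffset, int[] glyphIdArray) {
        int segCount = endCode.length;
        if (startCode.length != segCount || idDelta.length != segCount
                || idRangeOffset.length != segCount) {
            throw new IllegalArgumentException("segment arrays must have equal length");
        }
        this.endCode = endCode;
        this.startCode = startCode;
        this.idDelta = idDelta;
        this.idRangeOffset = idRangeOffset;
        this.glyphIdArray = glyphIdArray;
    }

    /** Maps one character code to a glyph ID; returns 0 (.notdef) if unmapped. */
    int glyphId(int code) {
        int segCount = endCode.length;
        for (int i = 0; i < segCount; i++) {
            if (code > endCode[i]) {
                continue; // segments are sorted by increasing endCode
            }
            if (code < startCode[i]) {
                return 0; // code falls in a gap between segments
            }
            if (idRangeOffset[i] == 0) {
                return (code + idDelta[i]) & 0xFFFF; // delta arithmetic is modulo 65536
            }
            // idRangeOffset[i] is a byte offset from &idRangeOffset[i]; glyphIdArray
            // directly follows the idRangeOffset array in the font file.
            int index = idRangeOffset[i] / 2 + (code - startCode[i]) - (segCount - i);
            if (index < 0 || index >= glyphIdArray.length) {
                return 0;
            }
            int glyph = glyphIdArray[index];
            return glyph == 0 ? 0 : (glyph + idDelta[i]) & 0xFFFF;
        }
        return 0;
    }

    /** Maps a sequence of character codes (not extracted text) to glyph IDs. */
    int[] glyphIds(int[] codes) {
        int[] ids = new int[codes.length];
        for (int i = 0; i < codes.length; i++) {
            ids[i] = glyphId(codes[i]);
        }
        return ids;
    }

    /**
     * Draws glyphs by ID, bypassing any text-to-font lookup. Assumes `font` was
     * created from the same embedded font program, so its glyph indices match.
     */
    static void drawGlyphs(Graphics2D g, Font font, int[] glyphIds, float x, float y) {
        GlyphVector gv = font.createGlyphVector(g.getFontRenderContext(), glyphIds);
        g.drawGlyphVector(gv, x, y);
    }
}
```

In a renderer, the codes passed to `glyphIds` would be the raw character codes from the content stream's text-showing operators rather than the extracted Unicode string. The AWT `Font` handed to `drawGlyphs` would need to come from the same embedded font program (for example via `Font.createFont(Font.TRUETYPE_FONT, in)`); otherwise the glyph IDs will not line up with its glyphs.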

