Hi,

As far as I remember, ICEpdf didn't render right-to-left languages correctly. I'm not sure, though; maybe it has been fixed by now.
-Hamed

On Wed, Apr 4, 2012 at 11:48 AM, Nicklas Karlsson <[email protected]> wrote:

> Thanks for the information. I continued my search for libraries and
> stumbled on ICEpdf from ICEsoft, and it works there, so you could check
> their source code for hints while improving PDFBox ;-)
>
> On Wed, Apr 4, 2012 at 9:57 AM, Hamed Iravanchi <[email protected]> wrote:
>
> > Hi Nicklas,
> >
> > I've been working on this issue for a while.
> > Right now, PDFBox cannot correctly convert PDF files created by OpenOffice
> > or LibreOffice to images.
> > In my tests, PDF files created by Microsoft Word do not have this problem
> > in the latest trunk code.
> >
> > This is because PDFBox renders the image from the extracted text rather
> > than from the code points.
> > Andreas used to reply to my emails so we could collaborate and resolve
> > such issues faster, but I haven't received any reply lately.
> > I don't know whether I'm posting in the right place, though...
> >
> > Anyway, to fix this issue for TrueType fonts (which are typically used in
> > your case), PDFBox should do the following:
> > - Use code points, instead of extracted text, for all TrueType fonts.
> > - Map the code points to glyph codes using the font's cmap.
> > - Use the glyph codes to draw the text on the image.
> >
> > I managed to fix this yesterday for my sample PDF files by modifying the
> > trunk code.
> > But I'm waiting for the development team to collaborate so that I can
> > make sure what I'm doing is right and doesn't break other parts of PDFBox.
> >
> > -Hamed
> >
> >
> > On Wed, Mar 28, 2012 at 11:15 AM, Nicklas Karlsson <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I'm using the latest LibreOffice to produce a PDF and the latest PDFBox
> > > to extract the pages as images, but I'm having some problems with the
> > > fonts.
> > > If I use Times New Roman, I get
> > >
> > > org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
> > > Changing font on <test> from <Times New Roman> to the default font
> > >
> > > If I embed some more exotic fonts in the PDF, I get
> > >
> > > org.apache.pdfbox.util.PDFStreamEngine processOperator
> > > unsupported/disabled operation: BMC
> > > org.apache.pdfbox.util.PDFStreamEngine processOperator
> > > unsupported/disabled operation: EMC
> > > org.apache.pdfbox.util.PDFStreamEngine processOperator
> > > unsupported/disabled operation: BDC
> > > org.apache.pdfbox.pdmodel.font.PDSimpleFont drawString
> > > Changing font on <test> from <Algerian> to the default font
> > >
> > > This is all on the same machine. Is there a special trick to getting
> > > the fonts working?
> > >
> > > The extraction is done with something like
> > >
> > > PDDocument doc = PDDocument.load(pdf);
> > > try
> > > {
> > >     List<?> pages = doc.getDocumentCatalog().getAllPages();
> > >     for (Object page : pages)
> > >     {
> > >         pics.add(((PDPage) page).convertToImage());
> > >     }
> > > }
> > > finally
> > > {
> > >     doc.close(); // release the document even if rendering fails
> > > }
> > >
> > > Thanks in advance,
> > > Nik
> > >
> > > --
> > > ---
> > > Nik
>
> --
> ---
> Nik
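The three steps Hamed lists (use code points, map them through the font's cmap, draw glyph codes) can be illustrated with a small, self-contained Java sketch. It is not PDFBox code. The `cmap` map is a hypothetical stand-in for the lookup a real renderer would read from the embedded font's 'cmap' table, the glyph IDs are made up, and `GlyphRenderSketch`, `toGlyphIds` and `render` are illustrative names. It assumes single-byte codes, as in a simple (non-Type 0) TrueType font, and uses `java.awt.Font.createGlyphVector(FontRenderContext, int[])` to draw glyph IDs directly instead of a Unicode string.

```java
import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class GlyphRenderSketch {

    // Steps 1 and 2: take the raw character codes from the content stream
    // (not extracted text) and map each one to a glyph ID via a cmap-like lookup.
    // Assumption: single-byte codes, as in a simple TrueType font.
    static int[] toGlyphIds(byte[] codes, Map<Integer, Integer> cmap) {
        int[] glyphIds = new int[codes.length];
        for (int i = 0; i < codes.length; i++) {
            int code = codes[i] & 0xFF;               // treat the byte as unsigned
            glyphIds[i] = cmap.getOrDefault(code, 0); // unmapped codes fall back to glyph 0 (.notdef)
        }
        return glyphIds;
    }

    // Step 3: draw the glyph IDs directly, so no Unicode string is involved.
    // The font must be the same font program the cmap came from.
    static BufferedImage render(Font font, int[] glyphIds) {
        BufferedImage img = new BufferedImage(300, 60, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, img.getWidth(), img.getHeight());
            g.setColor(Color.BLACK);
            FontRenderContext frc = g.getFontRenderContext();
            GlyphVector gv = font.createGlyphVector(frc, glyphIds);
            g.drawGlyphVector(gv, 10f, 40f);
        } finally {
            g.dispose();
        }
        return img;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical cmap entries with made-up glyph IDs, for illustration only.
        Map<Integer, Integer> cmap = new HashMap<>();
        cmap.put(0x41, 36); // code for 'A' -> glyph 36
        cmap.put(0x42, 37); // code for 'B' -> glyph 37

        int[] ids = toGlyphIds(new byte[] {0x41, 0x42, 0x7A}, cmap);
        System.out.println(Arrays.toString(ids)); // prints [36, 37, 0]

        // Rendering is optional: pass the path of the TrueType file the cmap came from.
        if (args.length > 0) {
            Font font = Font.createFont(Font.TRUETYPE_FONT, new File(args[0])).deriveFont(24f);
            BufferedImage img = render(font, ids);
            System.out.println("Rendered " + img.getWidth() + "x" + img.getHeight());
        }
    }
}
```

Run with no arguments, it only prints the mapped glyph IDs; given the path of the matching TrueType file, it also draws them. Drawing glyph IDs rather than a `String` means the renderer never depends on the extracted text, which is the failure mode Hamed describes.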

