Hello. I want to extract text from the pages of a PDF, but when I write it
into a new PDF, some characters are mixed up.
I extract the text using the TextPosition objects, which contain the actual
text strings, font, position, etc.

This is the relevant code that I use to write the text onto the page
(contentStream is a PDPageContentStream, te is a TextPosition, and
page is a PDPage):


    contentStream.setFont(te.getFont(), te.getFontSizeInPt());
    contentStream.setTextMatrix(1, 0, 0, 1,
            te.getXDirAdj(), page.getArtBox().getHeight() - te.getYDirAdj());
    contentStream.drawString(te.getCharacter());
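
To make the positioning step explicit, here is a minimal, self-contained sketch
of the coordinate conversion I am relying on. It assumes that getYDirAdj() is
measured downward from the top of the page, while the text matrix expects PDF
user space with the origin at the bottom-left. The class CoordFlip and the
method toPdfY are hypothetical helpers for illustration only, not part of
PDFBox.

```java
// Hypothetical helper, not a PDFBox class: converts a y coordinate measured
// from the top of the page into one measured from the bottom of the page.
final class CoordFlip {
    private CoordFlip() {}

    /**
     * @param pageHeight height of the page box in points
     *                   (e.g. page.getArtBox().getHeight())
     * @param yFromTop   y position measured downward from the top edge
     *                   (assumed to be what te.getYDirAdj() returns)
     * @return y position measured upward from the bottom edge
     */
    static float toPdfY(float pageHeight, float yFromTop) {
        return pageHeight - yFromTop;
    }

    public static void main(String[] args) {
        // A glyph 100pt below the top of a 792pt (US Letter) page.
        System.out.println(toPdfY(792f, 100f));
    }
}
```

This only covers where the text is placed; it says nothing about which glyph
is drawn, which is where my actual problem appears.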

It works for normal text, but there are problems with mathematical terms;
please see the attachments.
out.png is the page converted using pdftoimage; everything went fine except
that the sigma sign is missing. myresult.pdf, on the other hand, has many
font problems: nearly every special character is rendered as the root sign,
and those that are not are replaced by some other wrong character.
If you want to take a look at the original PDF, it is at
http://www.xs4all.nl/~johanw/math.pdf (page 16).