Hello. I want to extract text from pages, but when I write it into a new
PDF, some characters are mixed up.
I extract the text using the TextPosition objects that contain the actual text
strings, font, position etc.
This is the relevant code that I use to write the text into the page
(contentStream is a PDPageContentStream, te is a TextPosition, page is a
PDPage):

    contentStream.setFont(te.getFont(), te.getFontSizeInPt());
    contentStream.setTextMatrix(1, 0, 0, 1,
        te.getXDirAdj(), page.getArtBox().getHeight() - te.getYDirAdj());
    contentStream.drawString(te.getCharacter());
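
To make the position handling explicit: the only arithmetic in the snippet is the y flip, since I treat getYDirAdj() as measured from the top of the page while the text matrix expects a distance from the bottom. Below is a minimal standalone sketch of that step. The class name YFlip, the method toPdfY and the 792 pt page height are made up for illustration and are not part of PDFBox.

```java
public class YFlip {
    // Converts a y value measured from the top of the page into a y value
    // measured from the bottom, by subtracting it from the page height.
    // Both values are in points.
    public static float toPdfY(float pageHeight, float yFromTop) {
        return pageHeight - yFromTop;
    }

    public static void main(String[] args) {
        // Assumption: a page 792 pt high (US Letter); the real value would
        // come from page.getArtBox().getHeight().
        float pageHeight = 792f;
        // Hypothetical glyph position, 100 pt below the top edge.
        float yFromTop = 100f;
        System.out.println(toPdfY(pageHeight, yFromTop));
    }
}
```

The positions in myresult.pdf look right to me, so I think this part is fine and the problem is only with which glyphs get drawn.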
It works for normal text; however, there are problems with mathematical terms
(please see the attachments).
out.png is the page converted using pdftoimage; everything went fine
except that the sigma sign is missing. myresult.pdf, on the other hand, has lots
of font problems: nearly every special character is drawn as the root sign, and
those that aren't come out as some other wrong character.
If you want to take a look at the original PDF, it's
http://www.xs4all.nl/~johanw/math.pdf page 16.