Hi,

When extracting text from a bunch of PDF documents in several languages, I wondered why some special characters in some documents were wrong in the extracted text files.

As it turns out, these wrongly decoded PDFs have missing or flawed ToUnicode CMaps. The fonts are TrueType and always embedded.

Does anybody know

- under what circumstances PDFs with missing or incorrect CMaps are created?

- how I could work around this problem?
Since I have the TTFs, could I preload them? Otherwise, could I correct the PDFs by replacing the wrong CMap or adding a correct one?

Thank you for your help.

Wulf
