Hi,
Ian Smith wrote:
Hi Folks,
I have linked to a PDF (~2MB) that produces unprintable characters in
the extracted text output. These characters seem to be associated with
the first two pages of the document.
http://www.yourphp.org.uk/media/pdf/g/4/Annual_Report_0809.pdf
What do you mean by unprintable?
I believe the problem is caused by at least one of the embedded fonts in
the document; my debugging has shown that the strange characters are
associated with Identity-H encoding and/or Type 1 (CID) fonts and (only
perhaps) also the Mistral Font (KWTOGC+Mistral?). Fonts that display
correctly seem to be associated with the WinAnsi encoding.
The current trunk (revision 921494) contains an improvement for Identity-H
encoded text. I extracted the text from your PDF with the latest version
and got the following result:
Poole Housing Partnership Ltd
Annual Report 08 09
???????????????????????????????????
20889 PHP R&A V1pk.indd 1 14/1/10 10:43:40
Contents
1. Welcome
2. The year in pictures and numbers
6. How we spend your money
8. Financial inclusions
10. Residents’ involvement
12. Tenancy support & disabled adaptations
14. Improving services
16. Customer insight
18. The environment
20. Leaseholders
???????????????????????????????????
20889 PHP R&A V1pk.indd 2 14/1/10 10:43:40
Welcome
They say time flies when you’re busy, and the past year seems to have flown by.
In June I had the pleasure, along with the Council portfolio
holder of being presented with an award for the services
provided by PHP being rated as excellent by the public
sector watchdog, the Audit Commission.
Did you get the same result, or is this an improvement over your output?
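
To compare two extraction results without relying on how a console renders them, one option is to count the characters that usually point to a failed glyph-to-Unicode mapping. The following is only a minimal sketch in plain Java and does not use PDFBox at all. The class name `UnprintableCheck`, its methods, and the definition of "suspicious" (control characters other than line breaks and tabs, U+FFFD, private-use, unpaired surrogate, and unassigned code points) are assumptions made for illustration, not anything taken from the library.

```java
final class UnprintableCheck {

    /** Returns true for code points that typically indicate a broken text mapping. */
    static boolean isSuspicious(int codePoint) {
        // Ordinary whitespace control characters are expected in extracted text.
        if (codePoint == '\n' || codePoint == '\r' || codePoint == '\t') {
            return false;
        }
        // U+FFFD is the Unicode replacement character.
        if (codePoint == 0xFFFD) {
            return true;
        }
        // Unassigned code points cannot be rendered meaningfully.
        if (!Character.isDefined(codePoint)) {
            return true;
        }
        int type = Character.getType(codePoint);
        return type == Character.CONTROL
                || type == Character.PRIVATE_USE
                || type == Character.SURROGATE;
    }

    /** Counts the suspicious code points in the given text. */
    static long countSuspicious(String text) {
        return text.codePoints().filter(UnprintableCheck::isSuspicious).count();
    }

    public static void main(String[] args) {
        // Hypothetical sample strings; in practice these would be the two extraction results.
        String before = "Annual Report\u0001\u0002\uFFFD 08 09";
        String after = "Annual Report 08 09";
        System.out.println("before: " + countSuspicious(before));
        System.out.println("after:  " + countSuspicious(after));
    }
}
```

Note that a literal `?` is not counted here: it is a normal printable character, and a run of question marks can just as well come from the console encoding as from the extraction itself.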
BR
Andreas Lehmkühler