Question mark in the extracted text

Christian Mewes Thu, 04 Feb 2010 08:34:36 -0800

Hi,

I'm using the text extraction of the Apache PDFBox 0.8.0 library.

Unfortunately, the text extraction replaces some symbols and letters with '?'.

The PDF file contains German text. I extracted the text with the ExtractText.java example from the PDFBox package.


Here is an example:
input text:

"Front: Weiß Hochglanz, Korpus: Noce Dekor,Griff: Metall chrom glänzend, B ca. 234 cm 4425678 394.-**Hochschrank B/H/T ca. 35/179/29 cm 10060786 175,-** "

pdfbox output text:

"Front: Weiß Hochglanz, Korpus: Noce Dekor,Griff: Metall chrom gl?nzend, ? ca? ??? cm ???????? 394.-**

H?hschrank  ?H? ca??????cm  ???? ??- **"
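
In case it helps with diagnosis, here is a minimal sketch of a helper that prints the Unicode code points of an extracted string. It is my own code, not part of PDFBox, and the class and method names are made up for illustration. The idea is to check whether each '?' is a literal U+003F in the extracted text or only appears when the text is written to the console or a file.

```java
public class CodePointDump {

    // Hypothetical helper: returns the code points of the given text
    // in "U+XXXX" notation, separated by single spaces.
    static String dump(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A fragment of the extracted output shown above.
        System.out.println(dump("gl?nzend"));
    }
}
```

If the dump shows U+003F where the PDF has other characters, the replacement presumably happens during extraction rather than when the output is written.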

I would be pleased if you could help me with this problem and suggest some ways to make it work.


Cheers,
Christian
