Hi,
I'm using the text extraction feature of the Apache PDFBox 0.8.0 library.
Unfortunately, the text extraction replaces some letters and other
characters with '?'.
The PDF file contains German text. I extracted the text with
the ExtractText.java example from the PDFBox package.
Here is an example:
Input text:
"Front: Weiß Hochglanz, Korpus: Noce Dekor,
Griff: Metall chrom glänzend, B ca. 234 cm 4425678 394.-**
Hochschrank B/H/T ca. 35/179/29 cm 10060786 175,-** "
PDFBox output text:
"Front: Weiß Hochglanz, Korpus: Noce Dekor,
Griff: Metall chrom gl?nzend, ? ca? ??? cm ???????? 394.-**
H?hschrank ?H? ca??????cm ???? ??- **"
I would be grateful if you could help me with this problem and suggest
some ways to make it work.
Cheers,
Christian