I get this a lot with "obscure" fonts. I would love to improve the font
handling, but I worry that the project is not well controlled and any effort
in this direction would be wasted.
Who is producing 1.0.0, and WHEN?
iaincc
Villu Ruusmann wrote:
Hello there,
I'm using the text extraction of the Apache PDFBox 0.8.0 library.
Unfortunately, the text extraction replaces some symbols and letters with '?'.
Without having seen the PDF file, I guess that the problem is that the
"faulty" characters depend on a font which is not properly supported
by PDFBox 0.8.0 (the translation rules from bytes to character codes
could be embedded into the font program; PDFBox does not know yet how
to parse/interpret all types of font programs, so it bails out with a
"?" instead).
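To make the mechanism concrete, here is a minimal Java sketch of the idea. It is not PDFBox code: the class `FontDecodingSketch`, the `decode` method and the sample mapping are all hypothetical. It assumes single-byte character codes and a code-to-Unicode table that a parser would have recovered from a font it understands. Any code missing from the table can only be emitted as a placeholder '?', which is the symptom described above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration (NOT the PDFBox API) of why text from an
 * unsupported font comes out as '?': each byte in the content stream is a
 * character code that must be translated to Unicode using rules taken from
 * the font. If no rule is known for a code, only a placeholder can be emitted.
 */
public class FontDecodingSketch {

    // Placeholder emitted when a character code cannot be translated.
    static final char PLACEHOLDER = '?';

    // Decode single-byte character codes using a code-to-Unicode table.
    static String decode(byte[] codes, Map<Integer, Character> codeToUnicode) {
        StringBuilder out = new StringBuilder();
        for (byte b : codes) {
            int code = b & 0xFF; // treat the byte as an unsigned code 0..255
            Character ch = codeToUnicode.get(code);
            out.append(ch != null ? ch.charValue() : PLACEHOLDER);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical mapping recovered from a font the parser understands.
        Map<Integer, Character> known = new HashMap<>();
        known.put(0x01, 'H');
        known.put(0x02, 'i');

        byte[] codes = {0x01, 0x02, 0x03};
        // Code 0x03 has no known mapping, so it becomes '?'.
        System.out.println(decode(codes, known)); // prints "Hi?"
    }
}
```

The fix for such cases is not in the decoding loop itself but in filling the table: teaching the parser to read the translation rules from more kinds of font programs.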
Hopefully the upcoming PDFBox 1.0.0 release is a bit more savvy in this regard.
VR