Wingdings font recognition in Tika parsing + spacing issue
----------------------------------------------------------
Key: TIKA-331
URL: https://issues.apache.org/jira/browse/TIKA-331
Project: Tika
Issue Type: Wish
Components: parser
Affects Versions: 0.4
Environment: Windows XP / Java JDK 1.6.0_15
Reporter: MRIT64
I have PDF files that include some characters in the Wingdings font.
The Tika parser replaces them with Unicode characters that have nothing to do
with the originals and, in some cases, with alphabetic characters
(which is expected, given the character codes these glyphs use).
Would it be possible to improve the parsing and replace these characters with
more accurate Unicode characters?
(See http://www.alanwood.net/demos/wingdings.html for possible correspondences.)
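To illustrate the kind of replacement requested above, here is a minimal
sketch in Java. It is not Tika or PDFBox code: the class name WingdingsMapper,
the method toUnicode and the small table are hypothetical. It also assumes the
parser can tell that a text run uses the Wingdings font, which may or may not
be the case at the point where the text is extracted. The few entries shown
should be checked against the correspondence table linked above before being
relied on.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: maps Wingdings character codes to Unicode code points. */
final class WingdingsMapper {

    // Partial, illustrative table (Wingdings code -> Unicode code point).
    private static final Map<Integer, Integer> TABLE = new HashMap<>();
    static {
        TABLE.put(0x22, 0x2702); // scissors
        TABLE.put(0x28, 0x260E); // telephone
        TABLE.put(0x2A, 0x2709); // envelope
        TABLE.put(0x4A, 0x263A); // 'J' is drawn as a smiling face
        TABLE.put(0x4C, 0x2639); // 'L' is drawn as a frowning face
        TABLE.put(0xFC, 0x2714); // check mark
    }

    /**
     * Converts text that is known to be set in Wingdings.
     * Codes missing from the table are passed through unchanged.
     */
    static String toUnicode(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            int code = raw.charAt(i);
            Integer mapped = TABLE.get(code);
            out.appendCodePoint(mapped != null ? mapped : code);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "J" in Wingdings is the smiling face, so this prints U+263A.
        System.out.println(toUnicode("J"));
    }
}
```

The mapping must only be applied to text runs that really use Wingdings;
applied to ordinary text, it would turn a plain "J" into a symbol.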
I will attach example files once this issue has been created. (Would it be
possible to attach files directly when creating issues?)