Wingdings font recognition in Tika parsing + spacing issue
----------------------------------------------------------

                 Key: TIKA-331
                 URL: https://issues.apache.org/jira/browse/TIKA-331
             Project: Tika
          Issue Type: Wish
          Components: parser
    Affects Versions: 0.4
         Environment: Windows XP / Java JDK 1.6.0_15
            Reporter: MRIT64


I have PDF files that include some characters in the Wingdings font.
The Tika parser replaces them with Unicode characters that have nothing to do 
with the originals and, in some cases, with alphabetic characters 
(which is expected, given the character codes involved).
Would it be possible to improve the parsing and replace these characters with 
more accurate Unicode characters?
(See http://www.alanwood.net/demos/wingdings.html for possible correspondences.)
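
To illustrate the kind of replacement I mean, here is a minimal sketch in Java.
It is not Tika or PDFBox code: the class WingdingsMapper and its method
toUnicode are hypothetical names, and the handful of mappings shown are only
examples that should be checked against the table linked above. It also assumes
the caller already knows that a run of text was set in Wingdings, which the
parser would have to determine from the PDF font information.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Tika: maps a few Wingdings character
// codes to Unicode characters with a similar appearance.
final class WingdingsMapper {

    // Illustrative subset only; a real table would cover the whole font.
    private static final Map<Character, Character> WINGDINGS_TO_UNICODE =
            new HashMap<Character, Character>();

    static {
        WINGDINGS_TO_UNICODE.put('J', '\u263A');      // smiling face, U+263A
        WINGDINGS_TO_UNICODE.put('L', '\u2639');      // frowning face, U+2639
        WINGDINGS_TO_UNICODE.put('(', '\u260E');      // telephone, U+260E
        WINGDINGS_TO_UNICODE.put('*', '\u2709');      // envelope, U+2709
        WINGDINGS_TO_UNICODE.put('l', '\u25CF');      // black circle, U+25CF
        WINGDINGS_TO_UNICODE.put('n', '\u25A0');      // black square, U+25A0
        WINGDINGS_TO_UNICODE.put('\u00FC', '\u2713'); // check mark, U+2713
    }

    private WingdingsMapper() {
    }

    // Converts text known to be set in Wingdings; characters without a
    // mapping in the table are passed through unchanged.
    static String toUnicode(String wingdingsText) {
        StringBuilder out = new StringBuilder(wingdingsText.length());
        for (int i = 0; i < wingdingsText.length(); i++) {
            char c = wingdingsText.charAt(i);
            Character mapped = WINGDINGS_TO_UNICODE.get(c);
            if (mapped != null) {
                out.append(mapped.charValue());
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, WingdingsMapper.toUnicode("J") would return the smiling face
character instead of the letter J, which is the kind of alphabetic substitution
described above.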

I will attach example files once this issue has been created. (Would it be 
possible to attach files directly when creating issues?)

