[ 
https://issues.apache.org/jira/browse/TIKA-867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Mastarone updated TIKA-867:
--------------------------------

    Attachment: TIKA-867.patch

This issue appears to be the Windows counterpart of TIKA-324.  I've attached a 
patch that applies the same fix used there.  Even with the patch applied, you 
still need to change the code page of the Windows command prompt with the chcp 
command.  Using the Lucida Console or Consolas font, I see the correct output 
"Währung" if I run "chcp 65001" before running Tika.  My default code page, 
437, does not produce the correct output, and neither does 850; only 65001 
works.  Without the patch, chcp alone does not seem to help: I tried several 
code pages, including the three mentioned above, without success.
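
For anyone who wants to see the effect in isolation, here is a minimal sketch 
of the general technique (writing through a PrintStream with an explicit 
encoding instead of relying on the platform default).  It is not the attached 
patch and does not use any Tika code; the class and method names are made up 
for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

// Hypothetical illustration class; not part of Tika or of TIKA-867.patch.
public class Utf8OutputSketch {

    // Write text through a PrintStream created with an explicit encoding and
    // return the raw bytes that would reach the console or a pipe.
    public static byte[] encode(String text, String encoding)
            throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, encoding);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String word = "W\u00e4hrung"; // "Währung", escaped so the source file encoding does not matter

        // With UTF-8 the umlaut is encoded as two bytes and survives a round trip.
        byte[] utf8 = encode(word, "UTF-8");
        System.out.println(utf8.length + " bytes: "
                + new String(utf8, Charset.forName("UTF-8")));

        // With an encoding that cannot represent the umlaut, the stream
        // substitutes a replacement byte, so the character is lost on output.
        byte[] ascii = encode(word, "US-ASCII");
        System.out.println(ascii.length + " bytes: "
                + new String(ascii, Charset.forName("US-ASCII")));
    }
}
```

The console still has to interpret the bytes as UTF-8, which is why "chcp 65001" 
is needed in addition to any change on the Java side.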
                
> UTF-8 encoding does not work on windows
> ---------------------------------------
>
>                 Key: TIKA-867
>                 URL: https://issues.apache.org/jira/browse/TIKA-867
>             Project: Tika
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: 1.0
>         Environment: Windows 7 Enterprise (Java 1.6.0_31) and MAC OS X 10.7.3 
> (Java 1.6.0_30)
>            Reporter: Wolfgang Außerlechner
>         Attachments: TIKA-867.patch
>
>
> When calling Tika as a command-line tool from within Java and decoding the 
> output buffer as UTF-8 (e.g. new String(buffer, 0, len, 
> Charset.forName("UTF-8"));), the behaviour on Windows differs from Mac OS X.
> On Windows the encoding seems to be wrong ("Währung" comes out as "W?hrung"). 
> Other tools like exiftool work as expected.
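
The caller side described in the quoted report can be sketched roughly as 
follows.  This is an assumption about how such a caller might look, not the 
reporter's actual code: the class name and the decode helper are hypothetical, 
and a byte array stands in for the child process's output stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical illustration class; not taken from the issue or from Tika.
public class DecodeOutputSketch {

    // Collect every byte first, then decode once. Decoding each chunk
    // separately could split a multi-byte UTF-8 sequence at a buffer boundary.
    public static String decode(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int len;
        while ((len = in.read(buffer)) != -1) {
            collected.write(buffer, 0, len);
        }
        return new String(collected.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        Charset utf8 = Charset.forName("UTF-8");

        // Stand-ins for process.getInputStream(): one child that wrote real
        // UTF-8, and one that had already replaced the umlaut with '?'.
        byte[] wellEncoded = "W\u00e4hrung".getBytes(utf8);
        byte[] alreadyLost = "W?hrung".getBytes(utf8);

        System.out.println(decode(new ByteArrayInputStream(wellEncoded), utf8));
        System.out.println(decode(new ByteArrayInputStream(alreadyLost), utf8));
    }
}
```

If the child process has already written "?" in place of the umlaut, no choice 
of charset on the reading side can recover the original character, so the fix 
has to be made where the output is written.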


