Language Detection not working for Japanese and Chinese.
--------------------------------------------------------

                 Key: TIKA-855
                 URL: https://issues.apache.org/jira/browse/TIKA-855
             Project: Tika
          Issue Type: Bug
          Components: languageidentifier
    Affects Versions: 1.0
         Environment: Windows XP, Vista and Linux Ubuntu 11.10 using Sun Java 6 and Oracle Java 7
            Reporter: James Sullivan
            Priority: Minor


I ran Tika 1.0 language detection (java -jar tika.jar -l .\Japanese.txt) 
on several Japanese files (both PDF and plain text), and it consistently 
returned lt (Lithuanian???) instead of ja. A Chinese file was likewise 
misidentified as lt. Detection of English and French files worked 
correctly.
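
To rule out a problem with the test files themselves (for example, a file 
that is not really UTF-8, or a PDF whose extracted text is mostly Latin), the 
following sketch counts letters per Unicode script and reports the dominant 
one. It is a standalone sanity check and not part of Tika; the class name 
ScriptCheck and the method dominantScript are hypothetical, the input is 
assumed to be UTF-8 plain text, and Character.UnicodeScript needs Java 7 or 
later.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not part of Tika: reports which Unicode script
// dominates the letters of a text.
class ScriptCheck {

    /** Returns the most frequent script among the letters of text, or null if it has no letters. */
    static Character.UnicodeScript dominantScript(String text) {
        Map<Character.UnicodeScript, Integer> counts =
                new EnumMap<Character.UnicodeScript, Integer>(Character.UnicodeScript.class);

        // Walk the text by code point so supplementary characters are counted once.
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue; // skip digits, punctuation, whitespace
            }
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            Integer n = counts.get(script);
            counts.put(script, n == null ? 1 : n + 1);
        }

        Character.UnicodeScript best = null;
        int bestCount = 0;
        for (Map.Entry<Character.UnicodeScript, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java ScriptCheck <utf8-text-file>");
            return;
        }
        // Assumption: the file is UTF-8 encoded plain text.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String text = new String(bytes, StandardCharsets.UTF_8);
        System.out.println(dominantScript(text));
    }
}
```

If this reports HAN, HIRAGANA or KATAKANA for a file that the detector labels 
lt, the input itself is unlikely to be the cause.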
