Language Detection not working for Japanese and Chinese. --------------------------------------------------------
Key: TIKA-855
URL: https://issues.apache.org/jira/browse/TIKA-855
Project: Tika
Issue Type: Bug
Components: languageidentifier
Affects Versions: 1.0
Environment: Windows XP, Vista and Linux Ubuntu 11.10 using Sun Java 6 and Oracle Java 7
Reporter: James Sullivan
Priority: Minor

I have tried Tika 1.0 language detection (java -jar tika.jar -l .\Japanese.txt) on several Japanese files (both PDF and plain-text files), and it consistently returns "lt" (Lithuanian) instead of "ja". A Chinese file was likewise misidentified as "lt". Detection worked correctly for English and French files.
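When preparing a reproduction, it may help to confirm that a sample file really decodes as Japanese or Chinese text before passing it to the detector. The class below is a hypothetical helper, not part of Tika: it assumes the sample is UTF-8 encoded, requires Java 7 or later (for `Character.UnicodeScript`), and only reports what fraction of the letters are Han, Hiragana, or Katakana. It says nothing about why the detector returns "lt".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (not part of Tika): measures how much of a text sample
// is written in Han, Hiragana, or Katakana script. Requires Java 7+.
public class CjkSampleCheck {

    // Fraction of letter code points that belong to a CJK script; 0.0 if there are no letters.
    public static double cjkFraction(String text) {
        int letters = 0;
        int cjk = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!Character.isLetter(cp)) {
                continue; // skip digits, punctuation, whitespace
            }
            letters++;
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (script == Character.UnicodeScript.HAN
                    || script == Character.UnicodeScript.HIRAGANA
                    || script == Character.UnicodeScript.KATAKANA) {
                cjk++;
            }
        }
        return letters == 0 ? 0.0 : (double) cjk / letters;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the sample file is UTF-8 encoded.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String text = new String(bytes, StandardCharsets.UTF_8);
        System.out.printf("CJK letter fraction: %.2f%n", cjkFraction(text));
    }
}
```

A fraction near 1.0 (for example, `java CjkSampleCheck Japanese.txt`) suggests the input decoded as expected, so a wrong "lt" result would then point at the detector rather than at the file's encoding.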