[ https://issues.apache.org/jira/browse/TIKA-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208582#comment-13208582 ]
Jan Riewe commented on TIKA-856:
--------------------------------

Maybe this is helpful: http://code.google.com/p/language-detection/wiki/Tools
It is a tool for generating n-gram profiles from Wikipedia entries.

> Support CJK (Chinese, Japanese and Korean) language detection
> -------------------------------------------------------------
>
>                 Key: TIKA-856
>                 URL: https://issues.apache.org/jira/browse/TIKA-856
>             Project: Tika
>          Issue Type: New Feature
>          Components: languageidentifier
>    Affects Versions: 1.0
>         Environment: All
>            Reporter: James Sullivan
>              Labels: Chinese, Japanese
>
> Support language detection of CJK (Chinese, Japanese and Korean).
> Some estimates have Chinese users overtaking English users on the Internet,
> so it is important that these languages, used by a large number of people,
> be supported.
> See TIKA-855