[jira] [Updated] (LUCENE-7393) Incorrect ICUTokenization on South East Asian Language

2016-07-24 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-7393: Attachment: LUCENE-7393.patch Here is a patch restoring the previous rule-based algorithm as an opt

[jira] [Updated] (LUCENE-7393) Incorrect ICUTokenization on South East Asian Language

2016-07-24 Thread AM (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] AM updated LUCENE-7393: --- Description: Lucene 4.10.3 correctly tokenizes a syllable into one token. However, in Lucene 5.5.0 it ends up being tw