WordDelimiterFilter does wrong word breaking for Thai vowels
---------------------------------------------------------
Key: SOLR-1078
URL: https://issues.apache.org/jira/browse/SOLR-1078
Project: Solr
Issue Type: Bug
Components: Analysis
Affects Versions: 1.4
Environment: Ubuntu 8.10 64-bit, Java 1.6.0_10
Reporter: SIriwat Aumngamsup

With any schema.xml configuration, the filter
{code:xml}<filter class="solr.WordDelimiterFilterFactory" />{code}
breaks Thai text at the wrong positions. Judging from the examples below, the vowel signs and tone marks written above or below a consonant are dropped, and the word is split wherever they occur.

----
Example 1: "ผู้ ใหญ่ บ้าน"
Wrong result: 0 => "ผ", 1 => "ใหญ", 2 => "บ", 3 => "าน"
Expected result: 0 => "ผู้", 1 => "ใหญ่", 2 => "บ้าน"
----
Example 2: "ผู้ใหญ่บ้าน" (no spaces)
Wrong result: 0 => "ผ", 1 => "ใหญ", 2 => "บ", 3 => "าน" (same as Example 1)
Expected result: 0 => "ผู้ใหญ่บ้าน"
----
A similar problem has been reported for Drupal (http://drupal.org/node/335928).
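
One possible explanation, not confirmed in this report, is that the filter classifies characters with a letter-or-digit test. The Thai marks SARA UU (U+0E39), MAI EK (U+0E48) and MAI THO (U+0E49) are nonspacing marks in Unicode, so Character.isLetterOrDigit returns false for them. The sketch below is not Solr code: ThaiSplitDemo, isWordChar and split are hypothetical names for a toy splitter that reproduces the wrong result reported above, and that yields the expected result once combining marks count as word characters. The input string is the first example from this report, written with Unicode escapes.

```java
import java.util.ArrayList;
import java.util.List;

public class ThaiSplitDemo {

    // Hypothetical helper (an assumption, not the Solr implementation):
    // decides whether a code point belongs to a word.
    static boolean isWordChar(int cp, boolean keepMarks) {
        if (Character.isLetterOrDigit(cp)) {
            return true;
        }
        if (!keepMarks) {
            return false; // combining marks fall through and act as delimiters
        }
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // Splits text into runs of word characters; everything else is a delimiter.
    static List<String> split(String text, boolean keepMarks) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isWordChar(cp, keepMarks)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // First example from the report: three Thai words separated by spaces.
        String spaced = "\u0E1C\u0E39\u0E49 \u0E43\u0E2B\u0E0D\u0E48 \u0E1A\u0E49\u0E32\u0E19";
        System.out.println(split(spaced, false)); // four tokens, marks dropped
        System.out.println(split(spaced, true));  // three tokens, marks kept
    }
}
```

With keepMarks set to false, the splitter returns the same four fragments as the wrong result in the report. With keepMarks set to true, it returns the three expected words, and the unspaced input from the second example stays a single token.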