[ https://issues.apache.org/jira/browse/SOLR-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707511#action_12707511 ]
Robert Muir commented on SOLR-1078:
-----------------------------------

Looks pretty good. I was concerned that the obvious fix would break the split-on-case-change behavior. I think you want to include MODIFIER_SYMBOL, though.

> WordDelimiterFilter do wrong word breaking for Thai vowel
> ---------------------------------------------------------
>
> Key: SOLR-1078
> URL: https://issues.apache.org/jira/browse/SOLR-1078
> Project: Solr
> Issue Type: Bug
> Components: Analysis
> Affects Versions: 1.4
> Environment: Ubuntu 8.10 64bit
>              Java 1.6.0_10
> Reporter: SIriwat Aumngamsup
> Fix For: 1.4
>
> Attachments: SOLR-1078.patch
>
>
> With any configuration in schema.xml,
> {code:xml}<filter class="solr.WordDelimiterFilterFactory" />{code}
> breaks words incorrectly on Thai characters.
> ----
> Example: "ผู้ ใหญ่ บ้าน"
> Wrong result: 0 => "ผ", 1 => "ใหญ", 2 => "บ", 3 => "าน"
> Expected result: 0 => "ผู้", 1 => "ใหญ่", 2 => "บ้าน"
> ----
> Example 2: "ผู้ใหญ่บ้าน" (no space)
> Wrong result: 0 => "ผ", 1 => "ใหญ", 2 => "บ", 3 => "าน" (same result)
> Expected result: 0 => "ผู้ใหญ่บ้าน"
> ----
> There is a similar problem in Drupal (http://drupal.org/node/335928)
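
For readers following along, here is a minimal Java sketch of why the reported split can happen. It is not the WordDelimiterFilter code and not the attached patch. The class `ThaiSplitSketch` and its methods `isWordChar` and `split` are hypothetical names made up for illustration, and the set of character categories treated as word characters is an assumption; only MODIFIER_SYMBOL comes from the comment above. The sketch assumes the splitter treats anything that is not a letter or digit as a delimiter. Thai upper and lower vowels and tone marks have the Unicode category NON_SPACING_MARK, so they would be dropped under that rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual WordDelimiterFilter logic.
class ThaiSplitSketch {

    // Decides whether c belongs inside a token.
    // keepMarks = false: only letters and digits count (assumed old behavior).
    // keepMarks = true: combining marks and MODIFIER_SYMBOL also count
    // (assumed category list; the real patch may differ).
    static boolean isWordChar(char c, boolean keepMarks) {
        if (Character.isLetterOrDigit(c)) {
            return true;
        }
        if (!keepMarks) {
            return false;
        }
        switch (Character.getType(c)) {
            case Character.NON_SPACING_MARK:
            case Character.COMBINING_SPACING_MARK:
            case Character.ENCLOSING_MARK:
            case Character.MODIFIER_SYMBOL:
                return true;
            default:
                return false;
        }
    }

    // Splits text into runs of word characters; everything else is a delimiter.
    static List<String> split(String text, boolean keepMarks) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isWordChar(c, keepMarks)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Example 2 from the report, written with Unicode escapes so the
        // source compiles regardless of file encoding.
        String word = "\u0E1C\u0E39\u0E49\u0E43\u0E2B\u0E0D\u0E48\u0E1A\u0E49\u0E32\u0E19";
        System.out.println(split(word, false).size() + " tokens without marks");
        System.out.println(split(word, true).size() + " token(s) with marks");
    }
}
```

Under these assumptions, `split(word, false)` yields the four fragments listed as the wrong result in the report, while `split(word, true)` keeps the word whole. Characters such as `^` have the category MODIFIER_SYMBOL, which is why leaving that category out of the list would still split on them.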