WordDelimiterFilter do wrong word breaking for Thai vowel
---------------------------------------------------------

                 Key: SOLR-1078
                 URL: https://issues.apache.org/jira/browse/SOLR-1078
             Project: Solr
          Issue Type: Bug
          Components: Analysis
    Affects Versions: 1.4
         Environment: Ubuntu 8.10 64bit, Java 1.6.0_10
            Reporter: SIriwat Aumngamsup


With any schema.xml configuration that includes
{code:xml}<filter class="solr.WordDelimiterFilterFactory" />{code}

the filter breaks Thai words incorrectly: it splits at vowel and tone marks and drops them from the output.
----
Example: "ผู้ ใหญ่ บ้าน"

Wrong result: 0 => "ผ", 1 => "ใหญ", 2 => "บ", 3 => "าน"

Expected result: 0 => "ผู้", 1 => "ใหญ่", 2 => "บ้าน"
----
Example 2: "ผู้ใหญ่บ้าน" (no spaces)

Wrong result: 0 => "ผ", 1 => "ใหญ", 2 => "บ", 3 => "าน" (same as above)

Expected result: 0 => "ผู้ใหญ่บ้าน"

----
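
One possible explanation, which this report does not confirm, is that the filter treats only letters and digits as word characters. The Thai above-vowel, below-vowel, and tone marks in the examples are Unicode non-spacing marks (category Mn), so such a rule would see them as delimiters. The sketch below is not the WordDelimiterFilter code. The class ThaiSplitSketch and its methods are hypothetical names, written only to compare a letters-and-digits rule with one that also accepts combining marks.

```java
import java.util.ArrayList;
import java.util.List;

public class ThaiSplitSketch {

    // The unspaced example word from this report, written with Unicode
    // escapes so the file compiles under any source encoding.
    static final String WORD =
        "\u0E1C\u0E39\u0E49\u0E43\u0E2B\u0E0D\u0E48\u0E1A\u0E49\u0E32\u0E19";

    // Assumed rule (hypothetical): only letters and digits belong to a word.
    static boolean naiveWordChar(int cp) {
        return Character.isLetterOrDigit(cp);
    }

    // Alternative rule (hypothetical): combining marks also belong to a word.
    static boolean markAwareWordChar(int cp) {
        int type = Character.getType(cp);
        return Character.isLetterOrDigit(cp)
            || type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // Split text into runs of word characters; everything else is a delimiter.
    static List<String> split(String text, boolean markAware) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean inWord = markAware ? markAwareWordChar(cp) : naiveWordChar(cp);
            if (inWord) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A delimiter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split(WORD, false)); // letters-and-digits rule
        System.out.println(split(WORD, true));  // mark-aware rule
    }
}
```

Under these assumptions, the letters-and-digits rule yields the same four fragments as the wrong result reported above, and the mark-aware rule keeps the unspaced word as a single token.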

A similar problem has been reported for Drupal: http://drupal.org/node/335928


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
