[jira] Issue Comment Edited: (SOLR-1571) unicode collation support

Robert Muir (JIRA) Sat, 21 Nov 2009 13:15:06 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781053#action_12781053
 ]


Robert Muir edited comment on SOLR-1571 at 11/21/09 9:13 PM:
-------------------------------------------------------------

Hi, I wonder if anyone has any comments on this.

I know this is an invisible/covert JIRA issue right now :)

I am especially curious whether the approach is sound, particularly regarding 
whether to use the ICUCollationFilter instead.
In my opinion, the ICU filter should be a separate integration, even though it 
would index significantly faster and produce much smaller keys.
The reason is that its keys are not compatible with the JDK collation keys, and 
it has different properties, such as the fact that Collator is thread-safe in 
the JDK but not thread-safe in ICU.
Because of this, I decided to stick with the JDK implementation initially.


      was (Author: rcmuir):
    Hi, i wonder if anyone has any comments on this.

I know this is an invisible/convert JIRA issue right now :)

especially I am curious if the approach is sound, particularly regarding using 
the ICUCollationFilter instead.
In my opinion, this should be a separate integration, even though it will index 
at a significantly faster speed with much smaller keys.
The reason is that it is not compat with the JDK collation keys, and has 
different properties, such as the fact Collator is thread-safe in the JDK, but 
not thread-safe in ICU.
Because of this, I decided to stick with the JDK impl initially.

  
> unicode collation support
> -------------------------
>
>                 Key: SOLR-1571
>                 URL: https://issues.apache.org/jira/browse/SOLR-1571
>             Project: Solr
>          Issue Type: New Feature
>          Components: Analysis
>            Reporter: Robert Muir
>            Priority: Minor
>         Attachments: SOLR-1571.patch
>
>
> This patch adds support for Unicode collation (searching and sorting).
> Unicode collation is helpful in a search engine: for many languages you want 
> things to match or sort differently.
> You might even want to use copyField and support different sort 
> orders/matching schemes if you need to support multiple languages.
> This is simply a factory for Lucene's CollationKeyFilter, which indexes 
> binary collation keys in a special format that preserves binary sort order.
> I've added support for creating a Collator in two ways:
> * system collator from a Locale spec (language + country + variant)
> * tailored collator from custom rules in a text file
> There is deliberately no option to use the "default" locale of the JVM (I 
> consider this a bit dangerous).
> In this patch, it is mandatory to define the locale explicitly for a system 
> collator.
> The required lucene-collation-2.9.1.jar is only 12KB.
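
The two ways of creating a Collator described above can be sketched with the JDK's java.text classes alone. This is only an illustration, not the code in SOLR-1571.patch: the class name CollationSketch and its method names are hypothetical, the rules string stands in for the contents of a rules text file, and the special term encoding that CollationKeyFilter applies to the key bytes is not shown.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
public class CollationSketch {

    // System collator from an explicit Locale spec (language + country + variant).
    // The JVM default locale is never consulted.
    public static Collator systemCollator(String language, String country, String variant) {
        return Collator.getInstance(new Locale(language, country, variant));
    }

    // Tailored collator from custom rules, e.g. the text read from a rules file.
    public static Collator tailoredCollator(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid collation rules", e);
        }
    }

    // Compares the binary collation keys of two strings byte by byte,
    // treating each byte as unsigned (most significant byte first).
    public static int compareKeyBytes(Collator collator, String a, String b) {
        byte[] ka = collator.getCollationKey(a).toByteArray();
        byte[] kb = collator.getCollationKey(b).toByteArray();
        int n = Math.min(ka.length, kb.length);
        for (int i = 0; i < n; i++) {
            int diff = (ka[i] & 0xff) - (kb[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return ka.length - kb.length;
    }

    public static void main(String[] args) {
        Collator german = systemCollator("de", "DE", "");
        // The sign of the byte-wise key comparison should match Collator.compare.
        System.out.println(Integer.signum(german.compare("Äpfel", "Zebra")));
        System.out.println(Integer.signum(compareKeyBytes(german, "Äpfel", "Zebra")));

        // Tailored rules that reverse the usual order of a, b and c.
        Collator reversed = tailoredCollator("< c < b < a");
        System.out.println(Integer.signum(reversed.compare("c", "a")));
    }
}
```

Comparing the key bytes rather than the strings is what lets an index sort collated terms with a plain binary comparison, which is the property the description relies on.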

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
