> Hmm, for standardization of text fields, collation might be a little
> awkward.

I arrived there after using custom rules for a while (see "RuleBasedCollator" 
on http://wiki.apache.org/solr/UnicodeCollation) and then being told
"For better performance, less memory usage, and support for more locales, you 
can add the analysis-extras contrib and use ICUCollationKeyFilterFactory 
instead." (on the same page under "ICU Collation").

> For your German umlauts, what do you mean by standardize? Is this to
> achieve equivalence of e.g. oe to ö in your search terms?

That is the main point, but I might also need the additional normalization of 
combining character sequences, e.g. o + U+0308 (combining diaeresis) = ö, and 
probably similar constructions for other languages (like Hungarian).
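
To illustrate what I mean by normalizing combined characters, here is a minimal
Python sketch using the standard unicodedata module. It only shows the Unicode
side (NFC composition) and makes no claim about how Solr or ICU handle this
internally; the helper name compose is made up for the example.

```python
import unicodedata


def compose(text: str) -> str:
    """Return text in NFC, i.e. with combining sequences composed where possible."""
    return unicodedata.normalize("NFC", text)


# 'o' followed by U+0308 COMBINING DIAERESIS (two code points)
decomposed_o_umlaut = "o\u0308"
# 'o' followed by U+030B COMBINING DOUBLE ACUTE ACCENT (Hungarian o with double acute)
decomposed_o_double_acute = "o\u030b"

print(compose(decomposed_o_umlaut) == "\u00f6")        # True: composed to 'ö'
print(compose(decomposed_o_double_acute) == "\u0151")  # True: composed to 'ő'
print(len(decomposed_o_umlaut), len(compose(decomposed_o_umlaut)))  # 2 1
```

Without such a step, the two spellings of "ö" are different strings and would
not match each other in a search.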

> In that case, a simpler approach would be to put
> GermanNormalizationFilterFactory in your chain:
> http://lucene.apache.org/core/4_6_1/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html

I'll see how far I get with this, but from the description
        • 'ä', 'ö', 'ü' are replaced by 'a', 'o', 'u', respectively.
        • 'ae' and 'oe' are replaced by 'a', and 'o', respectively.
this seems to be too far-reaching a reduction: while the identification "ä=ae" 
is not very serious and rarely misleading, "ä=a" might lump together words that 
shouldn't be: "Äsen" and "Asen" are quite different concepts.
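
To make the difference concrete, here is a small Python sketch of the two kinds
of folding. It is a simplified stand-in that models only the two rules quoted
above, not the actual GermanNormalizationFilter; the function names
fold_to_digraph and fold_to_base are hypothetical.

```python
import unicodedata

UMLAUT_TO_DIGRAPH = {"ä": "ae", "ö": "oe", "ü": "ue"}
UMLAUT_TO_BASE = {"ä": "a", "ö": "o", "ü": "u"}


def _prepare(word: str) -> str:
    # Compose combining sequences first, then lowercase, so that
    # both spellings of an umlaut are handled the same way.
    return unicodedata.normalize("NFC", word).lower()


def fold_to_digraph(word: str) -> str:
    """Mild folding: treat 'ä' as equivalent to 'ae' (and likewise ö, ü)."""
    word = _prepare(word)
    for umlaut, digraph in UMLAUT_TO_DIGRAPH.items():
        word = word.replace(umlaut, digraph)
    return word


def fold_to_base(word: str) -> str:
    """Aggressive folding: 'ä' and 'ae' both collapse to 'a' (and likewise for o)."""
    word = _prepare(word)
    for umlaut, base in UMLAUT_TO_BASE.items():
        word = word.replace(umlaut, base)
    return word.replace("ae", "a").replace("oe", "o")


# With digraph folding the two words stay distinct ...
print(fold_to_digraph("Äsen"), fold_to_digraph("Asen"))  # aesen asen
# ... with base-letter folding they collapse into the same term.
print(fold_to_base("Äsen"), fold_to_base("Asen"))        # asen asen
```

So under the second scheme a search for "Asen" would also return documents
about "Äsen", which is exactly the conflation I would like to avoid.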

In general, the deprecation of ICUCollationKeyFilterFactory doesn't seem to 
have been really thought through.

Thanks anyway, best
Thomas

> 
> On Wed, Feb 19, 2014 at 9:16 AM, Thomas Fischer <fischer...@aon.at> wrote:
> 
>> Thanks, that helps!
>> 
>> I'm trying to migrate from the now deprecated ICUCollationKeyFilterFactory
>> I used before to the ICUCollationField.
>> Is there any description how to achieve this?
>> 
>> First tries now yield
>> 
>> ICUCollationField does not support specifying an analyzer.
>> 
>> which makes it complicated since I used the ICUCollationKeyFilterFactory
>> to standardize my text fields (in particular because of German Umlauts).
>> But an ICUCollationField without LowerCaseFilter, a WhitespaceTokenizer, a
>> LetterTokenizer, etc. doesn't do me much good, I'm afraid.
>> Or is this somehow wrapped into the ICUCollationField?
>> 
>> I didn't find ICUCollationField  in the solr wiki and not much information
>> in the reference.
>> And the hint
>> 
>> "solr.ICUCollationField is included in the Solr analysis-extras contrib -
>> see solr/contrib/analysis-extras/README.txt for instructions on which jars
>> you need to add to your SOLR_HOME/lib in order to use it."
>> 
>> is misleading insofar as this README.txt doesn't mention the
>> solr-analysis-extras-4.6.1.jar in dist.
>> 
>> Best
>> Thomas
>> 
>> 
>> Am 19.02.2014 um 14:27 schrieb Robert Muir:
>> 
>>> you need the solr analysis-extras jar itself, too.
>>> 
>>> 
>>> 
>>> On Wed, Feb 19, 2014 at 8:25 AM, Thomas Fischer <fischer...@aon.at> wrote:
>>> 
>>>> Hello Robert,
>>>> 
>>>> I already added
>>>> contrib/analysis-extras/lib/
>>>> and
>>>> contrib/analysis-extras/lucene-libs/
>>>> via lib directives in solrconfig, this is why the classes mentioned are
>>>> loaded.
>>>> 
>>>> Do you know which jar is supposed to contain the ICUCollationField?
>>>> 
>>>> Best regards
>>>> Thomas
>>>> 
>>>> 
>>>> 
>>>> Am 19.02.2014 um 13:54 schrieb Robert Muir:
>>>> 
>>>>> you need the solr analysis-extras jar in your classpath, too.
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Feb 19, 2014 at 6:45 AM, Thomas Fischer <fischer...@aon.at> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> I'm migrating to solr 4.6.1 and have problems with the ICUCollationField
>>>>>> (apache-solr-ref-guide-4.6.pdf, pp. 31 and 100).
>>>>>> 
>>>>>> I get consistently the error message
>>>>>> Error loading class 'solr.ICUCollationField'.
>>>>>> even after
>>>>>> INFO: Adding
>>>>>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lib/icu4j-49.1.jar' to
>>>>>> classloader
>>>>>> and
>>>>>> INFO: Adding
>>>>>> 'file:/srv/solr4.6.1/contrib/analysis-extras/lucene-libs/lucene-analyzers-icu-4.6.1.jar'
>>>>>> to classloader.
>>>>>> 
>>>>>> Am I missing something?
>>>>>> 
>>>>>> In solr's subversion I found
>>>>>> 
>>>>>> /SVN/solr/contrib/analysis-extras/src/java/org/apache/solr/schema/ICUCollationField.java
>>>>>> but no corresponding class in solr4.6.1's contrib folder.
>>>>>> 
>>>>>> Best
>>>>>> Thomas
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 