Re: Synonyms and stemming revisited

Christian Vogler Sun, 07 Sep 2008 00:44:46 -0700

> If I've given different advice in the past, I'm sure I had a good reason
> for it -- possibly due to some aspect of those problems that is subtly
> different from yours ... can you post links to the specific messages
> you're referring to? It might help jog my memory.


One thread is: http://www.nabble.com/synonyms-td16284520.html

Based on my reading of that thread, I believe that the issue raised there is 
the same as the one I just raised, but the original post was not entirely 
clear and perhaps easy to misunderstand.

Another thread is: 
http://www.nabble.com/stemming-the-synonyms-to16945953.html#a16945953

> A recently added feature is that when configuring SynonymFilterFactory
> you can give it the name of a TokenizerFactory to use when parsing the
> synonym file.  This could be used to stem words *if* you write a
> TokenizerFactory that calls out to your Stemmer.

Ah, cool. I will give the Solr 1.3 nightlies a spin once I make it past my 
current deadlines and obligations.

> (see SOLR-319 for the background on why you can only specify a Tokenizer
> and not a full "fieldType" to get the analysis chain from ... in a
> nutshell: 1. it would have been harder to implement; 2. the only use cases
> people could think of were tokenization based.)

There probably needs to be a chain of tokenizers, because in German, 
compound words need to be split before stemming. I will take a stab at 
writing a TokenizerFactory that chains them. It should not be too 
difficult.
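
To make the ordering concrete, here is a minimal sketch in Python rather
than a real Solr TokenizerFactory. Everything in it is hypothetical: the
names (LEXICON, split_compound, toy_stem, chain, analyze), the tiny
lexicon, and the suffix rules are assumptions for illustration only and
are not Solr or Lucene APIs. It only shows the idea of running a
compound splitter first and a stemmer second over each synonym entry.

```python
from typing import Callable, List

# A stage takes a list of tokens and returns a new list of tokens.
Stage = Callable[[List[str]], List[str]]

# Toy lexicon (assumption); a real decompounder would use a full dictionary.
LEXICON = {"fuss", "ball", "spiel", "spiele"}


def split_compound(word: str) -> List[str]:
    """Greedy longest-match split against LEXICON.

    Returns [word] unchanged if the word cannot be fully covered.
    """
    parts: List[str] = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in LEXICON:
                parts.append(word[i:j])
                i = j
                break
        else:
            # No lexicon entry starts at position i: give up on splitting.
            return [word]
    return parts


def toy_stem(word: str) -> str:
    """Strip one common suffix, keeping at least four characters (toy rule)."""
    for suffix in ("en", "er", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def chain(*stages: Stage) -> Stage:
    """Compose stages so that each one consumes the previous one's output."""
    def run(tokens: List[str]) -> List[str]:
        for stage in stages:
            tokens = stage(tokens)
        return tokens
    return run


def decompound(tokens: List[str]) -> List[str]:
    return [part for token in tokens for part in split_compound(token)]


def stem(tokens: List[str]) -> List[str]:
    return [toy_stem(token) for token in tokens]


# Order matters: split compounds first, then stem the parts.
pipeline = chain(decompound, stem)


def analyze(text: str) -> List[str]:
    """Lowercase, split on whitespace, then run the chained stages."""
    return pipeline(text.lower().split())
```

Calling analyze() on each entry of the synonym file would then yield
stemmed compound parts. Reversing the two stages would stem the whole
compound before the splitter ever sees it, which is why the chain needs
a fixed order.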

Best regards
- Christian
