On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
> But somehow this feels bad (well, so does sticking word variations in what's
> supposed to be a synonyms file), partly because it means that the person adding
> new synonyms would need to know what they stem to (or always check it against
> Solr before editing the file).

When creating the synonym map from your input file, the factory currently uses only your Tokenizer to pre-process the synonyms file.

One idea would be to use the token stream up to the SynonymFilter itself (including filters). This way, if you put a stemmer before the SynonymFilter, it would stem your synonyms file, too.

I haven't thought the whole thing through to see if there's a big reason why this wouldn't work (the SynonymFilter is complicated, sorry). But it does seem like it would produce more consistent results... and perhaps the inconsistency isn't so obvious because, in the default configuration, the SynonymFilter comes directly after the tokenizer.
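
To make the mismatch concrete, here is a rough toy sketch in Python. None of it is Solr or Lucene code: tokenize, toy_stem, build_synonym_map and apply_chain are hypothetical stand-ins, the stemmer just strips a couple of suffixes, and only single-token "a => b" rules are handled (the real SynonymFilter also deals with multi-word matches). I am also assuming both sides of a rule would go through the same filters. It only shows why a map built with the tokenizer alone stops matching once a stemmer sits in front of the synonym step.

```python
# Toy illustration only: none of these names are Solr or Lucene APIs.

def tokenize(text):
    """Stand-in for the Tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def toy_stem(token):
    """Stand-in for a stemmer: strip a couple of common suffixes."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text, filters):
    """Tokenize, then run every token through each filter in order."""
    tokens = tokenize(text)
    for f in filters:
        tokens = [f(t) for t in tokens]
    return tokens


def build_synonym_map(lines, filters):
    """Parse explicit 'a, b => c' rules, analyzing both sides with `filters`.

    Assumption: the rule targets are processed the same way as the sources.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        lhs, rhs = line.split("=>")
        targets = [" ".join(analyze(t, filters)) for t in rhs.split(",")]
        for source in lhs.split(","):
            mapping[" ".join(analyze(source, filters))] = targets
    return mapping


def apply_chain(text, filters, synonyms):
    """Run the filters first, then replace single tokens found in the map."""
    out = []
    for tok in analyze(text, filters):
        out.extend(synonyms.get(tok, [tok]))
    return out


rules = ["running => jogging"]
pre_filters = [toy_stem]  # filters configured before the synonym step

# Current behaviour as described above: only the tokenizer sees the file.
tokenizer_only = build_synonym_map(rules, filters=[])
# Proposed behaviour: the file goes through the same filters as the text.
full_chain = build_synonym_map(rules, filters=pre_filters)

print(apply_chain("Running shoes", pre_filters, tokenizer_only))
print(apply_chain("Running shoes", pre_filters, full_chain))
```

With the tokenizer-only map the key stays "running" while the incoming token has already been stemmed, so the rule never fires. With the full-chain map the key is stemmed the same way as the text, so the person editing the file would not need to know what each word stems to.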