Mike, thanks a lot for your example: the idea here would be that you put the LowerCaseFilter after the SynonymFilter, and then you get exactly this flexibility?

e.g.
WhitespaceTokenizer
SynonymFilter -> no lowercasing of tokens is done, as it "analyzes" your synonyms with just the tokenizer
LowerCaseFilter

but
WhitespaceTokenizer
LowerCaseFilter
SynonymFilter -> the synonyms are lowercased, as it "analyzes" synonyms with the tokenizer+filter

It's already inconsistent today, because if you do:
LowerCaseTokenizer
SynonymFilter

then your synonyms are in fact all being lowercased... it's just arbitrary that they are only being analyzed with the "tokenizer".

On Tue, Apr 26, 2011 at 4:13 PM, Mike Sokolov <soko...@ifactory.com> wrote:
> Suppose your analysis stack includes lower-casing, but your synonyms are
> only supposed to apply to upper-case tokens. For example, "PET" might be a
> synonym of "positron emission tomography", but "pet" wouldn't be.
>
> -Mike
>
> On 04/26/2011 09:51 AM, Robert Muir wrote:
>>
>> On Tue, Apr 26, 2011 at 12:24 AM, Otis Gospodnetic
>> <otis_gospodne...@yahoo.com> wrote:
>>
>>>
>>> But somehow this feels bad (well, so does sticking word variations in
>>> what's supposed to be a synonyms file), partly because it means that the
>>> person adding new synonyms would need to know what they stem to (or
>>> always check it against Solr before editing the file).
>>>
>>
>> when creating the synonym map from your input file, currently the
>> factory actually uses your Tokenizer only to pre-process the synonyms
>> file.
>>
>> One idea would be to use the tokenstream up to the synonymfilter
>> itself (including filters). This way if you put a stemmer before the
>> synonymfilter, it would stem your synonyms file, too.
>>
>> I haven't totally thought the whole thing through to see if theres a
>> big reason why this wouldn't work (the synonymsfilter is complicated,
>> sorry). But it does seem like it would produce more consistent
>> results... and perhaps the inconsistency isnt so obvious since in the
>> default configuration the synonymfilter is directly after the
>> tokenizer.
>>
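
To make the ordering argument concrete, here is a rough toy model of the proposal quoted above (analyze the synonyms file with everything in the chain up to the synonym filter). It is plain Python, not Solr or Lucene code: every function name is made up for illustration, it only handles single-token matches, and it models the proposed behavior rather than what the factory does today.

```python
# Toy model only: these names are hypothetical, not Solr/Lucene APIs.

def whitespace_tokenize(text):
    return text.split()

def lowercase(tokens):
    return [t.lower() for t in tokens]

def build_synonym_map(rules, filters_before):
    """Analyze each rule with the tokenizer plus the filters that
    sit before the synonym step in the chain (the proposal)."""
    def analyze(text):
        tokens = whitespace_tokenize(text)
        for f in filters_before:
            tokens = f(tokens)
        return tuple(tokens)
    return {analyze(src): analyze(dst) for src, dst in rules}

def apply_synonyms(tokens, syn_map):
    """Keep each token and append its expansion if a rule matches."""
    out = []
    for t in tokens:
        out.append(t)
        out.extend(syn_map.get((t,), ()))
    return out

def run_chain(text, rules, filters_before, filters_after):
    syn_map = build_synonym_map(rules, filters_before)
    tokens = whitespace_tokenize(text)
    for f in filters_before:
        tokens = f(tokens)
    tokens = apply_synonyms(tokens, syn_map)
    for f in filters_after:
        tokens = f(tokens)
    return tokens

RULES = [("PET", "positron emission tomography")]
TEXT = "my pet had a PET scan"

# WhitespaceTokenizer, SynonymFilter, LowerCaseFilter:
# the rule stays upper-case, so only "PET" is expanded.
case_sensitive = run_chain(TEXT, RULES, [], [lowercase])

# WhitespaceTokenizer, LowerCaseFilter, SynonymFilter:
# the rule is lowercased too, so "pet" and "PET" are both expanded.
case_insensitive = run_chain(TEXT, RULES, [lowercase], [])

print(case_sensitive)
print(case_insensitive)
```

In the first chain only the upper-case "PET" picks up the expansion, which is Mike's case; in the second both occurrences do. The point is just that the position of the synonym step in the chain decides which behavior you get.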