: My imagined use case:
: - the user enters a term and maybe turns on a flag to get not just
:   the term, but all terms that are somehow related to it (usually the
:   synonyms and narrower terms).
: - Solr first finds the queried term(s) in the thesaurus, then finds the
:   related terms, modifies the query and issues it,
:   e.g. the query is fruits, and it becomes (fruit OR apple OR banana OR ...)
:
: This use case is different from the synonym handler, which - as far as
: I know - modifies the index, and injects synonyms at the position of
: the original word. My use case supposes that we maintain the thesaurus as
: a different "database" (maybe another Solr index).
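The expansion described in the quoted use case can be sketched in plain Java. This is only an illustration under stated assumptions: the thesaurus is a hypothetical in-memory `Map` standing in for the separate "database" (or second Solr index), and `ThesaurusExpansion`/`expand` are made-up names, not Solr APIs. It rewrites the query string before it is issued, and the boolean parameter plays the role of the user's flag.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ThesaurusExpansion {

    // Hypothetical external thesaurus: term -> synonyms and narrower terms.
    // In the use case above this would live outside the main index.
    static final Map<String, List<String>> THESAURUS = Map.of(
            "fruits", List.of("fruit", "apple", "banana"));

    /**
     * Returns the term unchanged when the flag is off; otherwise returns the
     * term plus all related terms, OR-ed together in query syntax.
     */
    static String expand(String term, boolean includeRelated) {
        if (!includeRelated) {
            return term;
        }
        // LinkedHashSet keeps the original term first and drops duplicates.
        Set<String> terms = new LinkedHashSet<>();
        terms.add(term);
        terms.addAll(THESAURUS.getOrDefault(term, List.of()));
        if (terms.size() == 1) {
            return term; // nothing related was found in the thesaurus
        }
        return "(" + String.join(" OR ", terms) + ")";
    }

    public static void main(String[] args) {
        System.out.println(expand("fruits", true));  // (fruits OR fruit OR apple OR banana)
        System.out.println(expand("fruits", false)); // fruits
    }
}
```

Keeping the original term in the expansion is a design choice of this sketch; dropping it would match the quoted example exactly.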
The use case you describe *could* be solved using the SynonymFilter -- you can configure it to be used at query time (for query expansion) *or* at index time (for reduction or expansion). Just express your thesaurus in the synonyms.txt format and configure it in your schema.xml.

The two gotchas to watch out for with this kind of approach are multiword synonyms and the way Lucene's QueryParser treats whitespace as a metacharacter.

In general, if you're going to do this kind of major query expansion, you probably want to use something like the "FieldQParser", which doesn't treat whitespace as special, so user input like...

   United States

...makes it to the analyzer as one chunk of text, and can be looked up as is in your thesaurus.

The multiword synonym issue is more complicated - I don't have the energy to fully explain it right now, but for query-time expansion it can be a real pain in the ass. One workaround is to index shingle-esque terms instead of the individual words in your synonyms, but that defeats the point of your goal of having an external thesaurus that can be modified independently of the index.

My suggestion would be to write a simple little ThesaurusQParser that uses an instance of the SynonymFilter directly to preprocess the input text into a list of all the related terms, and then delegates to another QParser to generate an appropriate Query for each of them (typically a PhraseQuery). Your ThesaurusQParser would then combine those into a giant BooleanQuery (although you may want to consider a DisjunctionMaxQuery instead because of the scoring factors).

ThesaurusQParser would require very little code, because SynonymFilter would be doing all the hard work.

-Hoss
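The ThesaurusQParser idea can be sketched without any Solr dependencies to show its shape. This is a hypothetical illustration, not the Solr or Lucene API: it swaps the SynonymFilter for a plain map lookup, and `Query`, `Phrase` and `DisMax` are made-up stand-ins for Lucene's Query, PhraseQuery and DisjunctionMaxQuery. It reads thesaurus lines in the comma-separated equivalence form of synonyms.txt (only that form; `=>` mappings are not handled), treats the whole user input as one chunk the way the "field" parser does, builds one phrase clause per related term, and combines them in a dismax-style disjunction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class ThesaurusQParserSketch {

    /** Hypothetical stand-in for a Lucene Query. */
    interface Query {}

    /** Plays the role of a PhraseQuery: one clause per related term. */
    record Phrase(List<String> words) implements Query {
        @Override public String toString() {
            return words.size() == 1 ? words.get(0) : "\"" + String.join(" ", words) + "\"";
        }
    }

    /** Plays the role of a DisjunctionMaxQuery over the phrase clauses. */
    record DisMax(List<Query> disjuncts) implements Query {
        @Override public String toString() {
            return disjuncts.stream().map(Query::toString)
                    .collect(Collectors.joining(" | ", "(", ")"));
        }
    }

    // Each term maps to its whole synonym group (including itself).
    private final Map<String, List<String>> groups = new HashMap<>();

    /** Parses synonyms.txt-style equivalence lines such as "a, b, c d". */
    ThesaurusQParserSketch(List<String> synonymLines) {
        for (String line : synonymLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blanks and comments
            }
            List<String> group = Arrays.stream(trimmed.split(","))
                    .map(ThesaurusQParserSketch::normalize)
                    .filter(s -> !s.isEmpty())
                    .toList();
            for (String term : group) {
                groups.put(term, group);
            }
        }
    }

    /** Lower-cases and collapses whitespace; the input stays one chunk. */
    static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    /** Looks the whole input up as is, then builds one phrase per related term. */
    Query parse(String userInput) {
        String chunk = normalize(userInput);
        if (chunk.isEmpty()) {
            return new DisMax(List.of());
        }
        List<String> related = groups.getOrDefault(chunk, List.of(chunk));
        List<Query> clauses = new ArrayList<>();
        for (String term : related) {
            clauses.add(new Phrase(List.of(term.split(" "))));
        }
        return new DisMax(clauses);
    }

    public static void main(String[] args) {
        var parser = new ThesaurusQParserSketch(List.of(
                "# hypothetical thesaurus entry",
                "united states, usa, united states of america"));
        System.out.println(parser.parse("United  States"));
        // ("united states" | usa | "united states of america")
    }
}
```

Because the multiword terms become phrase clauses at query time, the index never needs to know about the thesaurus, which keeps it editable independently; using a disjunction rather than a plain OR keeps a document that matches several synonyms from being scored as if it matched several distinct concepts.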