hossman wrote:
>
> This is "Issue #1" regarding trying to use query time multi word synonyms
> discussed on the wiki...
>
>>> "The Lucene QueryParser tokenizes on white space before giving any
>>> text to the Analyzer, so if a person searches for the words sea biscit
>>> the analyzer will be given the words "sea" and "biscit" separately, and
>>> will not know that they match a synonym.
>
> on the "boosting" part of the query (where the dismax handler
> automagically quotes the entire input and queries it against the "pf"
> fields), the synonyms do get used (because the whole input is analyzed as
> one string) but in this case the phrase queries will match any of these
> phrases...
>
>    divorce dispute resolution
>    alternative mediation resolution
>    divorce mediation resolution
>    etc...
>
> ..it will *NOT* match either of these phrases...
>
>    divorce mediation
>    alternative dispute resolution
>
> ...because the SynonymFilter has no way to tell the query parser which
> words should be linked to which other words when building up the phrase
> query.
>
> This is "Issue #2" regarding trying to use query time multi word synonyms
> discussed on the wiki...
>
>>> Phrase searching (ie: "sea biscit") will cause the QueryParser to pass
>>> the entire string to the analyzer, but if the SynonymFilter is
>>> configured to expand the synonyms, then when the QueryParser gets the
>>> resulting list of tokens back from the Analyzer, it will construct a
>>> MultiPhraseQuery that will not have the desired effect. This is because
>>> of the limited mechanism available for the Analyzer to indicate that
>>> two terms occupy the same position: there is no way to indicate that a
>>> "phrase" occupies the same position as a term. For our example the
>>> resulting MultiPhraseQuery would be "(sea | sea | seabiscuit) (biscuit
>>> | biscit)" which would not match the simple case of "seabiscuit"
>>> occurring in a document
>
> : I have the synonym filter only at query time coz i can't re-index data (or
> : portion of data) everytime i add a synonym and a couple of other reasons.
>
> Use cases like yours will *never* work as a query time synonym ... hence
> all of the information about multi-word synonyms and the caveats about
> using them in the wiki...
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter
>
> -Hoss
>
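To make "Issue #1" above concrete, here is a toy sketch in Python. It is not the Lucene or Solr API: the rule `sea biscit => seabiscuit`, the `SYNONYMS` table and the function names are all made up for illustration. It only models the order of operations described in the wiki text, where an unquoted query is split on whitespace before the analyzer sees it, while a quoted query reaches the analyzer as one string.

```python
# Toy model only: hypothetical names, not the Lucene/Solr API.
# Assumed rule for illustration: sea biscit => seabiscuit
SYNONYMS = {("sea", "biscit"): ["seabiscuit"]}
MAX_RULE_LEN = max(len(key) for key in SYNONYMS)


def analyze(tokens):
    """Replace the longest matching run of tokens with its synonym."""
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_RULE_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in SYNONYMS:
                out.extend(SYNONYMS[key])
                i += n
                break
        else:
            # No rule starts at this token, so keep it unchanged.
            out.append(tokens[i])
            i += 1
    return out


def parse_unquoted(query):
    """Split on whitespace first, then analyze each word on its own."""
    terms = []
    for word in query.lower().split():
        terms.extend(analyze([word]))
    return terms


def parse_quoted(query):
    """Hand the whole string to the analyzer in one call."""
    return analyze(query.lower().split())


print(parse_unquoted("sea biscit"))  # ['sea', 'biscit']
print(parse_quoted("sea biscit"))    # ['seabiscuit']
```

In this model the unquoted query never triggers the two-word rule, because the analyzer is only ever handed one word at a time.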
We have a very similar problem, and want to make sure that this is hopeless with Solr before we try something else...

I have a synonyms.txt file similar to the following:

    bar=>bar, club
    club=>club, bar, night club
    ...

A search for 'bar' returns the exact results we want: anything with 'bar' or 'club' in the name. However, a search for 'club' produces very strange results:

    name:"(club bar night) club"

Knowing that Lucene struggles with multi-word query-time synonyms, my question is: does this also affect index-time synonyms? What other alternatives do we have if we require multi-word synonyms?
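For what it's worth, the odd query for 'club' can be reproduced with a toy model of the position problem described in "Issue #2". The Python below is only a sketch and not the Lucene or Solr API: `expand_to_positions`, `as_phrase_query` and `phrase_matches` are made-up names, and it assumes that the words of each replacement are laid out at consecutive positions starting where the original term was, and that the result is matched as an exact phrase.

```python
# Toy model only: hypothetical names, not the Lucene/Solr API.

def expand_to_positions(replacements):
    """Lay out each replacement's words at consecutive positions from 0."""
    positions = []
    for phrase in replacements:
        for offset, word in enumerate(phrase.split()):
            if offset == len(positions):
                positions.append([])
            if word not in positions[offset]:
                positions[offset].append(word)
    return positions


def as_phrase_query(field, positions):
    """Render the positions in the style of the query shown in the post."""
    slots = []
    for words in positions:
        slots.append(words[0] if len(words) == 1 else "(" + " ".join(words) + ")")
    return field + ':"' + " ".join(slots) + '"'


def phrase_matches(positions, text):
    """Exact phrase match: word i of some window must be allowed at position i."""
    words = text.lower().split()
    width = len(positions)
    return any(
        all(words[start + i] in positions[i] for i in range(width))
        for start in range(len(words) - width + 1)
    )


# Right-hand side of the rule: club=>club, bar, night club
club = expand_to_positions(["club", "bar", "night club"])
print(club)                                # [['club', 'bar', 'night'], ['club']]
print(as_phrase_query("name", club))       # name:"(club bar night) club"
print(phrase_matches(club, "night club"))  # True
print(phrase_matches(club, "blue bar"))    # False
```

Under these assumptions the second position only allows 'club', so a name containing just 'bar' or just 'club' cannot match, which is consistent with the strange results described above.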