hossman wrote:
> 
> This is "Issue #1" regarding trying to use query time multi word synonyms 
> discussed on the wiki...
> 
>>> "The Lucene QueryParser tokenizes on white space before giving any 
>>> text to the Analyzer, so if a person searches for the words sea biscit 
>>> the analyzer will be given the words "sea" and "biscit" separately, and
>>> will not know that they match a synonym."
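The whitespace split described in the wiki text can be modelled in a few lines of Python. This is a toy sketch, not Lucene code. `SYNONYMS`, `analyze` and `parse_query` are hypothetical stand-ins for a synonym filter and the QueryParser, and the sketch assumes the synonym table is keyed on the exact text the analyzer receives.

```python
# Toy model only, not Lucene code. SYNONYMS is a hypothetical query-time
# synonym table keyed on the exact text the analyzer receives.
SYNONYMS = {"sea biscit": ["sea biscit", "seabiscuit"]}


def analyze(text):
    """Stand-in for an analyzer with a synonym filter: it can only
    expand text that arrives in a single call."""
    return SYNONYMS.get(text, [text])


def parse_query(query):
    """Mimic the QueryParser behaviour described above: split on
    whitespace first, then analyze each chunk separately."""
    return [analyze(chunk) for chunk in query.split()]


print(parse_query("sea biscit"))  # [['sea'], ['biscit']]
print(analyze("sea biscit"))      # ['sea biscit', 'seabiscuit']
```

The synonym is only found when the whole string reaches the analyzer in one call, which the split-first parser never does.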
> 
> On the "boosting" part of the query (where the dismax handler
> automagically quotes the entire input and queries it against the "pf"
> fields), the synonyms do get used (because the whole input is analyzed as
> one string), but in this case the phrase queries will match any of these
> phrases...
> 
>    divorce dispute resolution
>    alternative mediation resolution
>    divorce mediation resolution
>    etc...
> 
> ...but it will *NOT* match either of these phrases...
> 
>    divorce mediation
>    alternative dispute resolution
> 
> ...because the SynonymFilter has no way to tell the query parser which 
> words should be linked to which other words when building up the phrase 
> query.  
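The loss of word linkage can be sketched as a toy Python model of stacked token positions. The synonym rule (`divorce mediation` equivalent to `alternative dispute resolution`) is an assumption, since the actual synonyms.txt line isn't shown, and so is the idea that each alternative is laid word-by-word over the same starting position. `stack_positions` and `phrase_matches` are hypothetical helpers, not Lucene APIs.

```python
# Toy model only, not Lucene code. Assumes each synonym alternative is laid
# word-by-word over the same starting position, like stacked tokens.
def stack_positions(alternatives):
    """Return one set of acceptable terms per phrase position."""
    width = max(len(alt.split()) for alt in alternatives)
    positions = [set() for _ in range(width)]
    for alt in alternatives:
        for offset, word in enumerate(alt.split()):
            positions[offset].add(word)
    return positions


def phrase_matches(positions, doc):
    """MultiPhraseQuery-style check: some window of the document has,
    at every offset, one of the terms allowed at that position."""
    words = doc.split()
    n = len(positions)
    return any(
        all(words[start + i] in positions[i] for i in range(n))
        for start in range(len(words) - n + 1)
    )


# Hypothetical rule: "divorce mediation" <=> "alternative dispute resolution"
query = stack_positions(["divorce mediation", "alternative dispute resolution"])
print(phrase_matches(query, "divorce dispute resolution"))  # True
print(phrase_matches(query, "divorce mediation"))           # False
```

In this model the phrase needs three positions, so the two-word input "divorce mediation" cannot match itself, while mixed combinations such as "divorce dispute resolution" do.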
> 
> This is "Issue #2" regarding trying to use query time multi word synonyms
> discussed on the wiki...
> 
>>> Phrase searching (i.e. "sea biscit") will cause the QueryParser to pass
>>> the entire string to the analyzer, but if the SynonymFilter is 
>>> configured to expand the synonyms, then when the QueryParser gets the  
>>> resulting list of tokens back from the Analyzer, it will construct a  
>>> MultiPhraseQuery that will not have the desired effect. This is because  
>>> of the limited mechanism available for the Analyzer to indicate that 
>>> two terms occupy the same position: there is no way to indicate that a  
>>> "phrase" occupies the same position as a term. For our example the  
>>> resulting MultiPhraseQuery would be
>>> "(sea | sea | seabiscuit) (biscuit | biscit)", which would not match the
>>> simple case of "seabiscuit" occurring in a document.
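The MultiPhraseQuery from the wiki example can be checked the same way. In this toy Python sketch (not Lucene code) each phrase position is a set of acceptable terms, so the duplicate `sea | sea` collapses to one entry. `QUERY` and `phrase_matches` are hypothetical names.

```python
# Toy model only, not Lucene code: the wiki's MultiPhraseQuery as one set
# of acceptable terms per position ("sea | sea" collapses to one entry).
QUERY = [{"sea", "seabiscuit"}, {"biscuit", "biscit"}]


def phrase_matches(positions, doc):
    """True if some window of the document matches every position."""
    words = doc.split()
    n = len(positions)
    return any(
        all(words[start + i] in positions[i] for i in range(n))
        for start in range(len(words) - n + 1)
    )


print(phrase_matches(QUERY, "sea biscit"))               # True
print(phrase_matches(QUERY, "seabiscuit won the race"))  # False
```

A document containing the single token "seabiscuit" can never fill both positions, which is the failure the wiki describes.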
> 
> : I have the synonym filter only at query time because I can't re-index the
> : data (or a portion of it) every time I add a synonym, and for a couple of
> : other reasons.
> 
> Use cases like yours will *never* work as a query time synonym ... hence 
> all of the information about multi-word synonyms and the caveats about 
> using them in the wiki...
> 
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter
> 
> 
> -Hoss

We have a very similar problem, and want to make sure that this is hopeless
with Solr before we try something else...

I have a synonyms.txt file similar to the following:
bar=>bar, club
club=>club, bar, night club
...

A search for 'bar' returns exactly the results we want: anything with 'bar'
or 'club' in the name. However, a search for 'club' produces very strange
results; the query it generates is name:"(club bar night) club"
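One way to see where that query comes from is a toy Python model that parses the two rules above and lays every replacement word-by-word over the query term's position. It is a sketch under assumptions, not Solr's SynonymFilter: `parse_rules`, `expand` and `to_query` are hypothetical, only the `lhs=>rhs1, rhs2` form is handled, and `to_query` only renders the phrase form seen in the post.

```python
# Toy model only, not Solr code. Handles just the "lhs=>rhs1, rhs2" form.
RULES = """\
bar=>bar, club
club=>club, bar, night club
"""


def parse_rules(text):
    """Map each left-hand term to its list of replacements."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=>" not in line:
            continue
        lhs, rhs = line.split("=>", 1)
        rules[lhs.strip()] = [alt.strip() for alt in rhs.split(",")]
    return rules


def expand(term, rules):
    """Stack every replacement word-by-word from the term's position."""
    alternatives = rules.get(term, [term])
    width = max(len(alt.split()) for alt in alternatives)
    positions = [[] for _ in range(width)]
    for alt in alternatives:
        for offset, word in enumerate(alt.split()):
            if word not in positions[offset]:
                positions[offset].append(word)
    return positions


def to_query(field, positions):
    """Render positions in the debug-style syntax quoted in the post."""
    parts = [
        terms[0] if len(terms) == 1 else "(" + " ".join(terms) + ")"
        for terms in positions
    ]
    return f'{field}:"{" ".join(parts)}"'


rules = parse_rules(RULES)
print(to_query("name", expand("club", rules)))  # name:"(club bar night) club"
```

Because "night club" occupies two positions, the one-word query turns into a two-position phrase. In this model a name containing just "club" or just "bar" would not satisfy it, while "night club" or "bar club" would.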

Knowing that Lucene struggles with multi-word query-time synonyms, my
question is: does this also affect index-time synonyms? What other
alternatives do we have if we require multi-word synonyms?

-- 
View this message in context: 
http://www.nabble.com/solr-synonyms-behaviour-tp15051211p18349953.html
Sent from the Solr - User mailing list archive at Nabble.com.
