OK, let's do a simple test instead of making claims - take your Solr instance, any version 4.0 or later.
In your schema.xml, pick a field and add the synonym filter:

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.KeywordTokenizerFactory" />

In your synonyms.txt, add this entry:

hubble\0space\0telescope, HST

ATTENTION: the \0 is a null byte, and it must be written as a null byte! You can do it with (works with Python 2 and 3):

python -c 'print("hubble\0space\0telescope,HST")' > synonyms.txt

Send a phrase query:

q=field:"hubble space telescope"&debugQuery=true

If you have done it right, you will see 'HST' in the list - this means Solr is able to recognize the multi-token synonym!

As far as recognition is concerned, there is no need for more work on FST. I have written a big unit test that proves the point (9 months ago, LUCENE-4499), making no changes to the way FST works. What is missing is a query parser that can take advantage of it - that is another JIRA issue.

I'll repeat my claim now: the solution(s) are there, and they solve the problem completely - they are not inside one JIRA issue, but they are there. They need to be proven wrong, NOT proclaimed incomplete.

roman

On Wed, Jul 17, 2013 at 10:22 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> To the best of my knowledge, there is no patch or collection of patches
> which constitutes a "working solution" - just partial solutions.
>
> Yes, it is true, there is some FST work underway (active??) that shows
> promise depending on query parser implementation, but again, this is all a
> longer-term future, not a "here and now". Maybe in the 5.0 timeframe?
>
> I don't want anyone to get the impression that there are off-the-shelf
> patches that completely solve the synonym phrase problem. Yes, progress is
> being made, but we're not there yet.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Roman Chyla
> Sent: Wednesday, July 17, 2013 9:58 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Searching w/explicit Multi-Word Synonym Expansion
>
> Hi all,
>
> What I find very 'sad' is that Lucene/SOLR contain all the necessary
> components for handling multi-token synonyms; the Finite State Automaton
> works perfectly for matching these items; the biggest problem is IMO the
> old query parser which splits things on spaces and doesn't know to be
> smarter.
>
> THIS IS A LONG-TIME PROBLEM - THERE EXIST SEVERAL WORKING SOLUTIONS (but
> none was committed...sigh, we are re-inventing the wheel all the time...)
>
> LUCENE-1622
> LUCENE-4381
> LUCENE-4499
>
> The problem of synonym expansion is more difficult because of the parsing -
> the default parsers are not flexible and they split on empty space -
> recently I have proposed a solution which also makes the multi-token
> synonym expansion simple
>
> this is the ticket:
> https://issues.apache.org/jira/browse/LUCENE-5014
>
> that query parser is able to split on spaces, then look back, do the second
> pass to see whether to expand with synonyms - and even discover different
> parse paths and construct different queries based on that. if you want to
> see some complex examples, look at:
> https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/apache/solr/analysis/TestAdsabsTypeFulltextParsing.java
> - eg. line 373, 483
>
> Lucene/SOLR developers are already doing great work and have much to do -
> they need help from everybody who is able to apply a patch, test it and
> report back to JIRA.
>
> roman
>
> On Wed, Jul 17, 2013 at 9:37 AM, dmarini <david.marini...@gmail.com> wrote:
>
>> iorixxx,
>>
>> Thanks for pointing me in the direction of the QueryElevation component.
>> If it did not require that the target documents be keyed by the unique key
>> field it would be ideal, but since our Sku field is not the Unique field
>> (we have an internal id which serves as the key while this is the client's
>> key) it doesn't seem like it will match unless I make a larger scope
>> change.
>>
>> Jack,
>>
>> I agree that out of the box there hasn't been a generalized solution for
>> this yet. I guess what I'm looking for is confirmation that I've gone as
>> far as I can properly and from this point need to consider using something
>> like the HON custom query parser component (which we're leery of using
>> because from my reading it solves a specific scenario that may
>> overcompensate what we're attempting to fix). I would personally rather
>> stay IN solr than add custom .jar files from around the web if at all
>> possible.
>>
>> Thanks for the replies.
>>
>> --Dave
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Searching-w-explicit-Multi-Word-Synonym-Expansion-tp4078469p4078610.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
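
For anyone repeating the test at the top of this message, here is a rough Python sketch of its two mechanical steps: writing the synonyms.txt entry with real null bytes between the words, and building the phrase query with debugQuery=true. The function names, the select URL and the field name "title" are placeholders made up for illustration (they are not from this thread), so adjust them to your own core and schema. The script only builds the URL; it does not contact Solr.

```python
from urllib.parse import urlencode


def synonym_line(phrase, synonym):
    """Join the words of the phrase with a null byte and append the synonym."""
    return "\0".join(phrase.split()) + "," + synonym


def write_synonyms(path, phrase="hubble space telescope", synonym="HST"):
    """Write a one-entry synonyms file containing real null bytes."""
    with open(path, "w", encoding="utf-8", newline="\n") as fh:
        fh.write(synonym_line(phrase, synonym) + "\n")


def phrase_query_url(select_url, field, phrase):
    """Build the URL for a quoted phrase query with debug output enabled."""
    params = {"q": '{}:"{}"'.format(field, phrase), "debugQuery": "true"}
    return select_url + "?" + urlencode(params)


if __name__ == "__main__":
    write_synonyms("synonyms.txt")
    # Placeholder URL and field name (assumptions, not from the thread).
    print(phrase_query_url("http://localhost:8983/solr/collection1/select",
                           "title", "hubble space telescope"))
```

After writing the file, copy it into the conf directory of your core and reload the core before sending the query, otherwise the old synonyms are still in use.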