Remember, this is the "users" list, not the "dev" list. Users want to know what they can do and use off the shelf today, not what "could" be developed. Hopefully, the situation will be brighter in six months or a year, but today... is today, not tomorrow.

(And, in fact, users can use LucidWorks Search for query-time phrase synonyms, off-the-shelf, today, no patches required.)

-- Jack Krupansky

-----Original Message----- From: Roman Chyla
Sent: Wednesday, July 17, 2013 11:44 AM
Subject: Re: Searching w/explicit Multi-Word Synonym Expansion

OK, let's do a simple test instead of making claims - take your Solr
instance, any version 4.0 or later.

In your schema.xml, pick a field and add the synonym filter

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"
        tokenizerFactory="solr.KeywordTokenizerFactory" />

in your synonyms.txt, add these entries:

hubble\0space\0telescope, HST

ATTENTION: the \0 is a null byte; it must be written as an actual null byte!
You can do it with: python -c "print \"hubble\0space\0telescope,HST\"" >

send a phrase query q=field:"hubble space telescope"&debugQuery=true

if you have done it right, you will see 'HST' is in the list - this means,
solr is able to recognize the multi-token synonym! As far as recognition is
concerned, there is no need for more work on FST.
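
For anyone who wants to reproduce the test above, here is a minimal Python 3 sketch (the one-liner quoted earlier uses Python 2 print syntax). It writes the synonym entry with real null bytes and builds the query string for the phrase test. The function names are hypothetical, the output path is whatever synonyms.txt your own core reads, and "field" stands in for the field you picked in schema.xml.

```python
from pathlib import Path
from urllib.parse import urlencode


def write_synonyms(path):
    """Write the multi-word synonym entry with real NUL bytes (0x00)."""
    # "\0" in a Python string literal is a single NUL character, not a
    # backslash followed by a zero.
    line = "hubble\0space\0telescope, HST\n"
    Path(path).write_bytes(line.encode("utf-8"))
    return line


def build_query(field, phrase):
    """Build the URL-encoded parameters for the phrase query with debug output."""
    # The field name is an assumption: use the field that carries the filter.
    return urlencode({"q": f'{field}:"{phrase}"', "debugQuery": "true"})


if __name__ == "__main__":
    write_synonyms("synonyms.txt")
    print(build_query("field", "hubble space telescope"))
```

Append the printed parameters to your own select URL, then check the parsed query in the debug section for HST.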

I wrote a large unit test that proves the point (9 months ago,
LUCENE-4499) without changing how the FST works. What is missing is
a query parser that can take advantage of it - that is another JIRA issue.

I'll repeat my claim now: the solution(s) are there, they solve the problem
completely - they are not inside one JIRA issue, but they are there. They
need to be proven wrong, NOT proclaimed incomplete.


On Wed, Jul 17, 2013 at 10:22 AM, Jack Krupansky wrote:

To the best of my knowledge, there is no patch or collection of patches
which constitutes a "working solution" - just partial solutions.

Yes, it is true, there is some FST work underway (active??) that shows
promise depending on query parser implementation, but again, this is all a
longer-term future, not a "here and now". Maybe in the 5.0 timeframe?

I don't want anyone to get the impression that there are off-the-shelf
patches that completely solve the synonym phrase problem. Yes, progress is
being made, but we're not there yet.

-- Jack Krupansky

-----Original Message----- From: Roman Chyla
Sent: Wednesday, July 17, 2013 9:58 AM
Subject: Re: Searching w/explicit Multi-Word Synonym Expansion

Hi all,

What I find very 'sad' is that Lucene/SOLR contain all the necessary
components for handling multi-token synonyms; the Finite State Automaton
works perfectly for matching these items; the biggest problem is IMO the
old query parser which split things on spaces and doesn't know to be

none was committed...sigh, we are re-inventing the wheel all the time...)


The problem of synonym expansion is more difficult because of the parsing -
the default parsers are not flexible and they split on whitespace -
recently I have proposed a solution which also makes multi-token
synonym expansion simple

this is the ticket: jira/browse/LUCENE-5014

That query parser is able to split on spaces, then look back and do a second
pass to see whether to expand with synonyms - and even discover different
parse paths and construct different queries based on that. If you want to
see some complex examples, look at:**montysolr/blob/master/contrib/**
e.g. lines 373, 483
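
To make the "split, look back, second pass" idea concrete, here is a toy Python sketch. It is not the LUCENE-5014 parser or its API, just an illustration under simple assumptions: whitespace tokenization, a plain dict as the synonym table, and hypothetical names (expand_queries, max_len). It walks the tokens and, wherever a multi-token phrase matches, branches into an alternative parse path.

```python
def expand_queries(query, synonyms, max_len=4):
    """Return every query variant reachable by substituting multi-token synonyms.

    synonyms maps a lowercase phrase ("hubble space telescope") to a list of
    replacements (["HST"]). max_len caps the phrase length that is tried.
    """
    tokens = query.split()  # first pass: naive split on whitespace

    def walk(i):
        # Returns all token sequences that cover tokens[i:].
        if i == len(tokens):
            return [[]]
        paths = []
        # Path 1: keep the single token as-is.
        for rest in walk(i + 1):
            paths.append([tokens[i]] + rest)
        # Path 2: second pass, try multi-token phrases starting at position i.
        for n in range(2, max_len + 1):
            if i + n > len(tokens):
                break
            phrase = " ".join(tokens[i:i + n]).lower()
            for syn in synonyms.get(phrase, []):
                for rest in walk(i + n):
                    paths.append([syn] + rest)
        return paths

    return [" ".join(p) for p in walk(0)]


if __name__ == "__main__":
    table = {"hubble space telescope": ["HST"]}
    for variant in expand_queries("hubble space telescope images", table):
        print(variant)
```

A real parser would combine the variants into one boolean query rather than returning strings, but the branching on alternative parse paths is the same idea.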

Lucene/SOLR developers are already doing great work and have much to do -
they need help from everybody who is able to apply a patch, test it and
report back to JIRA.


On Wed, Jul 17, 2013 at 9:37 AM, dmarini wrote:


Thanks for pointing me in the direction of the QueryElevation component. If
it did not require that the target documents be keyed by the unique key
field it would be ideal, but since our Sku field is not the Unique field
have an internal id which serves as the key while this is the client's
it doesn't seem like it will match unless I make a larger scope change.


I agree that out of the box there hasn't been a generalized solution for
this yet. I guess what I'm looking for is confirmation that I've gone as far
as I can properly and from this point need to consider using something like
the HON custom query parser component (which we're leery of using because
from my reading it solves a specific scenario that may overcompensate what
we're attempting to fix). I would personally rather stay IN solr than add
custom .jar files from around the web if at all possible.

Thanks for the replies.

