Re: Searching w/explicit Multi-Word Synonym Expansion

Roman Chyla Wed, 17 Jul 2013 08:46:36 -0700

OK, let's do a simple test instead of making claims - take your Solr
instance, any version 4.0 or later.


In your schema.xml, pick a field and add the synonym filter

<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                    ignoreCase="true" expand="true"
tokenizerFactory="solr.KeywordTokenizerFactory" />

in your synonyms.txt, add these entries:

hubble\0space\0telescope, HST

ATTENTION: the \0 is a null byte, and it must be written to the file as an
actual null byte! You can do it with (Python 2 syntax):
python -c "print \"hubble\0space\0telescope,HST\"" > synonyms.txt
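
The one-liner above relies on the Python 2 print statement. A minimal sketch of the same step for Python 3 follows; the helper name write_synonyms is made up for illustration, and the entry is the one from the command above.

```python
# Sketch: write the synonym entry with real NUL bytes between the words.
# write_synonyms is a hypothetical helper name, not part of Solr.


def write_synonyms(path="synonyms.txt"):
    # "\0" in a Python string literal is the NUL character (U+0000),
    # so the file ends up with actual null bytes, not backslash-zero text.
    entry = "hubble\0space\0telescope,HST\n"
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(entry)
    return entry


if __name__ == "__main__":
    write_synonyms()
```

Opening the file in a hex viewer should show 00 bytes between the three words; if you see the two characters backslash and zero instead, the entry was not written correctly.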

send a phrase query q=field:"hubble space telescope"&debugQuery=true
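
If you would rather build the request URL in code than type it, here is a small sketch. The base URL (http://localhost:8983/solr/select) and the field name "field" are assumptions, so adjust them to your own instance; build_debug_query_url is just an illustrative name.

```python
# Sketch: build the URL for the phrase query with debugQuery enabled.
# The base URL and field name are assumptions; change them for your setup.
from urllib.parse import urlencode


def build_debug_query_url(base_url="http://localhost:8983/solr/select",
                          field="field"):
    params = {
        # The double quotes make it a phrase query, as in the message.
        "q": '{}:"hubble space telescope"'.format(field),
        "debugQuery": "true",
    }
    # urlencode percent-encodes the quotes and the colon, and turns
    # spaces into "+", so the result can be pasted into a browser or curl.
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_debug_query_url())
```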

if you have done it right, you will see 'HST' in the list - this means
Solr is able to recognize the multi-token synonym! As far as recognition is
concerned, there is no need for more work on FST.

I have written a big unit test that proves the point (9 months ago,
LUCENE-4499) without changing how the FST works. What is missing is
a query parser that can take advantage of it - that is another JIRA issue.

I'll repeat my claim now: the solution(s) are there, they solve the problem
completely - they are not inside one JIRA issue, but they are there. They
need to be proven wrong, NOT proclaimed incomplete.


roman


On Wed, Jul 17, 2013 at 10:22 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> To the best of my knowledge, there is no patch or collection of patches
> which constitutes a "working solution" - just partial solutions.
>
> Yes, it is true, there is some FST work underway (active??) that shows
> promise depending on query parser implementation, but again, this is all a
> longer-term future, not a "here and now". Maybe in the 5.0 timeframe?
>
> I don't want anyone to get the impression that there are off-the-shelf
> patches that completely solve the synonym phrase problem. Yes, progress is
> being made, but we're not there yet.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Roman Chyla
> Sent: Wednesday, July 17, 2013 9:58 AM
> To: solr-user@lucene.apache.org
>
> Subject: Re: Searching w/explicit Multi-Word Synonym Expansion
>
> Hi all,
>
> What I find very 'sad' is that Lucene/SOLR contain all the necessary
> components for handling multi-token synonyms; the Finite State Automaton
> works perfectly for matching these items; the biggest problem is IMO the
> old query parser, which splits things on spaces and doesn't know how to be
> smarter.
>
> THIS IS A LONG-TIME PROBLEM - THERE EXIST SEVERAL WORKING SOLUTIONS (but
> none was committed...sigh, we are re-inventing the wheel all the time...)
>
> LUCENE-1622
> LUCENE-4381
> LUCENE-4499
>
>
> The problem of synonym expansion is more difficult because of the parsing -
> the default parsers are not flexible and they split on whitespace -
> recently I have proposed a solution which also makes the multi-token
> synonym expansion simple
>
> this is the ticket:
> https://issues.apache.org/jira/browse/LUCENE-5014
>
> that query parser is able to split on spaces, then look back, do the second
> pass to see whether to expand with synonyms - and even discover different
> parse paths and construct different queries based on that. if you want to
> see some complex examples, look at:
> https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/src/test/org/apache/solr/analysis/TestAdsabsTypeFulltextParsing.java
> - e.g. lines 373, 483
>
>
> Lucene/SOLR developers are already doing great work and have much to do -
> they need help from everybody who is able to apply a patch, test it and
> report back to JIRA.
>
> roman
>
>
>
> On Wed, Jul 17, 2013 at 9:37 AM, dmarini <david.marini...@gmail.com> wrote:
>
>> iorixxx,
>>
>> Thanks for pointing me in the direction of the QueryElevation component. If
>> it did not require that the target documents be keyed by the unique key
>> field it would be ideal, but since our Sku field is not the Unique field (we
>> have an internal id which serves as the key while this is the client's key)
>> it doesn't seem like it will match unless I make a larger scope change.
>>
>> Jack,
>>
>> I agree that out of the box there hasn't been a generalized solution for
>> this yet. I guess what I'm looking for is confirmation that I've gone as far
>> as I can properly and from this point need to consider using something like
>> the HON custom query parser component (which we're leery of using because
>> from my reading it solves a specific scenario that may overcompensate what
>> we're attempting to fix). I would personally rather stay IN solr than add
>> custom .jar files from around the web if at all possible.
>>
>> Thanks for the replies.
>>
>> --Dave
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Searching-w-explicit-Multi-Word-Synonym-Expansion-tp4078469p4078610.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
