Thank you, Steve -- very helpful.

I can see that whatever implementation I decide to try, some testing will
be in order.  If anyone is aware of significant gotchas with this synonym
thing that are not mentioned in the already-listed URLs, please feel free
to comment.

On Fri, May 27, 2016 at 10:28 AM, Steve Rowe <sar...@gmail.com> wrote:

> I’m working on addressing problems using multi-term synonyms at query time
> in Lucene and Solr.
>
> I recommend these two blogs for understanding the issues (the second one
> was mentioned earlier in this thread):
>
> <http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html>
> <https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/>
>
> In addition to the already-mentioned projects, there is also:
>
> <https://issues.apache.org/jira/browse/SOLR-5379>
>
> All of these projects try in various ways to work around the fact that
> Lucene’s QueryParser splits on whitespace before sending text to analysis,
> one token at a time, so in a synonym filter, multi-word synonyms can never
> match and add alternatives.  See
> <https://issues.apache.org/jira/browse/LUCENE-2605>, where I’ve posted a
> patch to directly address that problem - note that it’s still a work in
> progress.
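The effect described above can be reproduced with a toy model. The following is a minimal sketch in plain Java, not Lucene code: `WhitespaceSplitDemo`, its methods, and the one-entry synonym table are hypothetical stand-ins for the QueryParser and a synonym filter. It assumes a filter that only matches two-token phrases, and it writes a matched synonym as a single `a | b` string rather than as a real token graph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class WhitespaceSplitDemo {

    // Hypothetical one-entry synonym table; a real Solr setup would load
    // synonyms.txt into Lucene's synonym filter instead.
    static final Map<String, String> SYNONYMS =
            Map.of("heart attack", "myocardial infarction");

    // Toy stand-in for an analysis chain with a synonym filter: it looks at
    // two-token windows of whatever text it is handed.
    static List<String> analyze(String text) {
        String[] tokens = text.toLowerCase().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (i + 1 < tokens.length) {
                String pair = tokens[i] + " " + tokens[i + 1];
                String syn = SYNONYMS.get(pair);
                if (syn != null) {
                    out.add(pair + " | " + syn); // phrase plus its alternative
                    i++;                         // consume both matched tokens
                    continue;
                }
            }
            out.add(tokens[i]);
        }
        return out;
    }

    // Mimics the classic QueryParser: split on whitespace first, then analyze
    // each chunk on its own, so the filter only ever sees one token at a time.
    static List<String> parseSplitFirst(String query) {
        List<String> out = new ArrayList<>();
        for (String chunk : query.split("\\s+")) {
            out.addAll(analyze(chunk));
        }
        return out;
    }

    // The behavior LUCENE-2605 is after: the whole query reaches analysis.
    static List<String> parseWhole(String query) {
        return analyze(query);
    }

    public static void main(String[] args) {
        String q = "heart attack symptoms";
        System.out.println(parseSplitFirst(q));
        System.out.println(parseWhole(q));
    }
}
```

Running `main` should print `[heart, attack, symptoms]` for the split-first path, because each chunk reaches `analyze` alone and the two-word key can never match, and `[heart attack | myocardial infarction, symptoms]` when the whole string is analyzed.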
>
> Once LUCENE-2605 has been fixed, there is still work to do getting
> (e)dismax to work with the modified Lucene QueryParser, and addressing
> problems with how queries are constructed from Lucene’s “sausagized” token
> stream.
>
> --
> Steve
> www.lucidworks.com
>
> > On May 26, 2016, at 2:21 PM, John Bickerstaff <j...@johnbickerstaff.com> wrote:
> >
> > Thanks Chris --
> >
> > The two projects I'm aware of are:
> >
> > https://github.com/healthonnet/hon-lucene-synonyms
> >
> > and the one referenced from the Lucidworks page here:
> >
> > https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >
> > ... which is here:
> > https://github.com/LucidWorks/auto-phrase-tokenfilter
> >
> > Is there anything else out there that you would recommend I look at?
> >
> > On Thu, May 26, 2016 at 12:01 PM, Chris Morley <ch...@depahelix.com> wrote:
> >
> >> Chris Morley here, from Wayfair.  (Depahelix = my domain)
> >>
> >> Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.
> >> We worked mostly off of Ted Sullivan's work and also off of some
> >> suggestions from Koorosh Vakhshoori.  We have gotten to a point where we
> >> have a more sophisticated internal implementation; however, we've found
> >> that it is very difficult to make it do what you want it to do and also
> >> be sufficiently performant.  Watch out for exceptional situations with mm
> >> (minimum should match).
> >>
> >> Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have also
> >> done work in this area.
> >>
> >> It should be very possible to get this kind of thing working on
> >> SolrCloud.  I haven't tried it yet, but theoretically I think it should
> >> just work.  The synonyms stuff is mostly about doing things at index time
> >> and query time.  The index-time stuff should translate to SolrCloud
> >> directly, while the query-time stuff might pose some issues, though
> >> probably not serious ones, if there are any issues at all.
> >>
> >> I've had decent luck porting our various plugins from 4.10.x to 5.5.0
> >> because a lot of stuff is just Java, and it still works within the Jetty
> >> context.
> >>
> >> -Chris.
> >>
> >>
> >>
> >>
> >> ----------------------------------------
> >> From: "John Bickerstaff" <j...@johnbickerstaff.com>
> >> Sent: Thursday, May 26, 2016 1:51 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
> >>
> >> Hey Jeff (or anyone interested in multi-word synonyms), here are some
> >> potentially interesting links...
> >>
> >> http://wiki.apache.org/solr/QueryParser (search the page for
> >> synonym_edismax)
> >>
> >> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ (blog
> >> post about what became the synonym_edismax Query Parser)
> >>
> >>
> >> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >>
> >> This last one was useful for lots of reasons and contains links to other
> >> interesting, related web pages...
> >>
> >> On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes <jwar...@whitepages.com>
> >> wrote:
> >>
> >>> Oh, interesting. I've certainly encountered issues with multi-word
> >>> synonyms, but I hadn't come across this. If you end up using it with a
> >>> recent Solr version, I'd be glad to hear your experience.
> >>>
> >>> I haven't used it, but I am aware of one other project in this vein that
> >>> you might be interested in looking at:
> >>> https://github.com/LucidWorks/auto-phrase-tokenfilter
> >>>
> >>>
> >>> On 5/26/16, 9:29 AM, "John Bickerstaff" <j...@johnbickerstaff.com> wrote:
> >>>
> >>>> Ahh - for question #3 I may have spoken too soon. This line from the
> >>>> GitHub repository README suggests a way.
> >>>>
> >>>> Update: We have tested to run with the jar in $SOLR_HOME/lib as well, and
> >>>> it works (Jetty).
> >>>>
> >>>> I'll try that and only respond back if that doesn't work.
> >>>>
> >>>> Questions 1 and 2 still stand of course... If anyone on the list has
> >>>> experience in this area...
> >>>>
> >>>> Thanks.
> >>>>
> >>>> On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <j...@johnbickerstaff.com> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I'm creating a Solr Cloud that will index and search medical text.
> >>>>> Multi-word synonyms are a pretty important factor.
> >>>>>
> >>>>> I find that there are some challenges around multi-word synonyms, and I
> >>>>> also found on the wiki that there is a recommended 3rd-party parser
> >>>>> (synonym_edismax parser) created by Nolan Lawson and found here:
> >>>>> https://github.com/healthonnet/hon-lucene-synonyms
> >>>>>
> >>>>> Here's the thing - the instructions on the GitHub site involve bringing
> >>>>> the jar file into the WAR file - which is not applicable any more... at
> >>>>> least I think it's not...
> >>>>>
> >>>>> I have three questions:
> >>>>>
> >>>>> 1. Is this still a good solution for multi-word synonyms (i.e., Solr
> >>>>> Cloud doesn't break it in some way)?
> >>>>> 2. Is there a tool or plug-in out there that the contributors would
> >>>>> recommend above this one?
> >>>>> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated procedure
> >>>>> for bringing it into Solr Cloud (I'm running 5.4.x)?
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>
> >>>
> >>
> >>
> >>
>
>
