Thank you Steve -- very helpful. I can see that whatever implementation I decide to try, some testing will be in order. If anyone is aware of significant gotchas with this synonym thing that are not mentioned in the already-listed URLs, please feel free to comment.
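For anyone skimming the thread, here is a toy sketch of the whitespace-splitting problem Steve describes in the quoted message. It is plain Python, not Lucene or Solr code: the synonym map and the names `SYNONYMS`, `expand_synonyms`, `parse_split_first`, and `parse_whole_text` are all made up for illustration, and the real analysis chain is considerably more involved.

```python
# Toy model only: this is NOT Lucene's API. SYNONYMS and every function
# name below are hypothetical and exist purely for illustration.

SYNONYMS = {
    ("heart", "attack"): ["myocardial infarction"],
    ("mi",): ["myocardial infarction"],
}


def expand_synonyms(tokens):
    """Return the alternatives for every rule that matches a run of tokens."""
    found = []
    for start in range(len(tokens)):
        for rule, alternatives in SYNONYMS.items():
            if tuple(tokens[start:start + len(rule)]) == rule:
                found.extend(alternatives)
    return found


def parse_split_first(query):
    """Split on whitespace first, then analyze each token on its own."""
    found = []
    for token in query.lower().split():
        # The synonym step only ever sees one token at a time here.
        found.extend(expand_synonyms([token]))
    return found


def parse_whole_text(query):
    """Hand the whole query text to analysis as a single token stream."""
    return expand_synonyms(query.lower().split())


print(parse_split_first("heart attack"))  # []
print(parse_whole_text("heart attack"))   # ['myocardial infarction']
print(parse_split_first("MI"))            # ['myocardial infarction']
```

The single-word rule still fires when tokens are analyzed one at a time, but the two-word rule can only match when the synonym step sees both tokens together, which is the gap the projects listed in this thread try to work around.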

On Fri, May 27, 2016 at 10:28 AM, Steve Rowe <sar...@gmail.com> wrote:

> I’m working on addressing problems using multi-term synonyms at query time
> in Lucene and Solr.
>
> I recommend these two blogs for understanding the issues (the second one
> was mentioned earlier in this thread):
>
> <http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html>
> <https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/>
>
> In addition to the already-mentioned projects, there is also:
>
> <https://issues.apache.org/jira/browse/SOLR-5379>
>
> All of these projects try in various ways to work around the fact that
> Lucene’s QueryParser splits on whitespace before sending text to analysis,
> one token at a time, so in a synonym filter, multi-word synonyms can never
> match and add alternatives. See
> <https://issues.apache.org/jira/browse/LUCENE-2605>, where I’ve posted a
> patch to directly address that problem - note that it’s still a work in
> progress.
>
> Once LUCENE-2605 has been fixed, there is still work to do getting
> (e)dismax to work with the modified Lucene QueryParser, and addressing
> problems with how queries are constructed from Lucene’s “sausagized” token
> stream.
>
> --
> Steve
> www.lucidworks.com
>
> On May 26, 2016, at 2:21 PM, John Bickerstaff <j...@johnbickerstaff.com> wrote:
> >
> > Thanks Chris --
> >
> > The two projects I'm aware of are:
> >
> > https://github.com/healthonnet/hon-lucene-synonyms
> >
> > and the one referenced from the Lucidworks page here:
> > https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >
> > ... which is here:
> > https://github.com/LucidWorks/auto-phrase-tokenfilter
> >
> > Is there anything else out there that you would recommend I look at?
> >
> > On Thu, May 26, 2016 at 12:01 PM, Chris Morley <ch...@depahelix.com> wrote:
> >
> >> Chris Morley here, from Wayfair. (Depahelix = my domain)
> >>
> >> Suyash Sonawane and I have worked on multiple word synonyms at Wayfair.
> >> We worked mostly off of Ted Sullivan's work and also off of some
> >> suggestions from Koorosh Vakhshoori. We have gotten to a point where we
> >> have a more sophisticated internal implementation, however, we've found
> >> that it is very difficult to make it do what you want it to do, and also
> >> be sufficiently performant. Watch out for exceptional situations with mm
> >> (minimum should match).
> >>
> >> Trey Grainger (now at Lucidworks) and Simon Hughes of Dice.com have also
> >> done work in this area.
> >>
> >> It should be very possible to get this kind of thing working on
> >> SolrCloud. I haven't tried it yet but I think theoretically, it should
> >> just work. The synonyms stuff is mostly about doing things at index time
> >> and query time. The index time stuff should translate to SolrCloud
> >> directly, while the query time stuff might pose some issues, but probably
> >> not too bad, if there are any issues at all.
> >>
> >> I've had decent luck porting our various plugins from 4.10.x to 5.5.0
> >> because a lot of stuff is just Java, and it still works within the Jetty
> >> context.
> >>
> >> -Chris.
> >>
> >> ----------------------------------------
> >> From: "John Bickerstaff" <j...@johnbickerstaff.com>
> >> Sent: Thursday, May 26, 2016 1:51 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
> >>
> >> Hey Jeff (or anyone interested in multi-word synonyms) here are some
> >> potentially interesting links...
> >>
> >> http://wiki.apache.org/solr/QueryParser (search the page for
> >> synonym_edismax)
> >>
> >> https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
> >> (blog post about what became the synonym_edismax Query Parser)
> >>
> >> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >>
> >> This last was useful for lots of reasons and contains links to other
> >> interesting, related web pages...
> >>
> >> On Thu, May 26, 2016 at 11:45 AM, Jeff Wartes <jwar...@whitepages.com> wrote:
> >>
> >>> Oh, interesting. I've certainly encountered issues with multi-word
> >>> synonyms, but I hadn't come across this. If you end up using it with a
> >>> recent Solr version, I'd be glad to hear your experience.
> >>>
> >>> I haven't used it, but I am aware of one other project in this vein that
> >>> you might be interested in looking at:
> >>> https://github.com/LucidWorks/auto-phrase-tokenfilter
> >>>
> >>> On 5/26/16, 9:29 AM, "John Bickerstaff" <j...@johnbickerstaff.com> wrote:
> >>>
> >>>> Ahh - for question #3 I may have spoken too soon. This line from the
> >>>> github repository readme suggests a way.
> >>>>
> >>>> Update: We have tested to run with the jar in $SOLR_HOME/lib as well,
> >>>> and it works (Jetty).
> >>>>
> >>>> I'll try that and only respond back if that doesn't work.
> >>>>
> >>>> Questions 1 and 2 still stand of course... If anyone on the list has
> >>>> experience in this area...
> >>>>
> >>>> Thanks.
> >>>>
> >>>> On Thu, May 26, 2016 at 10:25 AM, John Bickerstaff <j...@johnbickerstaff.com> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I'm creating a Solr Cloud that will index and search medical text.
> >>>>> Multi-word synonyms are a pretty important factor.
> >>>>>
> >>>>> I find that there are some challenges around multi-word synonyms and I
> >>>>> also found on the wiki that there is a recommended 3rd-party parser
> >>>>> (synonym_edismax parser) created by Nolan Lawson and found here:
> >>>>> https://github.com/healthonnet/hon-lucene-synonyms
> >>>>>
> >>>>> Here's the thing - the instructions on the github site involve bringing
> >>>>> the jar file into the war file - which is not applicable any more... at
> >>>>> least I think it's not...
> >>>>>
> >>>>> I have three questions:
> >>>>>
> >>>>> 1. Is this still a good solution for multi-word synonyms (i.e. Solr
> >>>>> Cloud doesn't break it in some way)?
> >>>>> 2. Is there a tool or plug-in out there that the contributors would
> >>>>> recommend above this one?
> >>>>> 3. Assuming 1 = yes and 2 = no, can anyone tell me an updated procedure
> >>>>> for bringing it in to Solr Cloud (I'm running 5.4.x)?
> >>>>>
> >>>>> Thanks