I happened to revisit this post that I had started a long time back. I'm still using the same query-time synonyms. Now I want to be able to map cities to states in the synonyms, and I'm still having this issue with multi-word synonyms. Could you please explain again what you did to overcome it? I didn't quite understand what HIER_FAMILIY_01 and SYN_FAMILY_01 are. Thanks.
lorenzo zhak wrote:
>
> Hi,
>
> I had to deal with this kind of side effect regarding multi-word
> synonyms. We installed Solr on our project, which uses synonyms
> extensively: a big list that can sometimes produce wrong matches,
> like the one noticed by Anuvenk, for instance:
>
>> dui => drunk driving defense
>> or
>> dui,drunk driving defense,drunk driving law
>> query for "dui" matches "dui => drunk driving defense" and "dui,drunk
>> driving defense,drunk driving law"
>
> To prevent this kind of behavior, I gave every "synonym family"
> (that is, a single line in the file) a unique identifier, so the
> list looks like:
>
> dui => HIER_FAMILIY_01
> drunk driving defense => HIER_FAMILIY_01
> SYN_FAMILY_01, dui,drunk driving defense,drunk driving law
>
> I also set the synonym filter to expand=false at both index time and
> query time.
>
> This way, the synonyms matched in documents (multi-word or single-word)
> are replaced with their family identifier rather than with all the
> possibilities. Indexing with expand=true would add words to documents
> that could then be matched alone, ignoring the fact that they belong
> to a multi-word expression, and this could end up producing wrong
> matches (the "synonym mix") at query time.
>
> So a query for "dui" will be rewritten by the synonym filter at query
> time to "HIER_FAMILIY_01" or "SYN_FAMILY_01", and documents that
> contain only single words like "drunk", "driving" or "law" will not
> be matched, since only a document containing the phrase "drunk
> driving law" would have been indexed with "SYN_FAMILY_01".
>
> The approach worked pretty well on our project and we have not
> noticed any side effects on searches; it only removes matched
> documents that were "noise" caused by the synonym mix issue.
>
> I think it would be useful to add this kind of approach to the
> synonyms filter section of the Solr wiki.
>
> Cheers
>
> Laurent
>
>
> On Dec 2, 2007 3:41 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com>
> wrote:
>> Hi (changing to solr-user list)
>>
>> Yes it is, especially if the terms left of => are multi-spaced. Check
>> out the Wiki, one page there explains this nicely.
>>
>> Otis
>> -
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> ----- Original Message ----
>> From: anuvenk <anuvenkat...@hotmail.com>
>> To: solr-...@lucene.apache.org
>> Sent: Saturday, December 1, 2007 1:21:49 AM
>> Subject: Re: synonyms
>>
>> Ideally, would it be a good idea to pass the index data through the
>> synonyms filter while indexing?
>> Also, say I have this mapping
>> dui => drunk driving defense
>> or
>> dui,drunk driving defense,drunk driving law
>>
>> Will matches for dui also bring up matches for drunk driving law
>> (the whole phrase), or does it also bring up all matches for 'drunk',
>> 'driving', 'law'?
>>
>> Yonik Seeley wrote:
>> >
>> > On Nov 30, 2007 5:39 PM, anuvenk <anuvenkat...@hotmail.com> wrote:
>> >> Should data be re-indexed every time synonyms like
>> >> word1,word2
>> >> or
>> >> word1 => word2
>> >>
>> >> are added to synonyms.txt?
>> >
>> > Yes, if it changes the index (if it's used in the index analyzer as
>> > opposed to just the query analyzer).
>> >
>> > -Yonik
>>
>> --
>> View this message in context:
>> http://www.nabble.com/synonyms-tf4925232.html#a14100346
>> Sent from the Solr - Dev mailing list archive at Nabble.com.
>

--
View this message in context: http://www.nabble.com/Re%3A-synonyms-tp14116132p23860862.html
Sent from the Solr - User mailing list archive at Nabble.com.
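To make Laurent's family-identifier scheme more concrete, here is a simplified Python model of it. This is not Solr's SynonymFilter: it assumes whitespace tokenization, greedy longest-match, and that rules sharing a left-hand side are merged (which matches Laurent's description of "dui" becoming both IDs), and it ignores token positions, so multi-word replacements are simply flattened. `parse_rules` and `rewrite` are hypothetical helpers written only for illustration.

```python
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]


def parse_rules(lines: List[str], expand: bool) -> Dict[Phrase, List[Phrase]]:
    """Build a phrase -> replacements map from synonyms.txt-style lines.

    Hypothetical simplification of Solr's synonym file format.
    """
    rules: Dict[Phrase, List[Phrase]] = {}

    def add(lhs: Phrase, rhs: Phrase) -> None:
        # Assumption: rules sharing a left-hand side are merged, not overwritten.
        targets = rules.setdefault(lhs, [])
        if rhs not in targets:
            targets.append(rhs)

    for line in lines:
        if "=>" in line:
            # Explicit mapping: every left phrase maps to every right phrase.
            left, right = line.split("=>")
            for lhs in (tuple(p.split()) for p in left.split(",")):
                for rhs in (tuple(p.split()) for p in right.split(",")):
                    add(lhs, rhs)
        else:
            phrases = [tuple(p.split()) for p in line.split(",")]
            # expand=false maps every entry to the first one (the family ID);
            # expand=true maps every entry to every entry.
            targets = phrases if expand else phrases[:1]
            for lhs in phrases:
                for rhs in targets:
                    add(lhs, rhs)
    return rules


def rewrite(text: str, rules: Dict[Phrase, List[Phrase]]) -> List[str]:
    """Greedy longest-match rewrite of whitespace tokens (positions ignored)."""
    tokens = text.split()
    longest = max(len(k) for k in rules)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                for rhs in rules[key]:
                    out.extend(rhs)
                i += n
                break
        else:
            # No rule starts here: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


FAMILY_RULES = [
    "dui => HIER_FAMILIY_01",
    "drunk driving defense => HIER_FAMILIY_01",
    "SYN_FAMILY_01, dui,drunk driving defense,drunk driving law",
]

family = parse_rules(FAMILY_RULES, expand=False)
plain = parse_rules(["dui,drunk driving defense,drunk driving law"], expand=True)

if __name__ == "__main__":
    print(rewrite("dui", family))  # ['HIER_FAMILIY_01', 'SYN_FAMILY_01']
    print(rewrite("hired a drunk driving law firm", family))  # ['hired', 'a', 'SYN_FAMILY_01', 'firm']
    print(rewrite("drunk passenger and driving law", family))  # no family ID
    print(rewrite("dui", plain))  # loose 'drunk', 'driving', 'law' terms appear
```

In this model, only the full phrase "drunk driving law" produces SYN_FAMILY_01, so a query for "dui" cannot match a document that merely contains "drunk" or "driving" on their own. With a plain expand=true line, by contrast, "dui" contributes the loose terms "drunk", "driving" and "law", which is the kind of noise Laurent describes.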