Hello Bernd,

Thanks a lot for your answer. I'll work on this.

Best regards,
Elisabeth

2012/5/29 Bernd Fehling <bernd.fehl...@uni-bielefeld.de>

> Hello Elisabeth,
>
> my synonyms.txt is like your 2nd example:
>
> naturwald, φυσικό\ δάσος, естествена\ гора, prírodný\ les, naravni\ gozd,
> foresta\ naturale, natuurbos, natural\ forest, bosque\ natural,
> természetes\ erdő, natūralus\ miškas, prirodna\ šuma, dabiskais\ mežs,
> floresta\ natural, naturskov, forêt\ naturelle, naturskog, přírodní\ les,
> luonnonmetsä, pădure\ naturală, las\ naturalny, natürlicher\ wald
>
> An example from my system with debugging turned on and searching for
> "naturwald":
>
> <lst name="debug">
> <str name="rawquerystring">naturwald</str>
> <str name="querystring">naturwald</str>
> <str name="parsedquery">textth:naturwald textth:"φυσικό δάσος"
> textth:"естествена гора" textth:"prírodný les" textth:"naravni gozd"
> textth:"foresta naturale" textth:natuurbos textth:"natural forest"
> textth:"bosque natural" textth:"természetes erdő"
> textth:"natūralus miškas" textth:"prirodna šuma" textth:"dabiskais mežs"
> textth:"floresta natural" textth:naturskov textth:"forêt naturelle"
> textth:naturskog textth:"přírodní les" textth:luonnonmetsä
> textth:"pădure naturală" textth:"las naturalny"
> textth:"natürlicher wald"</str>
> ...
>
> As you can see my search for "naturwald" extends to single and multiword
> synonyms e.g. "forêt naturelle"
>
> My SynonymFilterFactory has the following settings:
>
> org.apache.solr.analysis.SynonymFilterFactory
> {tokenizerFactory=solr.KeywordTokenizerFactory,
> synonyms=synonyms_eurovoc_desc_desc_ufall.txt, expand=true, format=solr,
> ignoreCase=true, luceneMatchVersion=LUCENE_36}
>
> But as I already mentioned, there is much more work to be done to get it
> running than just using SynonymFilterFactory.
>
> Regards
> Bernd
>
> On 23.05.2012 08:49, elisabeth benoit wrote:
> > Hello Bernd,
> >
> > Thanks for your advice.
> >
> > I have one question: how did you manage to map one word to a multiword
> > synonym???
> >
> > I've tried (in synonyms.txt)
> >
> > mairie, hotel de ville
> >
> > mairie, hotel\ de\ ville
> >
> > mairie => mairie, hotel de ville
> >
> > mairie => mairie, hotel\ de\ ville
> >
> > but nothing prevents mairie from matching with "hotel"...
> >
> > The only way I found is to use
> > tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms
> > declaration in schema.xml, but then since "mairie" is not alone in my
> > index field, it doesn't match.
> >
> > best regards,
> > Elisabeth
> >
> > the only way I found, in schema.xml, is to use
> >
> > 2012/5/15 Bernd Fehling <bernd.fehl...@uni-bielefeld.de>
> >
> >> Without reading the whole thread let me say that you should not trust
> >> the solr admin analysis. It takes the whole multiword search and runs
> >> it all together at once through each analyzer step (factory).
> >> But this is not how the real system works. First pitfall, the query
> >> parser is also splitting at white space (if not a phrase query). Due to
> >> this, a multiword query is sent chunk after chunk through the analyzer
> >> and, second pitfall, each chunk runs through the whole analyzer on its
> >> own.
> >>
> >> So if you are dealing with multiword synonyms you have the following
> >> problems. Either you turn your query into a phrase so that the whole
> >> phrase is analyzed at once and therefore looked up as multiword synonym
> >> but phrase queries are not analyzed !!! OR you send your query chunk
> >> by chunk through the analyzer but then they are not multiwords anymore
> >> and are not found in your synonyms.txt.
> >>
> >> From my experience I can say that it requires some deep work to get it
> >> done but it is possible. I have connected a thesaurus to solr which is
> >> doing query time expansion (no need to reindex if the thesaurus changes).
> >> The thesaurus holds synonyms and "used for terms" in 24 languages. So
> >> it is also some kind of language translation. And naturally the
> >> thesaurus translates from single term to multi term synonyms and vice
> >> versa.
> >>
> >> Regards,
> >> Bernd
> >>
> >> On 14.05.2012 13:54, elisabeth benoit wrote:
> >>> Just for the record, I'd like to conclude this thread
> >>>
> >>> First, you were right, there was no behaviour difference between fq
> >>> and q parameters.
> >>>
> >>> I realized that:
> >>>
> >>> 1) my synonym (hotel de ville) has a stopword in it (de) and since I
> >>> used tokenizerFactory="solr.KeywordTokenizerFactory" in my synonyms
> >>> declaration, there was no stopword removal in the indexed expression,
> >>> so when requesting "hotel de ville", after stopword removal in the
> >>> query, Solr was comparing "hotel de ville" with "hotel ville"
> >>>
> >>> but my queries never even got to that point since
> >>>
> >>> 2) I made a mistake using "mairie" alone in the admin interface when
> >>> testing my schema. The real field was something like "collectivités
> >>> territoriales mairie", so the synonym "hotel de ville" was not even
> >>> applied, because of the
> >>> tokenizerFactory="solr.KeywordTokenizerFactory" in my synonym
> >>> definition not splitting the field into words when parsing
> >>>
> >>> So my problem is not solved, and I'm considering solving it outside of
> >>> Solr scope, unless someone else has a clue
> >>>
> >>> Thanks again,
> >>> Elisabeth
> >>>
> >>> 2012/4/25 Erick Erickson <erickerick...@gmail.com>
> >>>
> >>>> A little farther down the debug info output you'll find something
> >>>> like this (I specified fq=name:features)
> >>>>
> >>>> <arr name="parsed_filter_queries">
> >>>> <str>name:features</str>
> >>>> </arr>
> >>>>
> >>>> so it may well give you some clue.
> >>>> But unless I'm reading things wrong, your q is going against a field
> >>>> that has much more information than the CATEGORY_ANALYZED field, is
> >>>> it possible that the data from your test cases simply isn't _in_
> >>>> CATEGORY_ANALYZED?
> >>>>
> >>>> Best
> >>>> Erick
> >>>>
> >>>> On Wed, Apr 25, 2012 at 9:39 AM, elisabeth benoit
> >>>> <elisaelisael...@gmail.com> wrote:
> >>>>> I'm not at the office until next Wednesday, and I don't have my Solr
> >>>>> at hand, but isn't debugQuery=on giving information only about q
> >>>>> parameter matching and nothing about fq parameter? Or do you mean
> >>>>> "parsed_filter_queries" gives information about fq?
> >>>>>
> >>>>> CATEGORY_ANALYZED is being populated by a copyField instruction in
> >>>>> schema.xml, and has the same field type as my catchall field, the
> >>>>> search field for my searchHandler (the one being used by q parameter).
> >>>>>
> >>>>> CATEGORY (a string) is copied in CATEGORY_ANALYZED (field type is
> >>>>> text)
> >>>>>
> >>>>> CATEGORY (a string) is copied in catchall field (field type is text),
> >>>>> and a lot of other fields are copied too in that catchall field.
> >>>>>
> >>>>> So as far as I can see, the same analysis should be done in both
> >>>>> cases, but obviously I'm missing something, and the only thing I can
> >>>>> think of is a different behavior between q and fq parameter.
> >>>>>
> >>>>> I'll check that parsed_filter_queries first thing in the morning next
> >>>>> Wednesday.
> >>>>>
> >>>>> Thanks a lot for your help.
> >>>>>
> >>>>> Elisabeth
> >>>>>
> >>>>> 2012/4/24 Erick Erickson <erickerick...@gmail.com>
> >>>>>
> >>>>>> Elisabeth:
> >>>>>>
> >>>>>> What shows up in the debug section of the response when you add
> >>>>>> &debugQuery=on?
> >>>>>> There should be some bit of that section like:
> >>>>>> "parsed_filter_queries"
> >>>>>>
> >>>>>> My other question is "are you absolutely sure that your
> >>>>>> CATEGORY_ANALYZED field has the correct content?". How does it
> >>>>>> get populated?
> >>>>>>
> >>>>>> Nothing jumps out at me here....
> >>>>>>
> >>>>>> Best
> >>>>>> Erick
> >>>>>>
> >>>>>> On Tue, Apr 24, 2012 at 9:55 AM, elisabeth benoit
> >>>>>> <elisaelisael...@gmail.com> wrote:
> >>>>>>> yes, thanks, but this is NOT my question.
> >>>>>>>
> >>>>>>> I was wondering why I have multiple matches with q="hotel de ville"
> >>>>>>> and no match with fq=CATEGORY_ANALYZED:"hotel de ville", since in
> >>>>>>> both cases I'm searching in the same solr fieldType.
> >>>>>>>
> >>>>>>> Why is q parameter behaving differently in that case? Why do the
> >>>>>>> quotes work in one case and not in the other?
> >>>>>>>
> >>>>>>> Does anyone know?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Elisabeth
> >>>>>>>
> >>>>>>> 2012/4/24 Jeevanandam <je...@myjeeva.com>
> >>>>>>>
> >>>>>>>> usage of q and fq
> >>>>>>>>
> >>>>>>>> q => is typically the main query for the search request
> >>>>>>>>
> >>>>>>>> fq => is Filter Query; generally used to restrict the super set of
> >>>>>>>> documents without influencing score (more info.
> >>>>>>>> http://wiki.apache.org/solr/CommonQueryParameters#q )
> >>>>>>>>
> >>>>>>>> For example:
> >>>>>>>> ------------
> >>>>>>>> q="hotel de ville" ===> returns 100 documents
> >>>>>>>>
> >>>>>>>> q="hotel de ville"&fq=price:[100 TO *]&fq=roomType:"King size Bed"
> >>>>>>>> ===> returns 40 documents from super set of 100 documents
> >>>>>>>>
> >>>>>>>> hope this helps!
> >>>>>>>>
> >>>>>>>> - Jeevanandam
> >>>>>>>>
> >>>>>>>> On 24-04-2012 3:08 pm, elisabeth benoit wrote:
> >>>>>>>>
> >>>>>>>>> Hello,
> >>>>>>>>>
> >>>>>>>>> I'd like to resume this post.
> >>>>>>>>>
> >>>>>>>>> The only way I found not to split synonyms into words in
> >>>>>>>>> synonyms.txt is to use the line
> >>>>>>>>>
> >>>>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>>>>>>>> ignoreCase="true" expand="true"
> >>>>>>>>> tokenizerFactory="solr.KeywordTokenizerFactory"/>
> >>>>>>>>>
> >>>>>>>>> in schema.xml
> >>>>>>>>>
> >>>>>>>>> where tokenizerFactory="solr.KeywordTokenizerFactory"
> >>>>>>>>>
> >>>>>>>>> instructs SynonymFilterFactory not to break synonyms into words
> >>>>>>>>> on white spaces when parsing the synonyms file.
> >>>>>>>>>
> >>>>>>>>> So now it works fine, "mairie" is mapped into "hotel de ville"
> >>>>>>>>> and when I send request q="hotel de ville" (quotes are mandatory
> >>>>>>>>> to prevent the analyzer from splitting hotel de ville on white
> >>>>>>>>> spaces), I get answers with word "mairie".
> >>>>>>>>>
> >>>>>>>>> But when I use fq parameter (fq=CATEGORY_ANALYZED:"hotel de
> >>>>>>>>> ville"), it doesn't work!!!
> >>>>>>>>>
> >>>>>>>>> CATEGORY_ANALYZED is the same field type as the default search
> >>>>>>>>> field. This means that when I send q="hotel de ville" and
> >>>>>>>>> fq=CATEGORY_ANALYZED:"hotel de ville", solr uses the same
> >>>>>>>>> analyzer, the one with the line
> >>>>>>>>>
> >>>>>>>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >>>>>>>>> ignoreCase="true" expand="true"
> >>>>>>>>> tokenizerFactory="solr.KeywordTokenizerFactory"/>.
> >>>>>>>>>
> >>>>>>>>> Does anyone have a clue what is different between q analysis
> >>>>>>>>> behaviour and fq analysis behaviour?
> >>>>>>>>>
> >>>>>>>>> Thanks a lot
> >>>>>>>>> Elisabeth
> >>>>>>>>>
> >>>>>>>>> 2012/4/12 elisabeth benoit <elisaelisael...@gmail.com>
> >>>>>>>>>
> >>>>>>>>>> oh, that's right.
> >>>>>>>>>>
> >>>>>>>>>> thanks a lot,
> >>>>>>>>>> Elisabeth
> >>>>>>>>>>
> >>>>>>>>>> 2012/4/11 Jeevanandam Madanagopal <je...@myjeeva.com>
> >>>>>>>>>>
> >>>>>>>>>>> Elisabeth -
> >>>>>>>>>>>
> >>>>>>>>>>> As you described, the mapping below might suit your need.
> >>>>>>>>>>> mairie => hotel de ville, mairie
> >>>>>>>>>>>
> >>>>>>>>>>> mairie gets expanded to "hotel de ville" and "mairie" at index
> >>>>>>>>>>> time. So "mairie" and "hotel de ville" are searchable on the
> >>>>>>>>>>> document.
> >>>>>>>>>>>
> >>>>>>>>>>> However, the white space tokenizer splitting at query time
> >>>>>>>>>>> will still be a problem as described by Markus.
> >>>>>>>>>>>
> >>>>>>>>>>> --Jeevanandam
> >>>>>>>>>>>
> >>>>>>>>>>> On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> <<Have you tried the "=>" mapping instead? Something
> >>>>>>>>>>>> <<like
> >>>>>>>>>>>> <<hotel de ville => mairie
> >>>>>>>>>>>> <<might work for you.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Yes, thanks, I've tried it but from what I understand it
> >>>>>>>>>>>> doesn't solve my problem, since this means hotel de ville will
> >>>>>>>>>>>> be replaced by mairie at index time (I use synonyms only at
> >>>>>>>>>>>> index time). So when the user asks for "hôtel de ville", it
> >>>>>>>>>>>> won't match.
> >>>>>>>>>>>>
> >>>>>>>>>>>> In fact, at index time I have mairie in my data, but I want
> >>>>>>>>>>>> the user to be able to request "mairie" or "hôtel de ville"
> >>>>>>>>>>>> and have mairie as answer, and not have mairie as an answer
> >>>>>>>>>>>> when requesting "hôtel".
> >>>>>>>>>>>>
> >>>>>>>>>>>> <<To map `mairie` to `hotel de ville` as single token you must
> >>>>>>>>>>>> <<escape your white space.
> >>>>>>>>>>>>
> >>>>>>>>>>>> <<mairie, hotel\ de\ ville
> >>>>>>>>>>>>
> >>>>>>>>>>>> <<This results in a problem if your tokenizer splits on white
> >>>>>>>>>>>> <<space at query time.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Ok, I guess this means I have a problem. No simple solution
> >>>>>>>>>>>> since at query time my tokenizer does split on white spaces.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I guess my problem is more or less one of the problems
> >>>>>>>>>>>> discussed in
> >>>>>>>>>>>>
> >>>>>>>>>>>> http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks a lot for your answers,
> >>>>>>>>>>>> Elisabeth
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2012/4/10 Erick Erickson <erickerick...@gmail.com>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Have you tried the "=>" mapping instead? Something
> >>>>>>>>>>>>> like
> >>>>>>>>>>>>> hotel de ville => mairie
> >>>>>>>>>>>>> might work for you.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best
> >>>>>>>>>>>>> Erick
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
> >>>>>>>>>>>>> <elisaelisael...@gmail.com> wrote:
> >>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I've read several posts on this issue, but can't find a real
> >>>>>>>>>>>>>> solution to my multi-word synonyms matching problem.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I have in my synonyms.txt an entry like
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> mairie, hotel de ville
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> and my index time analyzer is configured as follows for
> >>>>>>>>>>>>>> synonyms.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> <filter class="solr.SynonymFilterFactory"
> >>>>>>>>>>>>>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The problem I have is that now "mairie" matches with "hotel"
> >>>>>>>>>>>>>> and I would only want "mairie" to match with "hotel de ville"
> >>>>>>>>>>>>>> and "mairie".
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> When I look into the analyzer, I see that "mairie" is mapped
> >>>>>>>>>>>>>> into "hotel", and words "de ville" are added in second and
> >>>>>>>>>>>>>> third position. To change that, I tried to do
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> <filter class="solr.SynonymFilterFactory"
> >>>>>>>>>>>>>> synonyms="synonyms.txt" ignoreCase="true" expand="true"
> >>>>>>>>>>>>>> tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read
> >>>>>>>>>>>>>> in one post)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> and I can see now in the analyzer that "mairie" is mapped to
> >>>>>>>>>>>>>> "hotel de ville", but now when I have query "hotel de ville",
> >>>>>>>>>>>>>> it doesn't match at all with "mairie".
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Does anyone have a clue of what I'm doing wrong?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm using Solr 3.4.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> Elisabeth
> >>
> >> --
> >> *************************************************************
> >> Bernd Fehling                    Universitätsbibliothek Bielefeld
> >> Dipl.-Inform. (FH)               Universitätsstr. 25
> >> Tel. +49 521 106-4060            Fax. +49 521 106-4052
> >> bernd.fehl...@uni-bielefeld.de   33615 Bielefeld
> >>
> >> BASE - Bielefeld Academic Search Engine - www.base-search.net
> >> *************************************************************
> >>
> >
> --
> *************************************************************
> Bernd Fehling                    Universitätsbibliothek Bielefeld
> Dipl.-Inform. (FH)               Universitätsstr. 25
> Tel. +49 521 106-4060            Fax. +49 521 106-4052
> bernd.fehl...@uni-bielefeld.de   33615 Bielefeld
>
> BASE - Bielefeld Academic Search Engine - www.base-search.net
> *************************************************************
>
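
Editor's sketch: the two pitfalls Bernd describes in this thread (the query parser splitting on white space, and each chunk going through the analyzer on its own) can be illustrated with a toy model. This is plain Python, not Solr or Lucene code. The function names `load_synonyms`, `analyze` and `expand_query` are hypothetical and invented for illustration, and the lookup table only loosely mimics a synonyms file parsed with tokenizerFactory="solr.KeywordTokenizerFactory" and expand="true".

```python
# Toy model only: it does not call Solr or Lucene, it just mimics the
# behaviour discussed in the thread with plain dictionaries and lists.

def load_synonyms(lines):
    """Build a lookup table from comma-separated synonym groups.

    Each entry is kept as ONE key, spaces included, which loosely mimics
    parsing the file with a keyword tokenizer (assumption for this sketch).
    """
    table = {}
    for line in lines:
        group = [entry.strip().lower() for entry in line.split(",")]
        for entry in group:
            # expand=true semantics: every entry maps to the whole group
            table[entry] = group
    return table


def analyze(text, table):
    """Look up one piece of text as a whole; return it unchanged if unknown."""
    key = text.strip().lower()
    return table.get(key, [key])


def expand_query(query, table):
    """Mimic an unquoted query: split on white space first, then analyze
    each chunk on its own, so multi-word keys are never looked up."""
    expanded = []
    for chunk in query.split():
        expanded.extend(analyze(chunk, table))
    return expanded


synonyms = load_synonyms(["mairie, hotel de ville"])

# Single-word query: the chunk equals a key, so the group is found.
print(expand_query("mairie", synonyms))          # ['mairie', 'hotel de ville']

# Multi-word query: "hotel", "de", "ville" are looked up separately and
# none of them is a key, so "mairie" is never added.
print(expand_query("hotel de ville", synonyms))  # ['hotel', 'de', 'ville']

# Whole string at once (roughly what the admin analysis page does,
# according to the thread): the multi-word key is found.
print(analyze("hotel de ville", synonyms))       # ['mairie', 'hotel de ville']
```

The third call shows why the admin analysis page can look correct while real queries fail: the lookup only succeeds when the multi-word string reaches the synonym table in one piece.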