Re: Multi-words synonyms matching

Erick Erickson Tue, 24 Apr 2012 08:24:21 -0700

Elisabeth:

What shows up in the debug section of the response when you add
&debugQuery=on? There should be some bit of that section like:
"parsed_filter_queries"
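
Something along these lines would build such a request. It is only a
sketch: the host, port and /select path are placeholders I am assuming,
not details of your installation, so adjust them before trying it.

```python
from urllib.parse import urlencode

# Hypothetical base URL (assumption): adjust host, port and path to your Solr.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_url(q, fq, base_url=SOLR_SELECT_URL):
    """Return a select URL carrying q, fq and debugQuery=on, URL-encoded."""
    params = [("q", q), ("fq", fq), ("debugQuery", "on")]
    return base_url + "?" + urlencode(params)


url = build_debug_url('"hotel de ville"', 'CATEGORY_ANALYZED:"hotel de ville"')
print(url)
```

Comparing the parsed main query with the parsed filter query in the
debug output should show whether the two are analyzed the same way.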


My other question is "are you absolutely sure that your
CATEGORY_ANALYZED field has the correct content?". How does it
get populated?

Nothing jumps out at me here....

Best
Erick

On Tue, Apr 24, 2012 at 9:55 AM, elisabeth benoit
<elisaelisael...@gmail.com> wrote:
> yes, thanks, but this is NOT my question.
>
> I was wondering why I have multiple matches with q="hotel de ville" and no
> match with fq=CATEGORY_ANALYZED:"hotel de ville", since in both cases I'm
> searching in the same solr fieldType.
>
> Why is q parameter behaving differently in that case? Why do the quotes
> work in one case and not in the other?
>
> Does anyone know?
>
> Thanks,
> Elisabeth
>
> 2012/4/24 Jeevanandam <je...@myjeeva.com>
>
>>
>> usage of q and fq
>>
>> q => is typically the main query for the search request
>>
>> fq => is Filter Query; generally used to restrict the super set of
>> documents without influencing score (more info:
>> http://wiki.apache.org/solr/CommonQueryParameters#q)
>>
>> For example:
>> ------------
>> q="hotel de ville" ===> returns 100 documents
>>
>> q="hotel de ville"&fq=price:[100 TO *]&fq=roomType:"King size Bed" ===>
>> returns 40 documents from super set of 100 documents
>>
>>
>> hope this helps!
>>
>> - Jeevanandam
>>
>>
>>
>> On 24-04-2012 3:08 pm, elisabeth benoit wrote:
>>
>>> Hello,
>>>
>>> I'd like to resume this post.
>>>
>>> The only way I found to avoid splitting synonyms into words in synonyms.txt is
>>> to use the line
>>>
>>>  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>> ignoreCase="true" expand="true"
>>> tokenizerFactory="solr.KeywordTokenizerFactory"/>
>>>
>>> in schema.xml
>>>
>>> where tokenizerFactory="solr.KeywordTokenizerFactory"
>>>
>>> instructs SynonymFilterFactory not to break synonyms into words on white
>>> space when parsing the synonyms file.
>>>
>>> So now it works fine: "mairie" is mapped to "hotel de ville", and when I
>>> send the request q="hotel de ville" (quotes are mandatory to prevent the
>>> analyzer from splitting hotel de ville on white space), I get answers with
>>> the word "mairie".
>>>
>>> But when I use fq parameter (fq=CATEGORY_ANALYZED:"hotel de ville"), it
>>> doesn't work!!!
>>>
>>> CATEGORY_ANALYZED is the same field type as the default search field. This
>>> means that when I send q="hotel de ville" and fq=CATEGORY_ANALYZED:"hotel de
>>> ville", solr uses the same analyzer, the one with the line
>>>
>>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>> ignoreCase="true" expand="true"
>>> tokenizerFactory="solr.KeywordTokenizerFactory"/>.
>>>
>>> Does anyone have a clue what is different between q analysis behaviour and fq
>>> analysis behaviour?
>>>
>>> Thanks a lot
>>> Elisabeth
>>>
>>> 2012/4/12 elisabeth benoit <elisaelisael...@gmail.com>
>>>
>>>> oh, that's right.
>>>>
>>>> thanks a lot,
>>>> Elisabeth
>>>>
>>>>
>>>> 2012/4/11 Jeevanandam Madanagopal <je...@myjeeva.com>
>>>>
>>>>> Elisabeth -
>>>>>
>>>>> As you described, the mapping below might suit your need.
>>>>> mairie => hotel de ville, mairie
>>>>>
>>>>> mairie gets expanded to "hotel de ville" and "mairie" at index time.  So
>>>>> "mairie" and "hotel de ville" are both searchable on the document.
>>>>>
>>>>> However, the whitespace tokenizer splitting at query time will still be a
>>>>> problem, as described by Markus.
>>>>>
>>>>> --Jeevanandam
>>>>>
>>>>> On Apr 11, 2012, at 12:30 PM, elisabeth benoit wrote:
>>>>>
>>>>> > <<Have you tried the "=>" mapping instead? Something
>>>>> > <<like
>>>>> > <<hotel de ville => mairie
>>>>> > <<might work for you.
>>>>> >
>>>>> > Yes, thanks, I've tried it, but from what I understand it doesn't solve
>>>>> > my problem, since this means hotel de ville will be replaced by mairie at
>>>>> > index time (I use synonyms only at index time). So when a user asks for
>>>>> > "hôtel de ville", it won't match.
>>>>> >
>>>>> > In fact, at index time I have mairie in my data, but I want the user to be
>>>>> > able to request "mairie" or "hôtel de ville" and have mairie as an answer,
>>>>> > and not have mairie as an answer when requesting "hôtel".
>>>>> >
>>>>> >
>>>>> > <<To map `mairie` to `hotel de ville` as single token you must escape
>>>>> > <<your white space.
>>>>> >
>>>>> > <<mairie, hotel\ de\ ville
>>>>> >
>>>>> > <<This results in a problem if your tokenizer splits on white space at
>>>>> > <<query time.
>>>>> >
>>>>> > Ok, I guess this means I have a problem. No simple solution since at
>>>>> > query time my tokenizer does split on white spaces.
>>>>> >
>>>>> > I guess my problem is more or less one of the problems discussed in
>>>>> >
>>>>> > http://lucene.472066.n3.nabble.com/Multi-word-synonyms-td3716292.html#a3717215
>>>>> >
>>>>> >
>>>>> > Thanks a lot for your answers,
>>>>> > Elisabeth
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > 2012/4/10 Erick Erickson <erickerick...@gmail.com>
>>>>> >
>>>>> >> Have you tried the "=>" mapping instead? Something
>>>>> >> like
>>>>> >> hotel de ville => mairie
>>>>> >> might work for you.
>>>>> >>
>>>>> >> Best
>>>>> >> Erick
>>>>> >>
>>>>> >> On Tue, Apr 10, 2012 at 1:41 AM, elisabeth benoit
>>>>> >> <elisaelisael...@gmail.com> wrote:
>>>>> >>> Hello,
>>>>> >>>
>>>>> >>> I've read several posts on this issue, but can't find a real solution
>>>>> >>> to my multi-words synonyms matching problem.
>>>>> >>>
>>>>> >>> I have in my synonyms.txt an entry like
>>>>> >>>
>>>>> >>> mairie, hotel de ville
>>>>> >>>
>>>>> >>> and my index time analyzer is configured as follows for synonyms.
>>>>> >>>
>>>>> >>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>>> >>> ignoreCase="true" expand="true"/>
>>>>> >>>
>>>>> >>> The problem I have is that now "mairie" matches with "hotel" and I
>>>>> >>> would only want "mairie" to match with "hotel de ville" and "mairie".
>>>>> >>>
>>>>> >>> When I look into the analyzer, I see that "mairie" is mapped into
>>>>> >>> "hotel", and words "de ville" are added in second and third position.
>>>>> >>> To change that, I tried to do
>>>>> >>>
>>>>> >>> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>>> >>> ignoreCase="true" expand="true"
>>>>> >>> tokenizerFactory="solr.KeywordTokenizerFactory"/> (as I read in one
>>>>> >>> post)
>>>>> >>>
>>>>> >>> and I can see now in the analyzer that "mairie" is mapped to "hotel
>>>>> >>> de ville", but now when I have query "hotel de ville", it doesn't match
>>>>> >>> at all with "mairie".
>>>>> >>>
>>>>> >>> Does anyone have a clue of what I'm doing wrong?
>>>>> >>>
>>>>> >>> I'm using Solr 3.4.
>>>>> >>>
>>>>> >>> Thanks,
>>>>> >>> Elisabeth
>>>>> >>
>>>>>
>>>>>
>>>>>
>>>>
>>
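
For reference, the tokenizerFactory point discussed in the thread above can
be pictured with a toy model. This is not Solr's actual parsing code; the
function name and its flag are made up for illustration, and it only mimics
how one comma-separated synonyms.txt line might be split into tokens with
and without a keyword-style tokenizer.

```python
def parse_synonym_line(line, keyword_tokenizer=False):
    """Toy model (assumption, not Solr code) of parsing one synonyms.txt line.

    Each comma-separated entry becomes a list of tokens. With a keyword-style
    tokenizer the whole entry stays one token; otherwise it is split on
    whitespace.
    """
    entries = [entry.strip() for entry in line.split(",") if entry.strip()]
    if keyword_tokenizer:
        return [[entry] for entry in entries]
    return [entry.split() for entry in entries]


line = "mairie, hotel de ville"
print(parse_synonym_line(line))
print(parse_synonym_line(line, keyword_tokenizer=True))
```

In the first case "hotel de ville" is three separate tokens, which is why
"mairie" ends up matching "hotel" alone; in the second it stays a single
token containing spaces.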
