In our case it's the opposite. For our clients it is very important that every 
synonym gets an equal chance in the relevancy calculation. The fact that "nol" 
scores higher than "net operating loss", simply because its document frequency 
is lower, is unacceptable and a reason to look for ways to remove IDF from 
the score calculation. But that is something I would rather not do, since 
IDF is such an elementary part of the algorithm (and very useful for 
non-synonym searches).
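
A minimal sketch of the effect, assuming the idf factor of Lucene's classic
TF-IDF similarity, idf = 1 + ln(numDocs / (docFreq + 1)). The document counts
are made up for illustration, and each synonym is treated as a single indexed
term, which glosses over how a phrase such as "net operating loss" is actually
scored. Only the direction of the difference matters here.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """idf as in Lucene's classic TF-IDF similarity: 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical counts, not taken from any real index.
NUM_DOCS = 100_000
doc_freq = {"nol": 50, "net operating loss": 2_000}

# The rarer variant ends up with the larger idf, hence the higher score.
idf = {term: classic_idf(df, NUM_DOCS) for term, df in doc_freq.items()}
for term, value in idf.items():
    print(f"{term!r}: df={doc_freq[term]}, idf={value:.2f}")
```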

Pre-processing synonyms to apply 'reverse weighting' is also a strategy to 
consider, but I agree with Walter that this is very error-prone; things could 
easily get out of sync. Moreover, no two of our Dev, QA, STG and PRD 
environments contain exactly the same content, so it would require a 
differently tuned synonym dictionary for each of them...meh...
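
To illustrate why the weights would have to be tuned per environment, here is
a rough sketch of what 'reverse weighting' could look like. Everything in it is
an assumption: the helper name reverse_weights, the environment statistics, and
the idf_power=2 default (which assumes idf enters the score squared, as in
Lucene's classic scoring formula). It is not an existing Solr feature.

```python
import math


def classic_idf(doc_freq, num_docs):
    """idf as in Lucene's classic TF-IDF similarity: 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def reverse_weights(doc_freq, num_docs, idf_power=2):
    """Boost per synonym that cancels its idf advantage within the group.

    idf_power=2 assumes idf is applied squared in the final score; that is
    an assumption of this sketch and would need checking against the
    similarity actually in use.
    """
    idfs = {term: classic_idf(df, num_docs) for term, df in doc_freq.items()}
    lowest = min(idfs.values())
    return {term: (lowest / value) ** idf_power for term, value in idfs.items()}


# Hypothetical statistics per environment: (number of docs, document frequencies)
environments = {
    "QA": (10_000, {"nol": 40, "net operating loss": 90}),
    "PRD": (100_000, {"nol": 50, "net operating loss": 2_000}),
}

boosts = {env: reverse_weights(dfs, n) for env, (n, dfs) in environments.items()}
for env, weights in boosts.items():
    print(env, {term: round(w, 3) for term, w in weights.items()})
```

The boosts come out different for each environment, and they go stale as soon
as the content (and therefore the document frequencies) changes.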

In our previous search engine (FAST ESP) we basically switched off IDF, but I 
am still hoping that there is a more sophisticated solution in Solr.
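
As a sketch of what switching off IDF trades away: with a constant idf the two
synonyms tie, but a rare, discriminating term no longer outweighs a very common
one. The terms "tax" and "carryback" and all counts are hypothetical, and
flat_idf is only a stand-in for whatever mechanism disables IDF in a given
engine.

```python
import math


def classic_idf(doc_freq, num_docs):
    """idf as in Lucene's classic TF-IDF similarity: 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def flat_idf(doc_freq, num_docs):
    """'IDF switched off': every term gets the same weight."""
    return 1.0


# Hypothetical counts, not taken from any real index.
NUM_DOCS = 100_000
doc_freq = {"nol": 50, "net operating loss": 2_000, "tax": 60_000, "carryback": 30}


def term_weights(idf_fn):
    """Weight of each term under the given idf function."""
    return {term: idf_fn(df, NUM_DOCS) for term, df in doc_freq.items()}


classic = term_weights(classic_idf)
flat = term_weights(flat_idf)

for term in doc_freq:
    print(f"{term!r}: classic={classic[term]:.2f}, flat={flat[term]:.2f}")
```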


-----Original Message-----
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Thursday 13 December 2012 02:30
To: solr-user@lucene.apache.org
Subject: Re: Can a field with defined synonym be searched without the synonym?

All of the applications I've seen with user control over synonym expansion 
were recall-oriented: the "give me all matches for X" kind of problem. So 
ranking is not as important.

wunder

On Dec 12, 2012, at 5:23 PM, Roman Chyla wrote:

> Well, this IDF problem has more sides. So, let's say your synonym file
> contains multi-token synonyms (it does, right? or perhaps you don't need
> it? well, some people do)
>
> "TV, TV set, TV foo, television"
>
> if you use the default synonym expansion, then when you index 'television'
> you have also increased the frequency of 'set' and 'foo'. So the IDF of 'TV'
> is the same as that of 'television', but the IDF of 'foo' and 'set' has
> changed (their frequency increased, their IDF decreased). TVs have in fact
> made the 'foo' term very frequent and undesirable.
>
> So, you might be sure that IDF of 'TV' and 'television' are the same, but
> you are not aware it has 'screwed' other (desirable) terms - so it really
> depends. And I wouldn't argue these cases are esoteric.
>
> And finally: there are use cases out there where people NEED to switch off
> synonym expansion at will (find only those documents that contain the word
> 'TV' and not that bloody 'foo'). This cannot be done if the index contains
> all synonym terms (unless you have a way to mark the original and the
> synonym in the index).
>
> roman
>
>
> On Wed, Dec 12, 2012 at 12:50 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>
>> Query parsers cannot fix the IDF problem or make query-time synonyms
>> faster. Query synonym expansion makes more search terms. More search terms
>> are more work at query time.
>>
>> The IDF problem is real; I've run up against it. The rarest variant of
>> the synonym has the highest score. This is probably the opposite of what
>> you want. For me, it was "TV" and "television". Documents with "TV" had
>> higher scores than those with "television".
>>
>> wunder
>>
>> On Dec 12, 2012, at 9:45 AM, Roman Chyla wrote:
>>
>>> @wunder
>>> It is a misconception (well, supported by that wiki description) that the
>>> query time synonym filter has these problems. It is actually the default
>>> parser that is causing these problems. Look at this if you still think
>>> that index time synonyms are a cure-all:
>>> https://issues.apache.org/jira/browse/LUCENE-4499
>>>
>>> @joe
>>> If you can use the flexible query parser (as linked in by @Swati) then
>>> all you need to do is to define a different field with a different
>>> tokenizer chain and then swap the field names before the analyzer
>>> processes the document (and then rewrite the field name back - for
>>> example, we have fields called "author" and "author_nosyn")
>>>
>>> roman
>>>
>>> On Wed, Dec 12, 2012 at 12:38 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>>>
>>>> Query time synonyms have known problems. They are slower, cause
>>>> incorrect IDF, and don't work for phrase synonyms.
>>>>
>>>> Apply synonyms at index time and you will have none of those problems.
>>>>
>>>> See:
>>>>
>>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
>>>>
>>>> wunder
>>>>
>>>> On Dec 12, 2012, at 9:34 AM, Swati Swoboda wrote:
>>>>
>>>>> Query-time analyzers are still applied, even if you include a string in
>>>>> quotes. Would you expect "foo" to not match "Foo" just because it's
>>>>> enclosed in quotes?
>>>>>
>>>>> Also look at this, someone who had similar requirements:
>>>>>
>>>>> http://lucene.472066.n3.nabble.com/Synonym-Filter-disable-at-query-time-td2919876.html
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: joe.cohe...@gmail.com [mailto:joe.cohe...@gmail.com]
>>>>> Sent: Wednesday, December 12, 2012 12:09 PM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: Can a field with defined synonym be searched without the synonym?
>>>>>
>>>>>
>>>>> I'm applying only query-time synonyms, so I have the original values
>>>>> stored and indexed.
>>>>> I would've expected that if I search for a string in quotation marks,
>>>>> I'll get the exact match, without applying a synonym.
>>>>>
>>>>> Is there any way to achieve that?
>>>>>
>>>>>
>>>>> Upayavira wrote
>>>>>> You can only search against terms that are stored in your index. If
>>>>>> you have applied index time synonyms, you can't remove them at query
>>>>>> time.
>>>>>>
>>>>>> You can, however, use copyField to clone an incoming field to another
>>>>>> field that doesn't use synonyms, and search against that field
>>>>>> instead.
>>>>>>
>>>>>> Upayavira
>>>>>>
>>>>>> On Wed, Dec 12, 2012, at 04:26 PM, joe.cohen.m@ wrote:
>>>>>>> Hi
>>>>>>> I have a field type with a defined synonym.txt, which retrieves
>>>>>>> records with both "home" and "house" when I search for either one of them.
>>>>>>>
>>>>>>> I want to be able to search this field on the specific value that I
>>>>>>> enter, without the synonym filter.
>>>>>>>
>>>>>>> is it possible?
>>>>>>>
>>>>>>> thanks.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://lucene.472066.n3.nabble.com/Can-a-field-with-defined-synonym-be-searched-without-the-synonym-tp4026381.html
>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>
>>>> --
>>>> Walter Underwood
>>>> wun...@wunderwood.org