There are good use cases for disabling idf and even tf for labels and 
categories.

When searching resumes, maybe you care that “microsoft word” is less selective 
than “r programming”, but maybe you want all the resumes that match three skills 
followed by the ones that match two skills, regardless of how common those 
skills are.
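To make the resume example concrete, here is a toy Java model (not Lucene code) that scores a resume as the sum of per-skill weights, using either a BM25-style idf or a flat 1.0. The class name, collection size and document frequencies are made-up assumptions, and tf and length norms are left out on purpose so only idf matters.

```java
import java.util.Map;
import java.util.Set;

// Toy model of the resume example (not Lucene code): each matched skill adds
// its idf weight, or a flat 1.0 when idf is disabled. tf and length norms are
// left out so that only the effect of idf is visible.
class SkillRanking {
    // BM25-style idf: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    static double bm25Idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Sum the weights of the query skills that appear on the resume.
    static double score(Set<String> query, Set<String> resume,
                        Map<String, Long> docFreq, long docCount, boolean useIdf) {
        double total = 0.0;
        for (String skill : query) {
            if (resume.contains(skill)) {
                total += useIdf ? bm25Idf(docFreq.get(skill), docCount) : 1.0;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long docCount = 10_000;               // hypothetical number of resumes
        Map<String, Long> docFreq = Map.of(   // hypothetical document frequencies
            "microsoft word", 6_000L,
            "excel", 5_000L,
            "sql", 3_000L,
            "r programming", 100L);
        Set<String> query = docFreq.keySet();
        Set<String> resumeA = Set.of("microsoft word", "excel", "sql"); // three common skills
        Set<String> resumeB = Set.of("r programming", "sql");           // two skills, one rare
        for (boolean useIdf : new boolean[] {true, false}) {
            System.out.printf("useIdf=%b  A=%.3f  B=%.3f%n", useIdf,
                score(query, resumeA, docFreq, docCount, useIdf),
                score(query, resumeB, docFreq, docCount, useIdf));
        }
    }
}
```

With these numbers, resume B (one rare skill plus "sql") outranks resume A when idf is on; with idf off, A scores 3.0 and B scores 2.0, so the ranking is simply by number of matched skills.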

And for tf, a document tagged with both “new york” and “new york city” is not 
twice as much about New York. Same for the movie “New York, New York”.
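A small sketch of the tf point, assuming the textbook BM25 tf factor with Lucene's default parameters k1 = 1.2 and b = 0.75 and a document of exactly average length. The class name is hypothetical. Under this BM25 factor, a second occurrence of “new york” (from the “new york city” tag) raises the tf factor from 1.0 to 1.375; a binary tf, which only asks whether the term occurs, scores both documents the same.

```java
// Toy model (not Lucene code) comparing the BM25 tf factor with a binary tf.
class TagTf {
    static final double K1 = 1.2;  // Lucene's default BM25 k1
    static final double B = 0.75;  // Lucene's default BM25 b

    // Textbook BM25 tf factor: freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))
    static double bm25Tf(double freq, double docLen, double avgDocLen) {
        double lengthNorm = K1 * (1 - B + B * docLen / avgDocLen);
        return freq * (K1 + 1) / (freq + lengthNorm);
    }

    // Binary tf: the term either occurs or it does not, so repeats add nothing.
    static double binaryTf(double freq) {
        return freq > 0 ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        // freq 1: tagged "new york" only; freq 2: also tagged "new york city".
        // docLen == avgDocLen, so length normalization is neutral here.
        for (double freq : new double[] {1, 2}) {
            System.out.printf("freq=%.0f  bm25Tf=%.3f  binaryTf=%.1f%n",
                freq, bm25Tf(freq, 10, 10), binaryTf(freq));
        }
    }
}
```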

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 8, 2017, at 2:18 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> 
> Do you measure MRR or sales conversion right now? It would be interesting to 
> see whether the graph changes after your modification. Please let us know!
> 
> -----Original message-----
>> From:Webster Homer <webster.ho...@sial.com>
>> Sent: Tuesday 8th August 2017 23:04
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr 6 and IDF
>> 
>> I think just disabling idf is what we want. For product searching, we really
>> don't want to boost a match just because it is rarer. When we analyze the
>> results, we see that some good hits are suppressed (they get lower scores)
>> because of idf.
>> 
>> We're asking so that we can test this. We think it will help, but we'll see.
>> 
>> On Tue, Aug 8, 2017 at 3:53 PM, Markus Jelsma <markus.jel...@openindex.io>
>> wrote:
>> 
>>> Yes, extend the default Similarity, return 1.0f for idf and probably the
>>> idfExplain methods, and configure it in your schema, global or per-field.
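The subclass-and-override recipe can be sketched in plain Java. Bm25Model and FlatIdfBm25Model below are hypothetical stand-ins, not Lucene classes; they only mimic an overridable idf hook. In a real plugin you would extend org.apache.lucene.search.similarities.BM25Similarity instead, and because the signatures of its idf and idfExplain methods have changed between Lucene versions, check the Javadoc for the version you run. The sketch shows that with idf fixed at 1, a rare and a common term score the same, while BM25's tf saturation and length normalization are left untouched.

```java
// Hypothetical stand-in for a BM25 similarity (NOT Lucene's BM25Similarity):
// it only exposes an overridable idf hook so the subclassing pattern is visible.
class Bm25Model {
    protected final double k1 = 1.2, b = 0.75;

    // BM25-style idf: ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    protected double idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Score of one matching term: idf times saturated, length-normalized tf.
    public double termScore(double freq, long docFreq, long docCount,
                            double docLen, double avgDocLen) {
        double norm = k1 * (1 - b + b * docLen / avgDocLen);
        return idf(docFreq, docCount) * freq * (k1 + 1) / (freq + norm);
    }
}

// The pattern from the thread: subclass and make idf a constant 1,
// keeping BM25's tf saturation and length normalization.
class FlatIdfBm25Model extends Bm25Model {
    @Override
    protected double idf(long docFreq, long docCount) {
        return 1.0;
    }
}

class FlatIdfDemo {
    public static void main(String[] args) {
        Bm25Model bm25 = new Bm25Model();
        Bm25Model flat = new FlatIdfBm25Model();
        // A rare term (df=10) and a common term (df=5000), each matched once in
        // a document of average length, in a hypothetical 10,000-doc index.
        for (long df : new long[] {10, 5_000}) {
            System.out.printf("df=%d  bm25=%.3f  flatIdf=%.3f%n", df,
                bm25.termScore(1, df, 10_000, 10, 10),
                flat.termScore(1, df, 10_000, 10, 10));
        }
    }
}
```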
>>> 
>>> If you think this is a good idea, why not also return 1.0f for tf? And
>>> while you're at it, also omitNorms on all fields entirely?
>>> 
>>> I am curious if this is going to help you, please let us know!
>>> 
>>> -----Original message-----
>>>> From:Webster Homer <webster.ho...@sial.com>
>>>> Sent: Tuesday 8th August 2017 22:44
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Solr 6 and IDF
>>>> 
>>>> It appears that all I need to do is create a class that
>>>> extends BM25Similarity, and have the new class return 1 as the idf. Is
>>>> that correct?
>>>> 
>>>> On Tue, Aug 8, 2017 at 3:15 PM, Webster Homer <webster.ho...@sial.com>
>>>> wrote:
>>>> 
>>>>> I do want to use BM25; I just want to disable IDF.
>>>>> 
>>>>> On Tue, Aug 8, 2017 at 2:58 PM, Peter Lancaster <
>>>>> peter.lancas...@findmypast.com> wrote:
>>>>> 
>>>>>> Hi Webster,
>>>>>> 
>>>>>> If you don't need BM25, you should be able to continue as you were
>>>>>> before: provide your own similarity class that extends
>>>>>> ClassicSimilarity, override its idf method to always return 1, and then
>>>>>> reference that class in your schema, e.g.
>>>>>> <similarity class="brightsolid.solr.plugins.MyCustomSimilarity" />
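A comparable plain-Java sketch of the ClassicSimilarity route. ClassicModel and FlatIdfClassicModel are hypothetical stand-ins, not Lucene classes; they use, to our understanding, the tf and idf formulas documented for Lucene 6's classic TF-IDF scoring (tf = sqrt(freq), idf = 1 + ln((docCount + 1) / (docFreq + 1))), and they ignore coord, queryNorm, boosts and norms. Overriding idf to return 1 leaves only the sqrt(freq) term-frequency factor.

```java
// Hypothetical stand-in (NOT Lucene's ClassicSimilarity) for classic TF-IDF
// term weighting; only tf and idf are modeled.
class ClassicModel {
    // Classic tf: square root of the raw term frequency.
    protected double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Classic idf: 1 + ln((docCount + 1) / (docFreq + 1)).
    protected double idf(long docFreq, long docCount) {
        return 1.0 + Math.log((docCount + 1.0) / (docFreq + 1.0));
    }

    // The classic formula multiplies tf by idf squared (idf appears in both the
    // query weight and the document weight); coord, queryNorm, boosts and
    // norms are ignored in this sketch.
    public double termWeight(double freq, long docFreq, long docCount) {
        double idf = idf(docFreq, docCount);
        return tf(freq) * idf * idf;
    }
}

// The suggested override: keep classic tf, make idf a constant 1.
class FlatIdfClassicModel extends ClassicModel {
    @Override
    protected double idf(long docFreq, long docCount) {
        return 1.0;
    }
}

class ClassicDemo {
    public static void main(String[] args) {
        ClassicModel classic = new ClassicModel();
        ClassicModel flat = new FlatIdfClassicModel();
        // Rare (df=10) versus common (df=5000) term, one occurrence each,
        // in a hypothetical 10,000-document index.
        for (long df : new long[] {10, 5_000}) {
            System.out.printf("df=%d  classic=%.3f  flatIdf=%.3f%n", df,
                classic.termWeight(1, df, 10_000),
                flat.termWeight(1, df, 10_000));
        }
    }
}
```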
>>>>>> 
>>>>>> As far as I know, you've been able to have different similarities per
>>>>>> field in Solr for a while now:
>>>>>> https://wiki.apache.org/solr/SchemaXml#Similarity
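Per-field similarity amounts to choosing a scoring function by field name, with a default for everything else; Lucene exposes this idea as PerFieldSimilarityWrapper, and Solr lets a fieldType declare its own similarity. The sketch below models only the routing, in plain Java rather than the Lucene or Solr API. The field names sku, product_name and description, and both scoring functions, are hypothetical assumptions chosen to match the product-versus-text-search split described in this thread.

```java
import java.util.Map;

// Hypothetical sketch of per-field similarity routing (plain Java, not the
// Lucene or Solr API): each field name maps to a scoring function, and
// unlisted fields fall back to a default.
class PerFieldScoring {
    interface TermScorer {
        double score(double freq, long docFreq, long docCount);
    }

    // BM25-style idf times saturated tf with k1 = 1.2 (so k1 + 1 = 2.2),
    // assuming a document of average length so the length norm is just k1.
    static final TermScorer BM25 = (freq, docFreq, docCount) ->
        Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
            * freq * 2.2 / (freq + 1.2);

    // The same tf saturation with idf fixed at 1, for product-style fields.
    static final TermScorer FLAT_IDF = (freq, docFreq, docCount) ->
        freq * 2.2 / (freq + 1.2);

    // Hypothetical field names: product fields get flat idf, the rest BM25.
    static final Map<String, TermScorer> BY_FIELD = Map.of(
        "sku", FLAT_IDF,
        "product_name", FLAT_IDF);

    static TermScorer forField(String field) {
        return BY_FIELD.getOrDefault(field, BM25);
    }

    public static void main(String[] args) {
        for (String field : new String[] {"product_name", "description"}) {
            TermScorer scorer = forField(field);
            System.out.printf("%s: rare=%.3f common=%.3f%n", field,
                scorer.score(1, 10, 10_000), scorer.score(1, 5_000, 10_000));
        }
    }
}
```

Keeping the choice in one lookup means product fields and free-text fields can be tuned independently, which is the field-level control asked for in the original question.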
>>>>>> 
>>>>>> Cheers,
>>>>>> Peter Lancaster.
>>>>>> 
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Webster Homer [mailto:webster.ho...@sial.com]
>>>>>> Sent: 08 August 2017 20:39
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Subject: Solr 6 and IDF
>>>>>> 
>>>>>> Our most common use for Solr is searching for products, not text search.
>>>>>> My company is in the process of migrating away from an Endeca search
>>>>>> engine. To keep the business happy, the goal is to make sure that search
>>>>>> results from the two engines are fairly similar. One thing we have found
>>>>>> that keeps a result from ranking as well as it did in the old system is
>>>>>> idf.
>>>>>> 
>>>>>> We are using Solr 6. After moving to it, a lot of our results got better,
>>>>>> but idf still seems to deaden some results. Given that our focus is
>>>>>> product searching, I really don't see a need for idf at all. Before Solr
>>>>>> 6, you could suppress idf by providing a custom similarity class. Looking
>>>>>> over the newer documentation, a lot of things have improved, but I'm not
>>>>>> sure I see a simple way to turn off idf in Solr 6's BM25 similarity.
>>>>>> 
>>>>>> How do I disable IDF in Solr 6?
>>>>>> 
>>>>>> We also have needs for text searching, so it would be nice if we could
>>>>>> suppress IDF at the field or schema level.
>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> 
>>>>>> This message and any attachment are confidential and may be privileged or
>>>>>> otherwise protected from disclosure. If you are not the intended recipient,
>>>>>> you must not copy this message or attachment or disclose the contents to
>>>>>> any other person. If you have received this transmission in error, please
>>>>>> notify the sender immediately and delete the message and any attachment
>>>>>> from your system. Merck KGaA, Darmstadt, Germany and any of its
>>>>>> subsidiaries do not accept liability for any omissions or errors in this
>>>>>> message which may arise as a result of E-Mail-transmission or for damages
>>>>>> resulting from any unauthorized changes of the content of this message and
>>>>>> any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
>>>>>> subsidiaries do not guarantee that this message is free of viruses and does
>>>>>> not accept liability for any damages caused by any virus transmitted
>>>>>> therewith.
>>>>>> 
>>>>>> Click http://www.emdgroup.com/disclaimer to access the German, French,
>>>>>> Spanish and Portuguese versions of this disclaimer.
>>>>>> ________________________________
>>>>>> 
>>>>>> This message is confidential and may contain privileged information. You
>>>>>> should not disclose its contents to any other person. If you are not the
>>>>>> intended recipient, please notify the sender named above immediately. It is
>>>>>> expressly declared that this e-mail does not constitute nor form part of a
>>>>>> contract or unilateral obligation. Opinions, conclusions and other
>>>>>> information in this message that do not relate to the official business of
>>>>>> findmypast shall be understood as neither given nor endorsed by it.
>>>>>> ________________________________
>>>>>> 
>>>>>> __________________________________________________________________________
>>>>>> 
>>>>>> This email has been checked for virus and other malicious content prior
>>>>>> to leaving our network.
>>>>>> __________________________________________________________________________
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
