I agree! I share the thought experiment too, and these changes do seem 
justifiable. It is just that I am interested in the evidence.

Again, Webster, if you can, please share any significant figures. They should 
be interesting!

Regards, M. 
 
-----Original message-----
> From:Walter Underwood <wun...@wunderwood.org>
> Sent: Tuesday 8th August 2017 23:41
> To: solr-user@lucene.apache.org
> Subject: Re: Solr 6 and IDF
> 
> There are good use cases for disabling idf and even tf for labels and 
> categories.
> 
> When searching resumes, maybe you care that “microsoft word” is less 
> selective than “r programming”, but maybe you want all the ones that match 
> three skills, followed by the ones that match two skills, regardless of how 
> common those skills are.
> 
> And for tf, a document tagged with both “new york” and “new york city” is not 
> twice as much about New York. Same for the movie “New York, New York”.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
> > On Aug 8, 2017, at 2:18 PM, Markus Jelsma <markus.jel...@openindex.io> 
> > wrote:
> > 
> > Do you measure MRR or sales conversion right now? It would be interesting 
> > to see the graph change after your modification, or not of course. Please 
> > let us know!
> > 
> > -----Original message-----
> >> From:Webster Homer <webster.ho...@sial.com>
> >> Sent: Tuesday 8th August 2017 23:04
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Solr 6 and IDF
> >> 
> >> I think just disabling idf is what we want. For product searching we really
> >> don't want to raise a rarer match. What we see analyzing results is that
> >> some good hits are suppressed, have lower scores, due to idf.
> >> 
> >> This is so we can test this. We think it will help, but we'll see.
> >> 
> >> On Tue, Aug 8, 2017 at 3:53 PM, Markus Jelsma <markus.jel...@openindex.io>
> >> wrote:
> >> 
> >>> Yes, extend the default Similarity, return 1.0f for idf and probably the
> >>> idfExplain methods, and configure it in your schema, global or per-field.
> >>> 
> >>> If you think this is a good idea, why not also return 1.0f for tf? And
> >>> while you're at it, also omitNorms on all fields entirely?
> >>> 
> >>> I am curious if this is going to help you, please let us know!
> >>> 
> >>> -----Original message-----
> >>>> From:Webster Homer <webster.ho...@sial.com>
> >>>> Sent: Tuesday 8th August 2017 22:44
> >>>> To: solr-user@lucene.apache.org
> >>>> Subject: Re: Solr 6 and IDF
> >>>> 
> >>>> It appears that all I need to do is create a class that
> >>>> extends BM25Similarity, and have the new class return 1 as the idf. Is
> >>>> that correct?
> >>>> 
> >>>> On Tue, Aug 8, 2017 at 3:15 PM, Webster Homer <webster.ho...@sial.com>
> >>>> wrote:
> >>>> 
> >>>>> I do want to use BM25, just disable IDF
> >>>>> 
> >>>>> On Tue, Aug 8, 2017 at 2:58 PM, Peter Lancaster <
> >>>>> peter.lancas...@findmypast.com> wrote:
> >>>>> 
> >>>>>> Hi Webster,
> >>>>>> 
> >>>>>> If you're not worried about using the BM25 searcher, then you should
> >>>>>> just be able to continue as you were before by providing your own
> >>>>>> similarity class that extends ClassicSimilarity and overrides the idf
> >>>>>> method to always return 1, then reference that in your schema,
> >>>>>> e.g.
> >>>>>> <similarity class="brightsolid.solr.plugins.MyCustomSimilarity" />
> >>>>>> 
> >>>>>> As far as I know you've been able to have different similarities per
> >>>>>> field in solr for a while now.
> >>>>>> https://wiki.apache.org/solr/SchemaXml#Similarity
> >>>>>> 
> >>>>>> Cheers,
> >>>>>> Peter Lancaster.
> >>>>>> 
> >>>>>> 
> >>>>>> -----Original Message-----
> >>>>>> From: Webster Homer [mailto:webster.ho...@sial.com]
> >>>>>> Sent: 08 August 2017 20:39
> >>>>>> To: solr-user@lucene.apache.org
> >>>>>> Subject: Solr 6 and IDF
> >>>>>> 
> >>>>>> Our most common use for Solr is searching for products, not text
> >>>>>> search. My company is in the process of migrating away from an Endeca
> >>>>>> search engine. To keep the business happy, the goal is to make sure
> >>>>>> that search results from the different engines are fairly similar. One
> >>>>>> thing we have found that keeps a result from being as good as it was
> >>>>>> in the old system is the idf.
> >>>>>> 
> >>>>>> We are using Solr 6. After moving to it, a lot of our results got
> >>>>>> better, but idf still seems to deaden some results. Given that our
> >>>>>> focus is product searching I really don't see a need for idf at all.
> >>>>>> Previous to Solr 6 you could suppress idf by providing a custom
> >>>>>> similarity class. Looking over the newer documentation a lot of things
> >>>>>> have improved, but I'm not sure I see a simple way to turn off idf in
> >>>>>> Solr 6's BM25 searcher.
> >>>>>> 
> >>>>>> How do I disable IDF in Solr 6?
> >>>>>> 
> >>>>>> We also do have needs for text searching so it would be nice if we
> >>>>>> could suppress IDF on a field or schema level.
> >>>>>> 
> >>>>>> --
> >>>>>> 
> >>>>>> 
> >>>>>> This message and any attachment are confidential and may be
> >>>>>> privileged or otherwise protected from disclosure. If you are not the
> >>>>>> intended recipient, you must not copy this message or attachment or
> >>>>>> disclose the contents to any other person. If you have received this
> >>>>>> transmission in error, please notify the sender immediately and delete
> >>>>>> the message and any attachment from your system. Merck KGaA,
> >>>>>> Darmstadt, Germany and any of its subsidiaries do not accept liability
> >>>>>> for any omissions or errors in this message which may arise as a
> >>>>>> result of E-Mail-transmission or for damages resulting from any
> >>>>>> unauthorized changes of the content of this message and any attachment
> >>>>>> thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do
> >>>>>> not guarantee that this message is free of viruses and does not accept
> >>>>>> liability for any damages caused by any virus transmitted therewith.
> >>>>>> 
> >>>>>> Click http://www.emdgroup.com/disclaimer to access the German, French,
> >>>>>> Spanish and Portuguese versions of this disclaimer.
> >>>>>> ________________________________
> >>>>>> 
> >>>>>> This message is confidential and may contain privileged information.
> >>>>>> You should not disclose its contents to any other person. If you are
> >>>>>> not the intended recipient, please notify the sender named above
> >>>>>> immediately. It is expressly declared that this e-mail does not
> >>>>>> constitute nor form part of a contract or unilateral obligation.
> >>>>>> Opinions, conclusions and other information in this message that do
> >>>>>> not relate to the official business of findmypast shall be understood
> >>>>>> as neither given nor endorsed by it.
> >>>>>> ________________________________
> >>>>>> 
> >>>>>> __________________________________________________________________________
> >>>>>> 
> >>>>>> This email has been checked for virus and other malicious content
> >>>>>> prior to leaving our network.
> >>>>>> __________________________________________________________________________
> >>>>>> 
> >>>>> 
> >>>>> 
> >>>> 
> >>> 
> >> 
> 
> 
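
To make the resume example quoted above concrete, here is a minimal,
self-contained Java sketch (no Lucene dependency) that compares a sum of
BM25-style idf weights against a flat weight of 1 per matched term. The idf
formula is the one Lucene's BM25Similarity uses; everything else is an
assumption for illustration only: the class name IdfDemo, the collection size
and the document frequencies are made up, and tf and length normalization are
ignored entirely.

```java
// Hypothetical illustration only: not part of Solr or Lucene.
final class IdfDemo {
    // Assumed collection size (number of resumes in the index).
    static final long DOC_COUNT = 10_000;

    // BM25 idf as computed by Lucene's BM25Similarity:
    // log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
    static double bm25Idf(long docFreq, long docCount) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Score of one document: the sum of the weights of the terms it matches.
    // With useIdf == false every matched term contributes exactly 1,
    // which is the effect of a similarity whose idf always returns 1.
    static double score(long[] matchedDocFreqs, boolean useIdf) {
        double sum = 0.0;
        for (long docFreq : matchedDocFreqs) {
            sum += useIdf ? bm25Idf(docFreq, DOC_COUNT) : 1.0;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Resume A matches three very common skills (each in 5000 resumes).
        long[] threeCommon = {5000, 5000, 5000};
        // Resume B matches two rare skills (each in 10 resumes).
        long[] twoRare = {10, 10};

        System.out.printf("with idf:  A=%.3f  B=%.3f%n",
                score(threeCommon, true), score(twoRare, true));
        System.out.printf("idf = 1:   A=%.3f  B=%.3f%n",
                score(threeCommon, false), score(twoRare, false));
    }
}
```

With these hypothetical numbers, the resume matching two rare skills outranks
the one matching three common skills when idf is applied, and the order flips
when every match counts as 1. In Solr itself the equivalent change is the idf
override discussed in the thread, not this standalone class.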
