Thanks Walter. Much appreciated. To the Solr dev team: it would be of great help if Walter's IDF summary were made part of the Stop Filter documentation: https://lucene.apache.org/solr/guide/8_5/filter-descriptions.html#stop-filter
Steve

On Fri, Apr 24, 2020 at 8:49 PM Walter Underwood <wun...@wunderwood.org> wrote:
> IDF and stopword removal are different approaches to the same thing.
>
> Removing stopwords is a binary decision on how important common words
> are for search. It says some words are completely useless.
>
> IDF is a proportional measure on how important common words are for search.
>
> Instead of removing a list of words that are assumed to be common and less
> useful, let the engine actually measure how common the words are and factor
> that into the relevance.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Apr 24, 2020, at 5:39 PM, Steven White <swhite4...@gmail.com> wrote:
> >
> > Hi everyone,
> >
> > I get why and when not indexing stopwords is a bad idea and can
> > give you 0 or incomplete results. But what about the quality of search
> > results when stopwords are indexed vs. not indexed?
> >
> > 1) Stopwords are removed and I do a word search, not a phrase search, for
> > "solr and lucene are so cool".
> > 2) Stopwords are not removed and I do a word search, not a phrase search,
> > for "solr and lucene are so cool".
> >
> > Now if "and", "are" and "or" are stopwords, will the search quality and
> > ranking for #1 be better than #2? What if I turn the above into a
> > phrase search?
> >
> > Thanks
> >
> > Steve
> >
> >
> > On Fri, Apr 24, 2020 at 10:53 AM Walter Underwood <wun...@wunderwood.org> wrote:
> >
> >> I’m astonished that the default still has that. It was a bad idea in
> >> Solr 1.3, when it bit my ass.
> >>
> >> We help people with this about once a month and the advice is always the
> >> same.
> >> Imagine all the poor people who never ask about it and run with that
> >> default!
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/ (my blog)
> >>
> >>> On Apr 24, 2020, at 7:34 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >>>
> >>> +1 to removing stopword filters.
> >>>
> >>>> On Apr 24, 2020, at 10:28 AM, Jan Høydahl <jan....@cominvent.com> wrote:
> >>>>
> >>>> I tend to agree. Should we simply remove the stopword filters from the
> >>>> default configsets shipping with Solr?
> >>>>
> >>>> Jan
> >>>>
> >>>>> On 24 Apr 2020, at 14:44, David Hastings <hastings.recurs...@gmail.com> wrote:
> >>>>>
> >>>>> You should never use the stopword filter unless you have a very
> >>>>> specific purpose.
> >>>>>
> >>>>> On Fri, Apr 24, 2020 at 8:33 AM Steven White <swhite4...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi everyone,
> >>>>>>
> >>>>>> What impact, if any, do stopwords have on my search ranking quality?
> >>>>>> Will my ranking improve if I do not index stopwords?
> >>>>>>
> >>>>>> I'm trying to figure out if I should use the stopword filter or not.
> >>>>>>
> >>>>>> Thanks in advance.
> >>>>>>
> >>>>>> Steve
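Walter's point that stopword removal is a binary decision while IDF is a proportional one can be illustrated with a small sketch. It assumes a toy five-document corpus, a hypothetical stopword list, and plain whitespace tokenization in place of a real Solr analyzer chain. The IDF formula is modeled on the BM25 IDF used by Lucene's BM25Similarity, and the `score` helper is a deliberately simplified sum of IDF weights (no term frequency or length normalization), not Solr's actual scoring.

```python
import math
from collections import Counter

# Toy corpus: purely illustrative documents, not real index data.
DOCS = [
    "solr and lucene are so cool",
    "lucene is a search library",
    "solr is built on lucene and is a search server",
    "stopwords are common words like and or the",
    "idf and stopword removal are different approaches",
]

# Hypothetical stopword list, standing in for a stopwords.txt file.
STOPWORDS = {"a", "and", "are", "is", "or", "the"}


def tokenize(text):
    """Lowercase + whitespace split; a stand-in for a real Solr analyzer chain."""
    return text.lower().split()


def doc_freqs(docs):
    """Document frequency: the number of documents containing each term."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return df


def bm25_idf(doc_freq, doc_count):
    """BM25-style IDF: rare terms get large weights, very common terms small ones."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def query_weights(query, docs, stopwords=None):
    """IDF weight per query term, optionally after dropping stopwords."""
    df = doc_freqs(docs)
    n = len(docs)
    terms = tokenize(query)
    if stopwords is not None:
        # Stopword removal is binary: a listed term is dropped entirely.
        terms = [t for t in terms if t not in stopwords]
    # IDF is proportional: every remaining term is weighted by how rare it is.
    return {t: bm25_idf(df[t], n) for t in terms}


def score(query, doc, docs, stopwords=None):
    """Simplified relevance: sum of IDF weights of query terms found in doc.

    Real BM25 also uses term frequency and field-length normalization;
    this sketch deliberately leaves those out.
    """
    doc_terms = set(tokenize(doc))
    weights = query_weights(query, docs, stopwords)
    return sum(w for t, w in weights.items() if t in doc_terms)


if __name__ == "__main__":
    q = "solr and lucene are so cool"
    print("IDF only:     ", query_weights(q, DOCS))
    print("Stopwords out:", query_weights(q, DOCS, STOPWORDS))
    for doc in DOCS:
        print(f"{score(q, doc, DOCS):.3f}  {doc}")
```

With stopwords kept, "and" still contributes to relevance, just with a much smaller weight than a rare term like "cool"; with the stopword list applied, it contributes nothing at all, and a query or phrase that depends on it can no longer match on it.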