Just a piece of feedback from clients on the original docCount change.

I have seen several cases with clients where the switch to docCount
surprised them and hurt relevance.
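For context, here is a minimal sketch of the mechanism. It assumes Lucene's BM25 IDF formula, idf = ln(1 + (N - df + 0.5) / (df + 0.5)), and assumes, per LUCENE-6711, that N was maxDoc before Lucene 6.0 and is the field's docCount from 6.0 on. The toy index and all helper names are hypothetical. For a sparse field (one that only a few documents populate), docCount is much smaller than maxDoc. Terms in that field therefore get a noticeably lower IDF than before, which shifts their weight relative to terms in densely populated fields.

```python
import math

# Hypothetical toy index: each document maps field names to lists of terms.
# All 10 documents have "title", but only 2 have a value for "brand".
docs = [{"title": ["shoe"]} for _ in range(8)] + [
    {"title": ["shoe"], "brand": ["acme"]},
    {"title": ["boot"], "brand": ["zenith"]},
]


def bm25_idf(doc_freq: int, num_docs: int) -> float:
    """BM25 IDF in the form used by Lucene's BM25Similarity (assumed here)."""
    return math.log(1 + (num_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def max_doc(index) -> int:
    # Field-independent: every document in the index (deletions not modeled).
    return len(index)


def doc_count(index, field: str) -> int:
    # Field-local: documents that have at least one value for `field`.
    return sum(1 for d in index if d.get(field))


def doc_freq(index, field: str, term: str) -> int:
    # Documents whose `field` contains `term`.
    return sum(1 for d in index if term in d.get(field, []))


df = doc_freq(docs, "brand", "acme")               # 1
idf_old = bm25_idf(df, max_doc(docs))              # N = maxDoc = 10 (pre-6.0)
idf_new = bm25_idf(df, doc_count(docs, "brand"))   # N = docCount = 2 (6.0+)
print(f"maxDoc={max_doc(docs)} docCount(brand)={doc_count(docs, 'brand')}")
print(f"idf with maxDoc={idf_old:.3f}, with docCount={idf_new:.3f}")
```

The same "acme" match is worth far less IDF once N is the field's docCount, even though nothing about the documents changed.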

More broadly, I’m concerned that when we make these changes there’s no
testing process against test corpora with judgments and relevance metrics
to understand their impact. From time to time I see it mentioned in a JIRA
that someone saw an NDCG improvement on a private collection, and we
have to take their word for it.
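As an illustration of the kind of metric meant here, a minimal NDCG@k sketch. It assumes the common exponential-gain variant, (2^grade - 1) / log2(rank + 1), and treats unjudged documents as grade 0; other variants (linear gain, different handling of unjudged docs) exist and give different numbers. The judgments and document ids are hypothetical.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    # Exponential gain with a log2 position discount (rank 1 -> log2(2) = 1).
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_ids: List[str], judgments: Dict[str, int], k: int = 10) -> float:
    """NDCG@k for one query; unjudged documents count as grade 0."""
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments for one query: doc id -> grade (0..3).
judgments = {"d1": 3, "d2": 2, "d3": 0, "d4": 1}
print(ndcg_at_k(["d1", "d2", "d4", "d3"], judgments, k=3))  # ideal ordering
print(ndcg_at_k(["d3", "d4", "d2", "d1"], judgments, k=3))  # worst-first ordering
```

Averaging this over a shared, public set of queries and judgments is what would let others reproduce a claimed improvement instead of taking it on trust.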

Public relevance testing of every build with stock settings could be
extremely valuable and would make these changes easier to justify, much
like the performance tests that are already run.
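A rough sketch of what a per-build check might look like, assuming each build is scored on the same judged corpus with stock settings and produces one metric value (e.g. NDCG@10) per query. The function name, tolerance, and scores are made up for illustration; no such harness exists in the project as far as this thread says.

```python
from statistics import mean
from typing import Dict, List, Tuple


def compare_builds(baseline: Dict[str, float], candidate: Dict[str, float],
                   tolerance: float = 0.01) -> Tuple[float, List[str]]:
    """Return the change in mean per-query score and the queries that regressed.

    Both dicts map query id -> metric for one build. A query counts as
    regressed when its score drops by more than `tolerance`.
    """
    shared = sorted(baseline.keys() & candidate.keys())
    if not shared:
        raise ValueError("builds share no judged queries")
    delta = mean(candidate[q] for q in shared) - mean(baseline[q] for q in shared)
    regressed = [q for q in shared if baseline[q] - candidate[q] > tolerance]
    return delta, regressed


# Made-up per-query scores for two hypothetical builds.
baseline = {"q1": 0.82, "q2": 0.64, "q3": 0.91}
candidate = {"q1": 0.85, "q2": 0.51, "q3": 0.91}
delta, regressed = compare_builds(baseline, candidate)
print(f"mean change {delta:+.3f}; regressed: {regressed}")
```

Reporting the per-query regressions alongside the mean matters: a change like docCount can raise the average while badly hurting a specific class of queries (for example, those matching sparse fields).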

Sadly I can only complain now :) I wish I had time to work on something
like this.

Doug

On Tue, Dec 5, 2017 at 7:38 AM Yonik Seeley <ysee...@gmail.com> wrote:

> On Tue, Dec 5, 2017 at 5:15 AM, alessandro.benedetti
> <a.benede...@sease.io> wrote:
> > "Lucene/Solr doesn't actually delete documents when you delete them, it
> > just marks them as deleted.  I'm pretty sure that the difference between
> > docCount and maxDoc is deleted documents.  Maybe I don't understand what
> > I'm talking about, but that is the best I can come up with. "
> >
> > Thanks Shawn, yes, that is correct and I was aware of it.
> > I was curious about another difference:
> > I think we confirmed that docCount is local to the field (thanks Yonik
> > for that), so:
> >
> > docCount(index,field1)= # of documents in the index that currently have
> > value(s) for field1
> >
> > My question is :
> >
> > maxDocs(index,field1) = max # of documents in the index that had
> > value(s) for field1
> >
> > OR
> >
> > maxDocs(index) = max # of documents that appeared in the index
> > (field independent)
>
> The latter.
> I imagine that's why docCount was introduced (to avoid changing the
> meaning of an existing term).
> FWIW, the scoring change was made in
> https://issues.apache.org/jira/browse/LUCENE-6711 for Lucene/Solr 6.0
>
> -Yonik
>
-- 
Consultant, OpenSource Connections. Contact info at
http://o19s.com/about-us/doug-turnbull/; Free/Busy (http://bit.ly/dougs_cal)
