Scoring, "numDocs" should be number after applying filters, not entire index
----------------------------------------------------------------------------

                 Key: SOLR-1158
                 URL: https://issues.apache.org/jira/browse/SOLR-1158
             Project: Solr
          Issue Type: Improvement
          Components: search
    Affects Versions: 1.4
            Reporter: David Smiley
            Priority: Minor


I'd like to store several different types of searchable things in my Solr 
index.  I use a "type" field to discriminate between these types, and my "id" 
primary key field incorporates the type (e.g. "FooType:53") to ensure 
uniqueness.  A problem I see with this approach is that the idf (inverse 
document frequency) component of the score is based on the entire index, not 
just on the type that I'm querying.  In particular, the "numDocs" given to the 
Similarity.java implementation is the total number of documents in the index.  
I think it would be more accurate for numDocs to be the filtered number of 
docs, that is, the number of docs remaining after the filter queries are 
applied.
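
To illustrate the effect, here is a minimal standalone sketch. It is not Solr 
or Lucene code; it assumes the classic idf formula, log(numDocs/(docFreq+1)) + 1, 
which I believe is what the default Similarity uses. The class name IdfSketch 
and all of the counts are hypothetical, and docFreq is held constant purely to 
isolate the influence of numDocs.

```java
// Illustrative sketch only; not Solr or Lucene source code.
final class IdfSketch {

    // Assumed classic idf formula: log(numDocs / (docFreq + 1)) + 1.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int docFreq = 40;             // hypothetical: docs containing the term
        int indexNumDocs = 1_000_000; // hypothetical: all docs in the index
        int filteredNumDocs = 1_000;  // hypothetical: docs matching type:FooType

        // Today: idf is computed against the size of the whole index.
        System.out.println("idf (entire index): " + idf(docFreq, indexNumDocs));

        // Proposed: idf is computed against the filtered document count.
        System.out.println("idf (filtered):     " + idf(docFreq, filteredNumDocs));
    }
}
```

With the same docFreq, the term looks far rarer against the whole index than 
against the filtered set, so its idf is inflated relative to the documents 
actually being searched.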

The only issue I see with this, which may or may not be a problem, is that the 
scores (and thus potentially the result ordering, if sorting by score) would 
change depending on which filters are applied.  That could be counter-intuitive 
in a faceting UI.  Perhaps only a certain filter or filters could be marked as 
lowering numDocs for scoring.  Such a configuration choice strikes me as 
belonging in the schema.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
