Robert Muir created LUCENE-6789:
-----------------------------------

             Summary: change IndexSearcher default similarity to BM25
                 Key: LUCENE-6789
                 URL: https://issues.apache.org/jira/browse/LUCENE-6789
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Robert Muir
             Fix For: 6.0


Since Lucene 4.0, the statistics BM25 needs are always present in the index,
so we can make the change without any degradation.
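
For readers unfamiliar with why BM25 needs extra statistics: its length
normalization compares each document's field length against the average field
length, which presumably is the statistic referred to above. The sketch below
is a hypothetical, standalone illustration and not Lucene's code. The class
and method names are made up, the average is assumed to be derived from a
total token count and a document count, and k1 = 1.2 and b = 0.75 are the
commonly used defaults.

```java
// Hypothetical sketch (not Lucene code): BM25's tf component next to the
// classic sqrt(freq) tf. All names here are illustrative assumptions.
final class Bm25TfSketch {

    // Assumed derivation: average field length = total tokens / documents.
    static double averageFieldLength(long sumTotalTermFreq, long docCount) {
        return (double) sumTotalTermFreq / docCount;
    }

    // BM25 tf: saturates toward (k1 + 1) as freq grows, and is scaled by how
    // long the document is relative to the average field length.
    static double bm25Tf(double freq, double docLen, double avgLen,
                         double k1, double b) {
        double norm = k1 * (1 - b + b * docLen / avgLen);
        return freq * (k1 + 1) / (freq + norm);
    }

    // Classic vector-space tf: grows without bound.
    static double classicTf(double freq) {
        return Math.sqrt(freq);
    }

    public static void main(String[] args) {
        double avgLen = averageFieldLength(1000, 100);
        for (int freq : new int[] {1, 4, 16, 64}) {
            System.out.printf("freq=%d classic=%.3f bm25=%.3f%n",
                    freq, classicTf(freq), bm25Tf(freq, 10, avgLen, 1.2, 0.75));
        }
    }
}
```

Without the average field length, the BM25 normalization term cannot be
computed, which is why the availability of that statistic matters here.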

I think this should be a 6.0-only change, to prevent any surprises.
DefaultSimilarity is renamed to ClassicSimilarity to prevent confusion. No
indexing change is needed because we use the same norm format; it's just a
runtime switch. Users can call
IndexSearcher.setSimilarity(new ClassicSimilarity()) to get the old behavior.
I did not change Solr's default here. I think that should be a separate issue,
since it raises more concerns, e.g. factories in configuration files and so on.

One issue was the generation of synonym queries (posinc=0) by QueryBuilder
(used by parsers). This is kind of a corner case (query-time synonyms), but we
should make it nicer. The current code in trunk disables coord, which makes no
sense for anything but the vector space impl. Instead, this patch adds a
SynonymQuery which treats occurrences of any term as a single pseudoterm. With
English WordNet as a query-time synonym dict, this query gives a 12%
improvement in MAP for title queries on BM25, and 2% with Classic (not
significant). So it's a better generic approach for synonyms that works with
all scoring models.

I wanted to use BlendedTermQuery, but at a glance it seems to have problems:
it tries to "take on the world", and it does not work with distributed scoring
(it doesn't consult IndexSearcher for stats). Anyway, this one is a different,
simpler approach, which only works for a single field and calls tf(sum) a
single time.
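
To make the "single pseudoterm" idea concrete, here is a minimal sketch of
what calling tf(sum) once means, compared with scoring each synonym separately
and adding the results. It is a toy illustration, not the patch's
SynonymQuery: the class name, the method names, and the simplified saturating
tf (BM25's tf with document length equal to the average, k1 = 1.2) are all
assumptions made for the example.

```java
import java.util.Map;

// Hypothetical sketch (not the patch's SynonymQuery): all synonyms in one
// field are scored as one pseudoterm by summing their frequencies first.
final class SynonymPseudoTermSketch {
    static final double K1 = 1.2;

    // Simplified saturating tf; length normalization is left out on purpose.
    static double tf(double freq) {
        return freq * (K1 + 1) / (freq + K1);
    }

    // Pseudoterm approach: sum the per-document frequencies, call tf once.
    static double pseudoTermTf(Map<String, Integer> termFreqsInDoc,
                               String... synonyms) {
        int sum = 0;
        for (String term : synonyms) {
            sum += termFreqsInDoc.getOrDefault(term, 0);
        }
        return tf(sum);
    }

    // Contrast: score each synonym as an independent term and add the scores.
    static double perTermTfSum(Map<String, Integer> termFreqsInDoc,
                               String... synonyms) {
        double total = 0;
        for (String term : synonyms) {
            total += tf(termFreqsInDoc.getOrDefault(term, 0));
        }
        return total;
    }

    public static void main(String[] args) {
        // A document that mentions each synonym once.
        Map<String, Integer> doc = Map.of("car", 1, "automobile", 1);
        System.out.printf("pseudoterm=%.3f perTermSum=%.3f%n",
                pseudoTermTf(doc, "car", "automobile"),
                perTermTfSum(doc, "car", "automobile"));
    }
}
```

In this toy setup, a document containing "car" once and "automobile" once
scores like a document containing one term twice (about 1.375), rather than
like a document matching two unrelated terms (2.0), so using both synonyms
does not inflate the score.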



