Hi!

Ah, it makes sense now! The globally configured similarity returns a 
similarity defined on the fieldType if one is available and, if not, the 
standard Lucene similarity. This would, I assume, mean that the two similarities 
below, without any per-fieldType similarities declared, would always yield the 
same results?

<similarity class="org.apache.lucene.search.similarities.DefaultSimilarity"/>
<similarity class="solr.SchemaSimilarityFactory"/>

I would assume so because, without a per-fieldType declaration, the 
SchemaSimilarityFactory returns the default Lucene Similarity. However, when I 
check, it doesn't work for my url field but does work for the content and 
title fields. I have defined the same similarity for the url fieldType as I did 
for the title fieldType. This is the output for solr.SchemaSimilarityFactory 
without per-field similarities declared: 

  38.565483 = (MATCH) max plus 0.27 times others of:
    5.434552 = (MATCH) weight(content:groning^1.4 in 384) [], result of:
      5.434552 = score(doc=384,freq=10.0 = termFreq=10.0
), product of:
        1.5511217 = queryWeight, product of:
          1.4 = boost
          1.1079441 = idf(docFreq=1236, maxDocs=1378)
          1.0 = queryNorm
        3.503627 = fieldWeight in 384, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.1079441 = idf(docFreq=1236, maxDocs=1378)
          1.0 = fieldNorm(doc=384)
    4.300008 = (MATCH) weight(title:groning^4.7 in 384) [], result of:
      4.300008 = score(doc=384,freq=2.0 = termFreq=2.0
), product of:
        5.346149 = queryWeight, product of:
          4.7 = boost
          1.1374786 = idf(docFreq=1200, maxDocs=1378)
          1.0 = queryNorm
        0.8043188 = fieldWeight in 384, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.1374786 = idf(docFreq=1200, maxDocs=1378)
          0.5 = fieldNorm(doc=384)
    35.937153 = (MATCH) weight(url:groning^2.1 in 384) [], result of:
      35.937153 = score(doc=384,freq=1.0 = termFreq=1.0
), product of:
        10.988577 = queryWeight, product of:
          2.1 = boost
          5.232656 = idf(docFreq=19, maxDocs=1378)
          1.0 = queryNorm
        3.27041 = fieldWeight in 384, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          5.232656 = idf(docFreq=19, maxDocs=1378)
          0.625 = fieldNorm(doc=384)
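
As a sanity check on the numbers above, here is a minimal Python sketch that recomputes the three clause scores and the "max plus 0.27 times others" total. It assumes the classic TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which is consistent with the tf and idf values shown in the explain output. The function and variable names are made up for illustration and are not Solr or Lucene API.

```python
import math

def tf(freq):
    # Classic TF-IDF term frequency: square root of the raw frequency.
    return math.sqrt(freq)

def idf(doc_freq, max_docs):
    # Classic TF-IDF inverse document frequency (assumed formula).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def clause_score(freq, doc_freq, max_docs, boost, field_norm, query_norm=1.0):
    # score = queryWeight * fieldWeight, as laid out in the explain tree.
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight

MAX_DOCS = 1378   # maxDocs from the explain output
TIE = 0.27        # the "0.27 times others" factor

# Inputs copied from the explain output for doc 384.
content = clause_score(10.0, 1236, MAX_DOCS, boost=1.4, field_norm=1.0)
title = clause_score(2.0, 1200, MAX_DOCS, boost=4.7, field_norm=0.5)
url = clause_score(1.0, 19, MAX_DOCS, boost=2.1, field_norm=0.625)

# "max plus tie times others": the best clause plus a fraction of the rest.
clauses = [content, title, url]
best = max(clauses)
total = best + TIE * (sum(clauses) - best)

print(round(content, 4), round(title, 4), round(url, 4), round(total, 4))
```

The results should land close to the explain figures, with small differences expected because this sketch uses double precision rather than the single-precision floats in the explain output.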


Here's the output with DefaultSimilarity declared:

  3.2723136 = (MATCH) max plus 0.27 times others of:
    0.46112633 = (MATCH) weight(content:groning^1.4 in 327) 
[DefaultSimilarity], result of:
      0.46112633 = score(doc=327,freq=10.0 = termFreq=10.0
), product of:
        0.13161398 = queryWeight, product of:
          1.4 = boost
          1.1079441 = idf(docFreq=1236, maxDocs=1378)
          0.08485084 = queryNorm
        3.503627 = fieldWeight in 327, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.1079441 = idf(docFreq=1236, maxDocs=1378)
          1.0 = fieldNorm(doc=327)
    0.36485928 = (MATCH) weight(title:groning^4.7 in 327) [DefaultSimilarity], 
result of:
      0.36485928 = score(doc=327,freq=2.0 = termFreq=2.0
), product of:
        0.45362523 = queryWeight, product of:
          4.7 = boost
          1.1374786 = idf(docFreq=1200, maxDocs=1378)
          0.08485084 = queryNorm
        0.8043188 = fieldWeight in 327, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.1374786 = idf(docFreq=1200, maxDocs=1378)
          0.5 = fieldNorm(doc=327)
    3.0492976 = (MATCH) weight(url:groning^2.1 in 327) [DefaultSimilarity], 
result of:
      3.0492976 = score(doc=327,freq=1.0 = termFreq=1.0
), product of:
        0.93239 = queryWeight, product of:
          2.1 = boost
          5.232656 = idf(docFreq=19, maxDocs=1378)
          0.08485084 = queryNorm
        3.27041 = fieldWeight in 327, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          5.232656 = idf(docFreq=19, maxDocs=1378)
          0.625 = fieldNorm(doc=327)

How can I explain the difference? Also, with the factory declared, the score of 
the url field is still the same; it does not seem to honor the per-field 
declared similarity. It also seems the debug output is wrong: it does not write 
the similarity class name between [] and produces an empty [] for each match.
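
One thing that can be read off the two outputs directly: the only factor that differs between them is queryNorm (1.0 with the factory, 0.08485084 with DefaultSimilarity). The short Python sketch below just divides the scores from the second output by those from the first, using values copied from the explain trees. The dictionary names are illustrative, and the sketch does not attempt to explain why queryNorm is 1.0 in the factory case.

```python
# Scores copied from the SchemaSimilarityFactory explain output (doc 384).
factory = {
    "total": 38.565483,
    "content": 5.434552,
    "title": 4.300008,
    "url": 35.937153,
}

# Scores copied from the DefaultSimilarity explain output (doc 327).
default = {
    "total": 3.2723136,
    "content": 0.46112633,
    "title": 0.36485928,
    "url": 3.0492976,
}

# queryNorm as reported in the DefaultSimilarity output.
REPORTED_QUERY_NORM = 0.08485084

# If queryNorm is the only difference, every ratio should equal it.
ratios = {name: default[name] / factory[name] for name in factory}

for name, ratio in ratios.items():
    print(name, round(ratio, 6), abs(ratio - REPORTED_QUERY_NORM) < 1e-6)
```

If every ratio matches the reported queryNorm, the tf, idf, boost and fieldNorm parts of the two computations are identical and the scores differ only by that constant factor.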

Many thanks and a nice weekend!
Markus
 
 
-----Original message-----
> From:Robert Muir <rcm...@gmail.com>
> Sent: Fri 01-Jun-2012 17:00
> To: solr-user@lucene.apache.org
> Subject: Re: per-fieldtype similarity not working
> 
> On Fri, Jun 1, 2012 at 5:13 AM, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
> > Thanks, but I am clearly missing something. We declare the similarity in the 
> > fieldType just as in the example, and looking at the example again I don't 
> > see how it's being done differently. What am I missing and where do I miss 
> > it? :)
> >
> 
> Hi Markus, check out the last line at the bottom:
>  <!-- default similarity, defers to the fieldType -->
>  <similarity class="solr.SchemaSimilarityFactory"/>
> 
> When this is set, it means IndexSearcher/IndexWriter use a
> PerFieldSimilarityWrapper that delegates based on the Solr schema
> fieldtype.
> 
> Note this is just a simple ordinary similarity impl
> (http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/similarities/SchemaSimilarityFactory.java),
> you could also write your own that works differently.
> 
> -- 
> lucidimagination.com
> 
