RE: Strange relevance scoring

2014-04-08 Thread Markus Jelsma
Hi, the thing you describe is possible when your setup uses SpanFirstQuery. But to be sure what's going on, you should post the debug output. -Original message- From:John Nielsen j...@mcb.dk Sent: Tuesday 8th April 2014 11:03 To: solr-user@lucene.apache.org Subject: Strange

Re: Strange relevance scoring

2014-04-08 Thread Ahmet Arslan
Hi Nielsen, there is no special attention paid to the first word. You are probably hitting length normalisation: Lucene/Solr punishes long documents and favours short ones. Is the document in which the term appears 5 times longer? On Tuesday, April 8, 2014 12:03 PM, John Nielsen j...@mcb.dk wrote: Hi, We are seeing a

Re: Strange relevance scoring

2014-04-08 Thread John Nielsen
Interesting. Most of the text fields are single-word fields or close to it, but some of the documents contain long text. How long does a text need to be before length normalization kicks in? On Tue, Apr 8, 2014 at 11:36 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi Nielsen, There is no

Re: Strange relevance scoring

2014-04-08 Thread John Nielsen
Hi, I couldn't find any occurrence of SpanFirstQuery in either the schema.xml or the solrconfig.xml file. This is the query I used with debug=results: http://pastebin.com/bWzUkjKz And here is the response: http://pastebin.com/nCXFcuky I am not sure what I am supposed to be looking for. On Tue,

Re: Strange relevance scoring

2014-04-08 Thread Ahmet Arslan
Hi, the length norm is computed for every document at index time. I think it is 1/sqrt(number of terms). Please see section 6, norm(t,d), at https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html If you don't care about length normalisation, you can
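To make the 1/sqrt(number of terms) formula concrete, and to answer the "how long is too long" question above, here is a minimal sketch that simply evaluates that formula for a few field lengths. It assumes the DefaultSimilarity-style norm with no index-time boost; the class and method names are hypothetical, and it ignores the lossy single-byte encoding Lucene applies when it stores norms (so nearby lengths can end up with the same stored value).

```java
// Hypothetical illustration of a DefaultSimilarity-style length norm:
// norm = 1 / sqrt(numTerms). Index-time boosts and Lucene's lossy
// one-byte norm encoding are deliberately left out.
public class LengthNormDemo {

    // Length norm for a field containing numTerms terms.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        int[] lengths = {1, 2, 4, 10, 100};
        for (int n : lengths) {
            // Every additional term lowers the norm; there is no cut-off length.
            System.out.printf("%4d terms -> norm %.4f%n", n, lengthNorm(n));
        }
    }
}
```

The sketch suggests there is no threshold: the norm shrinks with every extra term, so a 4-term field already gets half the weight of a 1-term field.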

Re: Strange relevance scoring

2014-04-08 Thread David Santamauro
Is there any general setting that removes this punishment, or must omitNorms=true be part of every field definition? On 4/8/2014 7:04 AM, Ahmet Arslan wrote: Hi, length normal is computed for every document at index time. I think it is 1/sqrt(number of terms). Please see section 6.

Re: Strange relevance scoring

2014-04-08 Thread Ahmet Arslan
Hi David, omitNorms=true will bring additional performance gains too. https://wiki.apache.org/solr/SolrPerformanceFactors#indexed_fields To disable length norms globally, though, one can create a custom similarity and register it as the default similarity. On Tuesday, April 8, 2014 2:59 PM,
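The effect of omitting norms on the original complaint can be sketched numerically. This assumes DefaultSimilarity-style factors (tf = sqrt(freq), norm = 1/sqrt(numTerms)) and two made-up documents: A, where the term appears once in a 2-term field, and B, where it appears 5 times in a 100-term field. idf, coord and queryNorm are dropped because they are the same for both documents on a single-term query, and index-time boosts are assumed absent. The class and method names are hypothetical; in Solr itself omitting norms is done per field with omitNorms="true" in schema.xml, or globally through a custom Similarity as described above. Since norms are computed at index time, either change would need a reindex.

```java
// Hypothetical comparison of the tf * norm part of a DefaultSimilarity-style
// score, with norms enabled and with norms omitted (norm treated as 1.0).
// idf, coord and queryNorm are omitted: they are identical for both documents
// when both match the same single-term query on the same field.
public class OmitNormsDemo {

    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // With omitNorms=true the field carries no length norm, so it acts like 1.0.
    static double partialScore(int freq, int numTerms, boolean omitNorms) {
        double norm = omitNorms ? 1.0 : lengthNorm(numTerms);
        return tf(freq) * norm;
    }

    public static void main(String[] args) {
        for (boolean omit : new boolean[] {false, true}) {
            double a = partialScore(1, 2, omit);   // doc A: 1 occurrence, 2 terms
            double b = partialScore(5, 100, omit); // doc B: 5 occurrences, 100 terms
            System.out.printf("omitNorms=%b  A=%.4f  B=%.4f%n", omit, a, b);
        }
    }
}
```

With norms enabled the short document A outranks B despite B's five occurrences; with norms omitted, term frequency alone decides and B comes first.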

Re: Strange relevance scoring

2014-04-08 Thread Aman Tandon
Yes David, you must use omitNorms=true for great performance. Thanks, Aman Tandon On Tue, Apr 8, 2014 at 5:36 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi David, omitNorms=true will cause additional performance gains too. https://wiki.apache.org/solr/SolrPerformanceFactors#indexed_fields