Re: Lucene scoring components

2018-07-17 Thread Adrien Grand
You could extend this class and provide your own implementation to incorporate term frequency into the final score. For the record, you might want to look into BM25Similarity, which takes term frequency into account, but in a way that gives a much lower score contribution to hits than
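To make the trade-off concrete, here is a minimal plain-Java sketch (not Lucene code) of three ways a term-frequency component can behave: the square-root damping documented for TFIDFSimilarity's default tf, a constant tf that counts any match once, and the textbook BM25 saturation term. The class and method names are hypothetical, length normalization is assumed to be switched off (b = 0), and k1 = 1.2 is an assumed, commonly used default rather than something stated in the thread.

```java
final class TfSketch {
    // Square-root damping, as in the classic TF-IDF formulation.
    static double classicTf(double freq) {
        return Math.sqrt(freq);
    }

    // Constant tf: a matching term counts once, however often it occurs.
    static double constantTf(double freq) {
        return freq > 0 ? 1.0 : 0.0;
    }

    // Textbook BM25 tf term with length normalization disabled (b = 0).
    // It saturates: the value approaches k1 + 1 as freq grows.
    static double bm25Tf(double freq, double k1) {
        return freq * (k1 + 1) / (freq + k1);
    }

    public static void main(String[] args) {
        double k1 = 1.2; // assumed default, not taken from the thread
        for (double freq : new double[] {1, 4, 100}) {
            System.out.printf("freq=%.0f classic=%.3f constant=%.3f bm25=%.3f%n",
                    freq, classicTf(freq), constantTf(freq), bm25Tf(freq, k1));
        }
    }
}
```

In a real application the constant variant would correspond to overriding the tf method of a Similarity subclass, as the reply suggests; the exact class to extend depends on the Lucene version in use.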

Re: Lucene scoring overall score

2018-07-17 Thread Adrien Grand
You could use IndexSearcher#explain, which tells you how the score of a document is computed. On Tue, Jul 17, 2018 at 19:06, wrote: > Hi, > > How can I check the contributions from the different indexed fields to > the hit doc's score? > > Best regards

Re: Lucene scoring components

2018-07-17 Thread baris . kazar
I forgot to include the doc that I was referring to: https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html Best regards On 7/17/18 1:01 PM, baris.ka...@oracle.com wrote: Hi, is there a way to reduce the tf(t in d) component to 1? I don't want

Re: Lucene scoring

2013-03-12 Thread Ian Lea
Sounds like a job for boosting: Document.setBoost() and/or Field.setBoost(). The former has gone away in Lucene 4.x; see the migration guide. Or execute 2 searches, restricting the first to the contact docs or whichever you want to be top of the list. -- Ian. On Tue, Mar 12, 2013 at 7:36

RE: Lucene scoring and random result order

2011-08-25 Thread Sendros, Jason
You can sort on multiple values. Keep the primary sort as a relevancy sort, and choose something else to sort on to keep the rest of the responses fairly static. http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/search/Sort.html Example: Sort sortBy = new Sort(new SortField[] {
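The idea of a primary relevancy sort with a secondary tie-breaker can be sketched without Lucene, using a plain Java Comparator. The Hit class and its id field below are hypothetical stand-ins for a search hit and whatever stable field is chosen as the secondary sort key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class StableRankingSketch {
    // Hypothetical hit: a relevance score plus a stable secondary key.
    static final class Hit {
        final String id;
        final float score;

        Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    // Highest score first; hits with equal scores are ordered by id,
    // so repeated searches return ties in the same order.
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.score)
                .reversed()
                .thenComparing(Comparator.comparing((Hit h) -> h.id)));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("b", 1.0f));
        hits.add(new Hit("a", 1.0f));
        hits.add(new Hit("c", 2.0f));
        for (Hit h : rank(hits)) {
            System.out.println(h.id + " " + h.score);
        }
    }
}
```

In Lucene itself the same effect comes from passing several SortField entries to Sort, as the truncated example in the message starts to show.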

Re: Lucene Scoring

2010-07-07 Thread manjula wijewickrema
Dear Ian, Thanks a lot for your reply. The approach you proposed works correctly and solved half of my problem. When I ran the program, the system gave me the following output. output- ** Searching for 'milk' Number of hits: 1 0.13287117 0.13287117 = (MATCH)

Re: Lucene Scoring

2010-07-06 Thread manjula wijewickrema
Dear Grant, Thanks a lot for your guidance. As you mentioned, I tried to use the explain() method to get the explanations for the relevant scoring. But once I call the explain() method, the system reports the following error. Error- 'The method explain(Query,int) in the type Searcher is not

Re: Lucene Scoring

2010-07-06 Thread Ian Lea
You are calling the explain method incorrectly. You need something like System.out.println(indexSearcher.explain(query, 0)); See the javadocs for details. -- Ian. On Tue, Jul 6, 2010 at 7:39 AM, manjula wijewickrema manjul...@gmail.com wrote: Dear Grant, Thanks a lot for your guidance.

Re: Lucene Scoring

2010-07-05 Thread Grant Ingersoll
On Jul 5, 2010, at 5:02 AM, manjula wijewickrema wrote: Hi, In my application, I input only a single-term query (one at a time) and get back the corresponding scores for those queries. But I am struggling a little to understand Lucene scoring. I have referred

Re: Lucene scoring and short fields

2008-02-07 Thread Chris Hostetter
: (with basically nonsense words), I'm wondering how others might have : dealt with this issue. : : Another option is to have a custom Similarity class with an altered : lengthNorm method? That is what I would recommend ... it's exactly what SweetSpotSimilarity does (you define a plateau of
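A plateau-style length norm can be sketched in plain Java as follows. This is an illustration in the spirit of SweetSpotSimilarity rather than a copy of it: the class name, parameter names, and the sample values of min, max, and steepness are assumptions. Field lengths inside [min, max] all receive a norm of 1, and the norm decays the further the length falls outside that range, whereas the plain 1/sqrt(numTerms) norm favors the shortest fields.

```java
final class PlateauNormSketch {
    // Plain length norm: shorter fields always score higher.
    static double classicLengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Plateau norm: lengths within [min, max] are treated as equally good.
    // The distance term is 0 inside the plateau and grows linearly outside it.
    static double plateauLengthNorm(int numTerms, int min, int max, double steepness) {
        double distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * distance + 1.0);
    }

    public static void main(String[] args) {
        int min = 1, max = 10;      // assumed plateau bounds
        double steepness = 0.5;     // assumed decay rate
        for (int len : new int[] {1, 3, 10, 50, 500}) {
            System.out.printf("len=%d classic=%.3f plateau=%.3f%n",
                    len, classicLengthNorm(len), plateauLengthNorm(len, min, max, steepness));
        }
    }
}
```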

Re: Lucene scoring: coord_q_d factor

2006-12-19 Thread Doug Cutting
Karl Koch wrote: Are there any other papers that regard the combination of coordination level matching and TFxIDF as advantageous? We independently developed coordination-level matching combined with TFxIDF when I worked at Apple. This is documented in:

Re: Lucene scoring: coord_q_d factor

2006-12-14 Thread Soeren Pekrul
Karl Koch wrote: If I do not misunderstand that extract, I would say it suggests the combination of coordination level matching with IDF. I am interested in your view and those who read this? I understand that sentence: The natural solution is to correlate a term's matching value with its

Re: Lucene scoring: coord_q_d factor

2006-12-14 Thread Karl Koch
-user@lucene.apache.org Subject: Re: Lucene scoring: coord_q_d factor Karl Koch wrote: If I do not misunderstand that extract, I would say it suggests the combination of coordination level matching with IDF. I am interested in your view and those who read this? I understand that sentence

Re: Lucene scoring: coord_q_d factor

2006-12-14 Thread Soeren Pekrul
Soeren Pekrul wrote: The score for a document is the sum of the term weights w(tf, idf) for each containing term. So you have already the combination of coordination level matching with IDF. Now it is possible that your query requests three terms A, B and C. Two of them (A and B) are quite

Re: Lucene scoring: coord_q_d factor

2006-12-14 Thread Grant Ingersoll
FYI: The Wiki has a fair number of resources on IR: http://wiki.apache.org/jakarta-lucene/InformationRetrieval (I have added a link to this conversation, which contains a lot of useful information) Karl, if you are so inclined, please feel free to add any of the references you have found

Re: Lucene scoring: coord_q_d factor

2006-12-13 Thread Karl Koch
Do you know about any papers that discuss this? Karl Original Message Date: Wed, 13 Dec 2006 10:31:41 -0500 From: Yonik Seeley [EMAIL PROTECTED] To: java-user@lucene.apache.org Subject: Re: Lucene scoring: coord_q_d factor On 12/13/06, Karl Koch [EMAIL PROTECTED] wrote

Re: Lucene scoring: coord_q_d factor

2006-12-13 Thread Paul Elschot
On Wednesday 13 December 2006 16:42, Karl Koch wrote: Do you know about any papers that discuss this? Coordination is called co-ordination in the original idf paper by K. Spärck Jones, "A statistical interpretation of term specificity and its application in retrieval", Journal of Documentation

Re: Lucene scoring: Term frequency normalisation

2006-12-12 Thread Marvin Humphrey
On Dec 12, 2006, at 2:23 AM, Karl Koch wrote: However, what exactly is the advantage of using square root instead of log? Speaking anecdotally, I wouldn't say there's an advantage. There's a predictable effect: very long documents are rewarded, since the damping factor is not as strong.
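The difference in damping strength is easy to see numerically. The following plain-Java sketch compares square-root damping with a common logarithmic variant, 1 + ln(freq); the choice of that particular log form and the class name are assumptions made for illustration. For small frequencies the two are close, but for large frequencies, which are typical of very long documents, the square root grows much faster.

```java
final class TfDampingSketch {
    // Square-root damping of the raw term frequency.
    static double sqrtTf(int freq) {
        return Math.sqrt(freq);
    }

    // A common logarithmic damping, assumed here as 1 + ln(freq).
    static double logTf(int freq) {
        return freq > 0 ? 1.0 + Math.log(freq) : 0.0;
    }

    public static void main(String[] args) {
        for (int freq : new int[] {1, 10, 100, 1000, 10000}) {
            System.out.printf("freq=%d sqrt=%.2f log=%.2f%n",
                    freq, sqrtTf(freq), logTf(freq));
        }
    }
}
```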

Re: Lucene scoring: coord_q_d factor

2006-12-12 Thread Steven Rowe
Karl Koch wrote: The coord(q,d) normalisation is a score factor based on how many of the query terms are found in the specified document. and described here: http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord Does this have a theoretical base? On
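As a rough illustration of what such a factor does, the sketch below computes coord as the fraction of query terms that a document matches and uses it to scale the sum of the matched term weights. It is plain Java rather than Lucene code; the class name, method names, and the sample weights are hypothetical.

```java
final class CoordSketch {
    // Fraction of the query's terms that were found in the document.
    static double coord(int overlap, int maxOverlap) {
        return maxOverlap == 0 ? 0.0 : (double) overlap / maxOverlap;
    }

    // Sum the weights of the matched terms, then scale by coord so that
    // documents matching more of the query terms are favored.
    static double score(double[] matchedTermWeights, int queryTermCount) {
        double sum = 0.0;
        for (double w : matchedTermWeights) {
            sum += w;
        }
        return coord(matchedTermWeights.length, queryTermCount) * sum;
    }

    public static void main(String[] args) {
        // Query with three terms; the document matches two of them.
        double[] weights = {1.0, 2.0}; // illustrative term weights
        System.out.println(score(weights, 3));
    }
}
```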

Re: Lucene scoring: coord_q_d factor

2006-12-12 Thread Steven Rowe
Karl Koch wrote: Is there any other paper that actually shows the benefit of doing this particular normalisation with coord_q_d? I am not suggesting here that it is not useful, I am just looking for evidence how the idea developed. I think it's a mischaracterization to call coordination a

Re: Lucene scoring question (how to boost leading terms match)

2006-10-03 Thread Doron Cohen
If I understand the question, you do not want to boost in advance a certain doc, but rather score higher those documents containing the search term closer to the start of the document. There is more to define here - for instance, if doc1 has 5 words but doc2 has 1,000,000 words, would you still

Re: Lucene scoring question (how to boost leading terms match)

2006-10-03 Thread Chris Hostetter
: does not pour affinity information into the score - i.e. both doc1 and doc2 : in your example would get the same score, and the SpanFirstQuery would only : allow you to limit the set of returned documents - Hoss, do you agree with : this? Oh ... hmmm ... I think you're right. SpanScorer

RE: Lucene Scoring

2006-03-08 Thread Pasha Bizhan
Hi, From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Anyone have a doc or something that would allow me to explain this to execs? A "Lucene Scoring for Dummies" idea... explaining a math algorithm to an exec or someone with no knowledge is not that easy :)

Re: Lucene Scoring

2006-03-08 Thread markharw00d
[EMAIL PROTECTED] wrote: Anyone have a doc or something that would allow me to explain this to execs? Roughly speaking: * Documents containing *all* the search terms are good * Matches on rare words are better than for common words * Long documents are not as good as short ones * Documents
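The "rare words are better" rule corresponds to an inverse document frequency factor. A minimal plain-Java sketch, assuming the classic form 1 + ln(numDocs / (docFreq + 1)), is shown below; the class name and the sample collection size are made up for illustration.

```java
final class IdfSketch {
    // Inverse document frequency: the fewer documents contain the term,
    // the larger the factor. Assumes the form 1 + ln(numDocs / (docFreq + 1)).
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        long numDocs = 1000; // illustrative collection size
        for (long docFreq : new long[] {1, 10, 500, 999}) {
            System.out.printf("docFreq=%d idf=%.3f%n", docFreq, idf(docFreq, numDocs));
        }
    }
}
```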

Re: Lucene Scoring

2006-03-08 Thread Chris Hostetter
: Roughly speaking: : : * Documents containing *all* the search terms are good : * Matches on rare words are better than for common words : * Long documents are not as good as short ones : * Documents which mention the search terms many times are good Be wary of the distinction between term and

Re: Lucene scoring bounds ??

2005-06-20 Thread Erik Hatcher
On Jun 18, 2005, at 7:39 PM, Paul Libbrecht wrote: I read the Lucene book about scoring and read a bit of the javadoc but I can't seem to find anywhere the expected bounds for the score value. I had believed the score would end up between 0 and 1 but I seem to keep having values
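If raw scores turn out to be unbounded and a value in [0, 1] is needed for display, one simple option is to rescale by the top score whenever it exceeds 1. The plain-Java sketch below shows that idea; the class and method names are hypothetical, and whether a given Lucene API already applies such scaling depends on the version and the class used to collect hits.

```java
final class ScoreNormalizationSketch {
    // Scale all scores by 1 / max when the top score exceeds 1;
    // otherwise return the scores unchanged (as a copy).
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        float scale = max > 1.0f ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * scale;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] normalized = normalize(new float[] {4f, 2f, 1f});
        for (float s : normalized) {
            System.out.println(s);
        }
    }
}
```

Note that scores rescaled this way are only comparable within one result list, not across different queries.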