You could extend this class and provide your own implementation to
incorporate term frequency into the final score. For the record, you might
want to look into BM25Similarity, which takes term frequency into account,
but in a way that gives a much lower score contribution to hits than
You could use IndexSearcher#explain, which tells you how the score of a
document is computed.
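The difference between classic TF-IDF and BM25's handling of term frequency can be sketched in plain Java (this is illustrative math, not Lucene code: classic TFIDFSimilarity weighs raw frequency roughly as sqrt(freq), while BM25 saturates it; the k1 constant and the omission of length normalization are simplifying assumptions):

```java
// Sketch (not Lucene code): classic TF-IDF-style tf weight keeps
// growing without bound as a term repeats, while a BM25-style weight
// saturates, so repeated occurrences add less and less to the score.
public class TfSaturation {
    // classic tf weight: sqrt of the raw frequency
    static double classicTf(double freq) {
        return Math.sqrt(freq);
    }

    // BM25-style tf saturation with the common default k1 = 1.2
    // (length normalization omitted for clarity)
    static double bm25Tf(double freq, double k1) {
        return freq / (freq + k1);
    }

    public static void main(String[] args) {
        for (int freq : new int[] {1, 4, 16, 64}) {
            System.out.printf("freq=%2d classic=%5.2f bm25=%4.2f%n",
                    freq, classicTf(freq), bm25Tf(freq, 1.2));
        }
        // classic grows 8x from freq=1 to freq=64; bm25 stays below 1.0
    }
}
```

Note how the BM25 weight never exceeds 1.0 per term, which is why its tf contribution to a hit is much flatter than the classic one.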
On Tue, Jul 17, 2018 at 7:06 PM, wrote:
> Hi,-
>
> how can I check the contributions from different indexed fields to a
> hit document's score?
>
> Best regards
>
I forgot to include the doc that I was referring to:
https://lucene.apache.org/core/6_0_1/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
Best regards
On 7/17/18 1:01 PM, baris.ka...@oracle.com wrote:
Hi,-
is there a way to diminish the tf(t in d) component to 1? I don't want
Sounds like a job for boosting: Document.setBoost() and/or
Field.setBoost(). The former has gone away in Lucene 4.x; see the
migration guide.
Or execute 2 searches, restricting the first to the contact docs or
whichever you want to be top of the list.
--
Ian.
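The earlier question about pinning tf(t in d) at 1 amounts to a frequency clamp. In Lucene this would be done by subclassing a Similarity and overriding its tf() method; here the idea is shown as plain math (the tf * idf^2-style per-term weight is a simplified assumption, not the full scoring formula):

```java
// Sketch: clamping the tf component so a term counts at most once per
// document, no matter how often it occurs.
public class ClampedTf {
    // tf weight that ignores how often the term occurs
    static double tf(int freq) {
        return freq > 0 ? 1.0 : 0.0;
    }

    // simplified per-term score contribution: tf * idf^2
    static double termWeight(int freq, double idf) {
        return tf(freq) * idf * idf;
    }

    public static void main(String[] args) {
        // a document mentioning a term 10 times scores the same as one
        // mentioning it once
        System.out.println(termWeight(1, 2.0));   // 4.0
        System.out.println(termWeight(10, 2.0));  // 4.0
    }
}
```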
On Tue, Mar 12, 2013 at 7:36
You can sort on multiple values. Keep the primary sort as a relevancy
sort, and choose something else to sort on to keep the rest of the
responses fairly static.
http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/search/Sort.html
Example (completed here as a sketch; the secondary field name is illustrative):
Sort sortBy = new Sort(new SortField[] {
    SortField.FIELD_SCORE,                  // primary: relevancy
    new SortField("id", SortField.STRING)   // secondary: a stable tie-break
});
Dear Ian,
Thanks a lot for your reply. The way you proposed works correctly and
has solved half of my problem.
Once I ran the program, the system gave me the following output.
Output:
Searching for 'milk'
Number of hits: 1
0.13287117
0.13287117 = (MATCH)
Dear Grant,
Thanks a lot for your guidance. As you mentioned, I tried to use the
explain() method to get the explanations for the relevant scoring. But once I
call the explain() method, the system reports the following error.
Error:
'The method explain(Query, int) in the type Searcher is not
You are calling the explain method incorrectly. You need something like
System.out.println(indexSearcher.explain(query, 0));
See the javadocs for details.
--
Ian.
On Tue, Jul 6, 2010 at 7:39 AM, manjula wijewickrema
manjul...@gmail.com wrote:
Dear Grant,
Thanks a lot for your guidance.
On Jul 5, 2010, at 5:02 AM, manjula wijewickrema wrote:
Hi,
In my application, I input only a single-term query (one at a time) and get back
the corresponding scores for those queries. But I am struggling a little to
understand Lucene scoring. I have referred
: (with basically nonsense words), I'm wondering how others might have
: dealt with this issue.
:
: Another option is to have a custom Similarity class with an altered
: lengthNorm method?
that is what I would recommend ... it's exactly what SweetSpotSimilarity
does (you define a plateau of
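A SweetSpotSimilarity-style length norm can be sketched as a plain function: documents whose length falls inside a "sweet spot" [min, max] all get norm 1.0, and the norm falls off smoothly outside it. The formula below follows the shape documented for SweetSpotSimilarity's length-norm computation, but treat the exact constants and parameter values as assumptions:

```java
// Sketch: a plateau-shaped length normalization. Inside [min, max] the
// "outside" distance is zero, so the norm is exactly 1.0; beyond the
// plateau the norm decays as 1/sqrt(steepness * distance + 1).
public class PlateauLengthNorm {
    static double lengthNorm(int numTerms, int min, int max, double steepness) {
        // how far the document length lies outside the plateau
        double outside = Math.abs(numTerms - min) + Math.abs(numTerms - max)
                - (max - min);
        return 1.0 / Math.sqrt(steepness * outside + 1.0);
    }

    public static void main(String[] args) {
        // flat plateau between 100 and 500 terms (illustrative values)
        System.out.println(lengthNorm(100, 100, 500, 0.5));  // 1.0
        System.out.println(lengthNorm(300, 100, 500, 0.5));  // 1.0
        System.out.println(lengthNorm(5000, 100, 500, 0.5)); // well below 1.0
    }
}
```

The point of the plateau is that documents of "reasonable" length are not penalized relative to each other; only pathologically short or long documents get damped.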
Karl Koch wrote:
Are there any other papers that regard the combination of coordination level matching and TFxIDF as advantageous?
We independently developed coordination-level matching combined with
TFxIDF when I worked at Apple. This is documented in:
Karl Koch wrote:
If I do not misunderstand that extract, I would say it suggests the combination of coordination level matching with IDF. I am interested in your view, and in the views of those reading this.
I understand that sentence:
The natural solution is to correlate a term's matching value with its
To: java-user@lucene.apache.org
Subject: Re: Lucene scoring: coord_q_d factor
Karl Koch wrote:
If I do not misunderstand that extract, I would say it suggests the
combination of coordination level matching with IDF. I am interested in your
view, and in the views of those reading this.
I understand that sentence
Soeren Pekrul wrote:
The score for a document is the sum of the term weights w(tf, idf) for
each contained term, so you already have the combination of
coordination level matching with IDF. Now it is possible that your query
requests three terms A, B and C. Two of them (A and B) are quite
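The interaction described above, per-term TFxIDF weights multiplied by a coordination factor, can be sketched in plain Java (the weights, terms, and coord(q,d) = matched/queryTermCount formula are a simplified illustration, not Lucene's full scoring):

```java
import java.util.Map;

// Sketch: coordination-level matching on top of per-term TFxIDF
// weights. The document score is the sum of weights for matched terms,
// multiplied by coord(q, d) = matched / queryTermCount, so a document
// matching all three of A, B, C can beat one matching only two of
// them even when the raw weight sums are close.
public class CoordScore {
    static double score(Map<String, Double> docTermWeights, String[] queryTerms) {
        double sum = 0.0;
        int matched = 0;
        for (String t : queryTerms) {
            Double w = docTermWeights.get(t);
            if (w != null) {
                sum += w;       // accumulate the matched term's weight
                matched++;      // count it toward the coord factor
            }
        }
        double coord = (double) matched / queryTerms.length;
        return coord * sum;
    }

    public static void main(String[] args) {
        String[] query = {"A", "B", "C"};
        // doc1 matches A and B strongly; doc2 matches all three weakly
        double doc1 = score(Map.of("A", 2.0, "B", 2.0), query);
        double doc2 = score(Map.of("A", 1.4, "B", 1.4, "C", 1.4), query);
        System.out.println(doc1); // coord 2/3, sum 4.0 -> about 2.67
        System.out.println(doc2); // coord 3/3, sum about 4.2
    }
}
```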
FYI: The Wiki has a fair number of resources on IR:
http://wiki.apache.org/jakarta-lucene/InformationRetrieval
(I have added a link to this conversation, which contains a lot of useful information.)
Karl, if you are so inclined, please feel free to add any of the
references you have found
Do you know about any papers that discuss this?
Karl
Original message
Date: Wed, 13 Dec 2006 10:31:41 -0500
From: Yonik Seeley [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Subject: Re: Lucene scoring: coord_q_d factor
On 12/13/06, Karl Koch [EMAIL PROTECTED] wrote:
On Wednesday 13 December 2006 16:42, Karl Koch wrote:
Do you know about any papers that discuss this?
Coordination is called "co-ordination" in the original idf paper by
K. Spärck Jones, "A statistical interpretation of term specificity
and its application in retrieval", Journal of Documentation
On Dec 12, 2006, at 2:23 AM, Karl Koch wrote:
However, what exactly is the advantage of using square root instead
of log?
Speaking anecdotally, I wouldn't say there's an advantage. There's a
predictable effect: very long documents are rewarded, since the
damping factor is not as strong.
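That anecdotal point can be made concrete by comparing the two damping functions side by side (a plain-Java sketch; sqrt(tf) is the classic Lucene choice, and 1 + ln(tf) is a common textbook alternative used here for comparison):

```java
// Sketch comparing two tf damping functions: sqrt(tf) versus
// 1 + ln(tf). sqrt damps less aggressively, so very frequent terms,
// and hence very long documents, keep earning score at a higher rate.
public class Damping {
    static double sqrtTf(double tf) { return Math.sqrt(tf); }
    static double logTf(double tf)  { return 1.0 + Math.log(tf); }

    public static void main(String[] args) {
        for (int tf : new int[] {1, 10, 100, 1000}) {
            System.out.printf("tf=%4d sqrt=%6.2f log=%5.2f%n",
                    tf, sqrtTf(tf), logTf(tf));
        }
        // at tf=1000: sqrt gives about 31.6, log only about 7.9
    }
}
```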
Karl Koch wrote:
The coord(q,d) normalisation is a score factor based on how many of
the query terms are found in the specified document, and it is described
here:
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord
Does this have a theoretical base? On
Karl Koch wrote:
Is there any other paper that actually shows the benefit of doing
this particular normalisation with coord_q_d? I am not suggesting
here that it is not useful, I am just looking for evidence how the
idea developed.
I think it's a mischaracterization to call coordination a
If I understand the question, you do not want to boost in advance a certain
doc, but rather score higher those documents containing the search term
closer to the start of the document.
There is more to define here - for instance, if doc1 has 5 words but doc2
has 1,000,000 words, would you still
: does not pour affinity information into the score - i.e. both doc1 and doc2
: in your example would get the same score, and the SpanFirstQuery would only
: allow you to limit the set of returned documents - Hoss, do you agree with
: this?
Oh ... hmmm ... i think you're right. SpanScorer
Hi,
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Anyone have a doc or something that would allow me to explain
this to execs? A "Lucene Scoring for Dummies"
idea... explaining a math algorithm to an exec or someone with no
background is not that easy :)
[EMAIL PROTECTED] wrote:
Anyone have a doc or something that would allow me to explain this to execs?
Roughly speaking:
* Documents containing *all* the search terms are good
* Matches on rare words are better than for common words
* Long documents are not as good as short ones
* Documents
: Roughly speaking:
:
: * Documents containing *all* the search terms are good
: * Matches on rare words are better than for common words
: * Long documents are not as good as short ones
: * Documents which mention the search terms many times are good
Be wary of the distinction between term and
On Jun 18, 2005, at 7:39 PM, Paul Libbrecht wrote:
I read the Lucene book's section about scoring and read a bit of the javadoc,
but I can't seem to find any statement of the expected bounds for
the score value.
I had believed the score would end up between 0 and 1, but I seem to
keep getting values