- Original Message
From: John Kleven <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, April 6, 2007 12:13:30 AM
Subject: Re: short documents = help me tweak Similarity??
> Also, I don't understand why the encode/decode functions have a range of
> 7x10^9 to 2x10^-9, when it seems to me the most common values (with boosts
> set to 1.0) are somewhere between 0 and 1.0. When would somebody have a monster
> huge value like 7x10^9? Even with a huge index time boost of 20.0 or
> s
Thank you kindly for the responses.
This was the solution that I dreamed up initially as well: overriding
lengthNorm and making the returned values for small numTerms values (e.g. 3
and 4) more distinct.
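
One way to picture that approach is a lengthNorm that only ever returns values the 8-bit norm can represent exactly, stepping down one representable value per extra term. This is a minimal sketch under assumptions, not the code from this thread: the class and method names are hypothetical, the encode/decode pair is a stand-alone re-implementation of the 3-bit-mantissa, 5-bit-exponent byte layout (modeled on Lucene's SmallFloat.floatToByte315), and in a real index the function would live in a Similarity subclass that overrides lengthNorm(String fieldName, int numTerms).

```java
final class DiscreteLengthNormSketch {

    // Stand-alone copy of the assumed norm byte layout (not the Lucene class itself).
    static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;             // sign, exponent, top 2 mantissa bits
        int zeroPoint = (63 - 15) << 3;     // 384
        if (small <= zeroPoint) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;   // underflow
        }
        if (small >= zeroPoint + 0x100) {
            return (byte) -1;               // overflow, saturates at byte 255
        }
        return (byte) (small - zeroPoint);
    }

    static float decodeNorm(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Hypothetical replacement for lengthNorm: byte 124 decodes to 1.0, so a
    // one-term field gets 1.0 and every additional term moves down exactly one
    // representable value. The curve is much steeper than 1/sqrt(numTerms),
    // which may or may not suit a given collection.
    static float lengthNorm(int numTerms) {
        int terms = Math.max(numTerms, 1);
        int code = Math.max(1, 124 - (terms - 1));
        return decodeNorm((byte) code);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            float norm = lengthNorm(n);
            System.out.println(n + " terms: norm=" + norm
                    + " byte=" + (encodeNorm(norm) & 0xff));
        }
    }
}
```

Because each returned value round-trips through the byte unchanged, fields of 3 and 4 terms keep different norms after indexing.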
So I did that in multiple ways, and I ran into a different problem. If
lengthNorm returns
: The problem comes when your float value is encoded into that 8 bit
: field norm, the 3 length and 4 length both become the same 8 bit
: value. Call Similarity.encodeNorm on the values you calculate for the
: different numbers of terms and make sure they return different byte
: values.
bingo.
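
The collision described in the quoted text can be reproduced without an index. This is a minimal sketch, assuming the norm byte uses the 3-bit-mantissa, 5-bit-exponent layout of Lucene's SmallFloat.floatToByte315 and that the default lengthNorm is 1/sqrt(numTerms). The class and method names are hypothetical stand-ins, not Lucene API; against a real installation the check would be Similarity.encodeNorm on the values your own lengthNorm returns.

```java
final class NormCollisionSketch {

    // Stand-alone re-implementation of the assumed byte layout, for illustration only.
    static byte encodeNorm(float f) {
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> 21;             // keeps sign, exponent, top 2 mantissa bits
        int zeroPoint = (63 - 15) << 3;     // 384
        if (small <= zeroPoint) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;   // zero, negative, or underflow
        }
        if (small >= zeroPoint + 0x100) {
            return (byte) -1;               // overflow, saturates at byte 255
        }
        return (byte) (small - zeroPoint);
    }

    static float decodeNorm(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << 21;
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Assumed default length normalization: 1 / sqrt(numTerms).
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            float raw = defaultLengthNorm(n);
            byte b = encodeNorm(raw);
            System.out.println(n + " terms: raw=" + raw
                    + " byte=" + (b & 0xff)
                    + " decoded=" + decodeNorm(b));
        }
    }
}
```

Under these assumptions, 1/sqrt(3) (about 0.577) and 1/sqrt(4) (0.5) both land on byte 120 and decode to 0.5, which is why a 3-term and a 4-term field end up with the same score.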
From: <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, April 5, 2007 1:45:34 PM
Subject: Re: short documents = help me tweak Similarity??
Sorry to re-post -- is this the correct forum for questions like this? I
think that writing a new encode/decode operation should help alleviate my
problem, but thought that this must be fairly widesprea
It is the right forum, silence just means either no one knows the
answer or no one who knows the answer has read it... Such is the
nature of the community.
Have you looked at overriding similarity with your own
implementation? Have you done explain() calls on the docs to see
where the s
Sorry to re-post -- is this the correct forum for questions like this? I
think that writing a new encode/decode operation should help alleviate my
problem, but thought that this must be a fairly widespread issue for people
using Lucene for "non-web-page" searches (i.e., shorter documents)
Thanks a
My documents are cars...
i.e.,
Nissan Altima Sports Package
Nissan Altima Standard
The problem I have is that when I search for "Nissan Altima", I want to get the 2nd
hit back first, i.e. "Nissan Altima Standard", because it is shorter.
However, this doesn't happen. They are both scored exactly the same.