Re: short documents = help me tweak Similarity??

2007-04-05 Thread Otis Gospodnetic
Quoting John Kleven (to java-user@lucene.apache.org, Friday, April 6, 2007 12:13:30 AM): Thank you kindly for the responses. This was the solution t…

Re: short documents = help me tweak Similarity??

2007-04-05 Thread Andrew Hudson
> Also, I don't understand why the encode/decode functions have a range of 7x10^9 to 2x10^-9, when it seems to me the most common values (with boosts set to 1.0) are somewhere between 0 and 1.0. When would somebody have a monster huge value like 7x10^9? Even with a huge index-time boost of 20.0 or s…
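To see where the 256 byte codes actually fall, here is a self-contained Java sketch of the decoding step. The bit manipulation is an assumption modeled on Lucene's SmallFloat.byte315ToFloat (which, as far as I know, is what backs Similarity.decodeNorm); the class name NormDecodeTable and the helper codesInRange are hypothetical, and nothing here needs Lucene on the classpath.

```java
/**
 * Standalone decoder for an 8-bit norm byte. The bit layout is an assumption
 * modeled on Lucene's SmallFloat.byte315ToFloat; no Lucene dependency needed.
 */
public final class NormDecodeTable {
    private NormDecodeTable() {}

    /** Decode one norm byte back into the float it stands for. */
    public static float byte315ToFloat(byte b) {
        if (b == 0) {
            return 0.0f;                       // code 0 is reserved for zero
        }
        int bits = (b & 0xff) << (24 - 3);     // move the code under the float's exponent/fraction bits
        bits += (63 - 15) << 24;               // shift the exponent to this encoding's zero point
        return Float.intBitsToFloat(bits);
    }

    /** Count how many of the 256 codes decode to a value in [lo, hi). */
    public static int codesInRange(float lo, float hi) {
        int count = 0;
        for (int i = 0; i < 256; i++) {
            float v = byte315ToFloat((byte) i);
            if (v >= lo && v < hi) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("smallest non-zero value: " + byte315ToFloat((byte) 1));
        System.out.println("largest value:           " + byte315ToFloat((byte) 255));
        System.out.println("codes in [0.5, 1.0):     " + codesInRange(0.5f, 1.0f));
        System.out.println("codes in [1.0, inf):     " + codesInRange(1.0f, Float.POSITIVE_INFINITY));
    }
}
```

Under this layout only 4 codes (0.5, 0.625, 0.75, 0.875) cover [0.5, 1.0), while 132 of the 256 codes decode to 1.0 or more. Since index-time document and field boosts are multiplied into the stored norm, that headroom is presumably what the large upper end of the range is for.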

Re: short documents = help me tweak Similarity??

2007-04-05 Thread John Kleven
Thank you kindly for the responses. This was the solution I dreamed up initially as well: overriding lengthNorm and making the values it returns for small numTerms (e.g. 3 and 4) more distinct. I tried that in several ways and ran into a different problem. If lengthNorm returns…
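A minimal sketch of the override described here, written as plain static methods so it runs without Lucene. In Lucene 2.x the body of shortFieldLengthNorm would go into a DefaultSimilarity subclass overriding lengthNorm(String fieldName, int numTerms), installed at index time (for example via IndexWriter.setSimilarity), because norms are computed when documents are added. The 0.125 step and the names ShortFieldNorm and shortFieldLengthNorm are hypothetical: the step keeps lengths 1 to 4 at 1.0, 0.875, 0.75 and 0.625, values that should each land on their own byte under the stock encoding, but check them with Similarity.encodeNorm before relying on that.

```java
/**
 * Hypothetical length normalization for very short fields such as car names.
 * In Lucene 2.x this logic would live in a DefaultSimilarity subclass that
 * overrides lengthNorm(String fieldName, int numTerms).
 */
public final class ShortFieldNorm {
    private ShortFieldNorm() {}

    /** DefaultSimilarity-style norm: 1 / sqrt(numTerms). */
    public static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /**
     * Hypothetical: step down by 0.125 per term for 1..4 terms
     * (1.0, 0.875, 0.75, 0.625), then fall back to 1 / sqrt(numTerms).
     */
    public static float shortFieldLengthNorm(int numTerms) {
        if (numTerms <= 1) {
            return 1.0f;                        // empty or single-term field
        }
        if (numTerms <= 4) {
            return 1.0f - 0.125f * (numTerms - 1);
        }
        return defaultLengthNorm(numTerms);     // 1/sqrt(5) ~ 0.447, still below 0.625
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.printf("%d terms: default=%.4f short-field=%.4f%n",
                    n, defaultLengthNorm(n), shortFieldLengthNorm(n));
        }
    }
}
```

Falling back to the stock formula for longer fields keeps the norm decreasing with length, so only the handful of very short lengths get special treatment.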

Re: short documents = help me tweak Similarity??

2007-04-05 Thread Chris Hostetter
: The problem comes when your float value is encoded into that 8-bit
: field norm: the length-3 and length-4 values both become the same 8-bit
: value. Call Similarity.encodeNorm on the values you calculate for the
: different numbers of terms and make sure they return different byte
: values.

bingo.
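The encodeNorm check suggested in this message can be tried without Lucene using a standalone copy of the encoding. The encoder below is an assumption modeled on Lucene's SmallFloat.floatToByte315 (it truncates rather than rounds); NormCollisionCheck, lengthNorm and collisions are hypothetical names, and lengthNorm reproduces DefaultSimilarity's 1/sqrt(numTerms).

```java
import java.util.ArrayList;
import java.util.List;

public final class NormCollisionCheck {
    private NormCollisionCheck() {}

    /** Encode a float into one byte (truncating), modeled on SmallFloat.floatToByte315. */
    public static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);            // keep sign, exponent and top fraction bits
        int fzero = (63 - 15) << 3;                   // this encoding's zero point
        if (smallfloat <= fzero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative -> 0, tiny positive -> 1
        }
        if (smallfloat >= fzero + 0x100) {
            return (byte) -1;                         // overflow saturates at the largest code
        }
        return (byte) (smallfloat - fzero);
    }

    /** DefaultSimilarity-style length norm: 1 / sqrt(numTerms). */
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Every numTerms in [2, maxTerms] whose norm byte equals the byte for numTerms - 1. */
    public static List<Integer> collisions(int maxTerms) {
        List<Integer> result = new ArrayList<>();
        for (int n = 2; n <= maxTerms; n++) {
            if (floatToByte315(lengthNorm(n)) == floatToByte315(lengthNorm(n - 1))) {
                result.add(n);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.println(n + " terms: norm=" + lengthNorm(n)
                    + " byte=" + (floatToByte315(lengthNorm(n)) & 0xff));
        }
        System.out.println("lengths colliding with the previous length: " + collisions(6));
    }
}
```

With this encoder, 1/sqrt(3) ≈ 0.577 and 1/sqrt(4) = 0.5 both map to byte 120, so collisions(6) reports only [4]: the 3-term versus 4-term tie behind "Nissan Altima Standard" and "Nissan Altima Sports Package" scoring the same.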

Re: short documents = help me tweak Similarity??

2007-04-05 Thread Andrew Hudson
Quoted message (to java-user@lucene.apache.org, Thursday, April 5, 2007 1:45:34 PM): Sorry to re-post -- is this the correct forum for questions like this? I think that writing a new encode/decode operation should help alleviate my problem,…

Re: short documents = help me tweak Similarity??

2007-04-05 Thread Otis Gospodnetic
Quoted message (Thursday, April 5, 2007 1:45:34 PM): Sorry to re-post -- is this the correct forum for questions like this? I think that writing a new encode/decode operation should help alleviate my problem, but thought that this must be a fairly widesprea…

Re: short documents = help me tweak Similarity??

2007-04-05 Thread Grant Ingersoll
It is the right forum; silence just means either no one knows the answer or no one who knows the answer has read it... Such is the nature of the community. Have you looked at overriding Similarity with your own implementation? Have you done explain() calls on the docs to see where the s…

Re: short documents = help me tweak Similarity??

2007-04-05 Thread John Kleven
Sorry to re-post -- is this the correct forum for questions like this? I think that writing a new encode/decode operation should help alleviate my problem, but I thought this must be a fairly widespread issue for people using Lucene for "non-web-page" searches (i.e., shorter documents). Thanks a…
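For comparison, a hypothetical replacement encoding that spends all 256 codes linearly on [0, 1]. This only illustrates the trade-off; it is not Lucene's API, LinearNormCodec and its methods are made up, and whether a custom encoding can be plugged into a given Lucene version is a separate question this sketch does not address.

```java
/** Hypothetical linear norm codec: 256 evenly spaced codes over [0, 1]. */
public final class LinearNormCodec {
    private LinearNormCodec() {}

    /** Clamp to [0, 1], then round to the nearest of 256 evenly spaced codes. */
    public static byte encode(float f) {
        float clamped = Math.max(0.0f, Math.min(1.0f, f));
        return (byte) Math.round(clamped * 255.0f);
    }

    /** Inverse mapping: code / 255 (the byte is read as unsigned). */
    public static float decode(byte b) {
        return (b & 0xff) / 255.0f;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            float norm = (float) (1.0 / Math.sqrt(n));   // DefaultSimilarity-style length norm
            byte b = encode(norm);
            System.out.println(n + " terms: " + norm + " -> " + (b & 0xff) + " -> " + decode(b));
        }
    }
}
```

Here 1/sqrt(3) and 1/sqrt(4) land on codes 147 and 128, so short lengths stay distinct. The cost is the one raised in the thread about the encoding's range: anything above 1.0, such as an index-time boost of 20.0, is clamped to the same code as 1.0.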

short documents = help me tweak Similarity??

2007-04-02 Thread John Kleven
My documents are cars, e.g.:

Nissan Altima Sports Package
Nissan Altima Standard

The problem is that when I search for "Nissan Altima", I want the 2nd hit, "Nissan Altima Standard", to come back first because it is shorter. However, this doesn't happen: both documents get exactly the same score.
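The tie can be worked out by hand, assuming boosts of 1.0, an analyzer that keeps all four and all three words respectively, and DefaultSimilarity's lengthNorm of 1/sqrt(numTerms). Both documents match "nissan" and "altima" once each, so tf, idf and coord are identical and only the field norm differs. The 8-bit norm encoding then erases even that difference, because between 0.5 and 1 it keeps only the steps 0.5, 0.625, 0.75 and 0.875 and truncates downward:

```latex
\begin{aligned}
\text{lengthNorm}(4) &= \tfrac{1}{\sqrt{4}} = 0.5 &&\Rightarrow \text{stored as } 0.5\\
\text{lengthNorm}(3) &= \tfrac{1}{\sqrt{3}} \approx 0.577 &&\Rightarrow \text{stored as } 0.5
\end{aligned}
```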