: (d) Now we might get stupid (or erroneous)
: few-word docs as top results;
: (e) To solve this, pivoted doc-length-norm punishes docs
: that are too long (longer than the average) but only slightly
: rewards docs that are shorter than the average.
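To make point (e) concrete, here is a minimal sketch of the classic pivoted length normalization formula, assuming that is what the thread means by "pivoted doc-length-norm". It is plain Java and not Lucene code: the class name, the method name, and the slope value of 0.25 are all hypothetical, chosen only for illustration.

```java
// Illustration only: not part of Lucene. All names and the slope are assumptions.
final class PivotedLengthNorm {

    /**
     * Length norm relative to the pivot (the average document length).
     * Returns 1.0 for a document of exactly average length, less than 1.0
     * for longer documents, and more than 1.0 for shorter ones.
     */
    static double norm(double docLength, double avgLength, double slope) {
        return 1.0 / ((1.0 - slope) + slope * (docLength / avgLength));
    }

    public static void main(String[] args) {
        double avgLength = 100.0; // assumed average document length
        double slope = 0.25;      // hypothetical slope, 0 < slope < 1
        for (double len : new double[] {10, 100, 1000}) {
            System.out.printf("len=%.0f norm=%.3f%n", len, norm(len, avgLength, slope));
        }
    }
}
```

With a slope below 1, the reward for a short doc can never exceed 1 / (1 - slope), while the penalty for a long doc keeps growing with its length, which is the asymmetry described in (e).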
I get that your calculation is much more gr
Yes, I found that what I need is the term vector, which is stored at indexing
time.
I appreciate you pointing me to "Lucene in Action", but I think the
interface it describes is for version 1.4.
So I need the Lucene 2.0 syntax for adding the term vector to
the document at indexing time.
[
https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll updated LUCENE-868:
---
Attachment: LUCENE-868-v4.patch
Based on Yonik's and Karl's comments on avoiding loading the
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> I had previously missed the changes to Token that add support for
> using an array (termBuffer):
>
> + // For better indexing speed, use termBuffer (and
> + // termBufferOffset/termBufferLength) instead of termText
> + // to save new'ing a String per
I had previously missed the changes to Token that add support for
using an array (termBuffer):
+ // For better indexing speed, use termBuffer (and
+ // termBufferOffset/termBufferLength) instead of termText
+ // to save new'ing a String per token
+ char[] termBuffer;
+ int termBufferOffset;
[
https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513991
]
Grant Ingersoll commented on LUCENE-868:
The TermVectorOffsetInfo and Position arrays are only created if sto
[
https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513983
]
Karl Wettin commented on LUCENE-868:
Sorry for the delay, vacation time.
In short I think this is a really nice
> However ... i still think that if you really want
> a length norm that takes into account the average
> length of the docs, you want one that rewards docs
> for being near the average ...
... like SweetSpotSimilarity (SSS)
> it doesn't seem to make a lot of sense to me to say
> that a doc whose
[
https://issues.apache.org/jira/browse/LUCENE-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doron Cohen resolved LUCENE-957.
Resolution: Fixed
Lucene Fields: (was: [New])
committed.
> Lucene RAM Directory doesn't w
I agree. I will add wording to that effect, and also link over to the Wiki
page for details (and update the Wiki page with these details!).
Mike
"Doron Cohen" <[EMAIL PROTECTED]> wrote:
> mikemccand wrote:
> > + /** Expert: change the value of this field. This can be
> > + * used during i
If I understand your needs correctly:
You ask Lucene for a set of words.
You want to sort the results by the number of different words which match?
The query is not right; it should be
+content:(aleden bob carray)
I don't understand how you can sort at indexing time using information
known only at query time.
M.
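To illustrate the point about query time, here is a rough sketch of ranking documents by how many different query words they contain. It is plain Java with no Lucene calls; the class, the method names, and the naive tokenization are assumptions made for the example, and the query words are the ones from the query above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: not Lucene code. All names here are hypothetical.
final class DistinctTermRanker {

    /** Counts how many different query terms occur at least once in the text. */
    static int distinctMatches(String docText, Set<String> queryTerms) {
        Set<String> seen = new HashSet<>();
        // Naive tokenization: lowercase, split on runs of non-word characters.
        for (String token : docText.toLowerCase().split("\\W+")) {
            if (queryTerms.contains(token)) {
                seen.add(token);
            }
        }
        return seen.size();
    }

    /** Returns a copy of docs, ordered by descending distinct-match count. */
    static List<String> rank(List<String> docs, Set<String> queryTerms) {
        List<String> sorted = new ArrayList<>(docs);
        sorted.sort((a, b) -> Integer.compare(
                distinctMatches(b, queryTerms), distinctMatches(a, queryTerms)));
        return sorted;
    }
}
```

The count depends on the query terms, so it can only be computed once the query is known; repeating one word many times in a document does not raise its rank here.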
sa