Re: search quality - assessment & improvements

2007-07-19 Thread Chris Hostetter
: (d) Now we might get stupid (or erroneous) : few words docs as top results; : (e) To solve this, pivoted doc-length-norm punishes too : long docs (longer than the average) but only slightly : rewards docs that are shorter than the average. I get that your calculation is much more gr

Re: Need help for ordering results by specific order

2007-07-19 Thread savageboy
Yes, I found that what I need is the term vector, which is stored at indexing time. I appreciate you guiding me to "Lucene in Action", but I think the interface it presents is for version 1.4. So I need the Lucene 2.0 syntax for adding the term vector to the document at indexing time

[jira] Updated: (LUCENE-868) Making Term Vectors more accessible

2007-07-19 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-868: --- Attachment: LUCENE-868-v4.patch Based on Yonik's and Karl's comments on avoiding loading the

Re: Token termBuffer issues

2007-07-19 Thread Michael McCandless
"Yonik Seeley" <[EMAIL PROTECTED]> wrote: > I had previously missed the changes to Token that add support for > using an array (termBuffer): > > + // For better indexing speed, use termBuffer (and > + // termBufferOffset/termBufferLength) instead of termText > + // to save new'ing a String per

Token termBuffer issues

2007-07-19 Thread Yonik Seeley
I had previously missed the changes to Token that add support for using an array (termBuffer): + // For better indexing speed, use termBuffer (and + // termBufferOffset/termBufferLength) instead of termText + // to save new'ing a String per token + char[] termBuffer; + int termBufferOffset;

[jira] Commented: (LUCENE-868) Making Term Vectors more accessible

2007-07-19 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513991 ] Grant Ingersoll commented on LUCENE-868: The TermVectorOffsetInfo and Position arrays are only created if sto

[jira] Commented: (LUCENE-868) Making Term Vectors more accessible

2007-07-19 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513983 ] Karl Wettin commented on LUCENE-868: Sorry for the delay, vacation time. In short I think this is a really nice

Re: search quality - assessment & improvements

2007-07-19 Thread Doron Cohen
> However ... I still think that if you really want > a length norm that takes into account the average > length of the docs, you want one that rewards docs > for being near the average ... ... like SweetSpotSimilarity (SSS) > it doesn't seem to make a lot of sense to me to say > that a doc whose

[jira] Resolved: (LUCENE-957) Lucene RAM Directory doesn't work for Index Size > 8 GB

2007-07-19 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-957. Resolution: Fixed Lucene Fields: (was: [New]) committed. > Lucene RAM Directory doesn't w

Re: svn commit: r557445 - in /lucene/java/trunk: CHANGES.txt src/java/org/apache/lucene/document/Field.java src/test/org/apache/lucene/document/TestDocument.java

2007-07-19 Thread Michael McCandless
I agree. I will add wording to that effect, and also link over to the Wiki page for details (and update the Wiki page with these details!). Mike "Doron Cohen" <[EMAIL PROTECTED]> wrote: > mikemccand wrote: > > + /** Expert: change the value of this field. This can be > > + * used during i

Re: Need help for ordering results by specific order

2007-07-19 Thread Mathieu Lecarme
If I understand your needs correctly: you ask Lucene for a set of words, and you want to sort the results by the number of different words which match? The query is not good; it would be +content:(aleden bob carray). I don't understand how you can sort at indexing time with information known at query time. M. sa