Re: Boost One Term Query

2007-12-12 Thread Jens Grivolla
Erick Erickson wrote: I don't believe you can compare scores across queries in any meaningful way. I actually investigated this to some degree in my thesis, comparing different participating systems from the TREC campaigns. It turns out that some systems' scores (e.g. the top scores for a gi

Re: lucene suggest

2007-08-21 Thread Jens Grivolla
On 8/21/07, Heba Farouk <[EMAIL PROTECTED]> wrote:
> the documents are not duplicated, i mean the hits (assume that 2 documents
> have the same subject but with different authors, so if i'm searching the
> subject, the returned hits will have duplicates )
> i was asking if i can remove duplicates

MoreLikeThis for multiple documents

2007-07-25 Thread Jens Grivolla
Hello, I'm looking to extract significant terms characterizing a set of documents (which in turn relate to a topic). This basically comes down to functionality similar to determining the terms with the greatest offer weight (as used for blind relevance feedback), or maximizing tf.idf (as is
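
For readers unfamiliar with the idea, here is a rough sketch of what "maximizing tf.idf over a set of documents" could look like. It is plain Python and does not use the Lucene API. The function name `significant_terms`, the whitespace tokenization, and the choice to sum term frequencies over the whole set are all assumptions made for illustration; the message does not say how the author actually computed the weights.

```python
import math
from collections import Counter


def significant_terms(doc_set, corpus, top_n=5):
    """Rank the terms of doc_set by (summed tf) * idf, with idf taken over corpus.

    Both arguments are lists of plain-text strings; doc_set is assumed to be
    a subset of corpus. Tokenization is a naive lowercase whitespace split.
    """
    n_docs = len(corpus)

    # Document frequency: in how many corpus documents each term occurs.
    df = Counter()
    for doc in corpus:
        df.update(set(doc.lower().split()))

    # Term frequency, aggregated over every document in the set.
    tf = Counter()
    for doc in doc_set:
        tf.update(doc.lower().split())

    # Score each term; terms never seen in the corpus are skipped.
    scores = {
        term: count * math.log(n_docs / df[term])
        for term, count in tf.items()
        if df[term] > 0
    }

    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
```

Calling `significant_terms(topic_docs, all_docs)` returns the highest-scoring terms for the topic. Terms that are frequent in the set but rare in the rest of the collection rise to the top, which is the same intuition behind the offer-weight ranking mentioned above, though the formula here is not the offer weight itself.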

inserting millions of entries

2007-06-28 Thread Jens Grivolla
Hi, I have a Lucene index with a few million entries, and I will need to add batches of a few hundred thousand or a few million additional entries. Unfortunately, I absolutely need to have all indexed entries available when inserting a new one, even within one batch, in order to do some duplicat