On 1/2/07, sdeck <[EMAIL PROTECTED]> wrote:
Thanks in advance for any insight on this one.
I have a fairly large query to run, and it takes roughly 20-40 seconds to
complete the way that I have it.
Here is the best example I can give.
I have a set of roughly 25K documents indexed.
I have que
On 12/11/06, Waheed Mohammed <[EMAIL PROTECTED]> wrote:
Hello,
Is there a way to influence Lucene's generation of ids while indexing?
My requirement is this: I want to have several indexes where no index
reuses ids that have already been assigned in an earlier index.
For instance:
IDX1 : {0.1
:
>>
>> On 17 Oct 2006, at 17:54, Find Me wrote:
>>
>>> How to eliminate near duplicates from the index?
>>
>> I would probably try to measure the Euclidean distance between all
>> documents, computed on terms and their positions. Or perhaps use
>> stan
How to eliminate near duplicates from the index? Someone suggested that I
could look at the TermVectors and do a comparison to remove the duplicates.
One major problem with this is that the structure of the document is no longer
important. Are there any obvious pitfalls? For example: Document A being
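
As a rough illustration of the term-vector comparison discussed in this thread, here is a minimal sketch in plain Java. It does not use Lucene: the class TermVectorDistance, its method names, and the whitespace tokenizer are all assumptions made for the example, with a simple term-to-frequency map standing in for a stored term vector. It compares term frequencies only, so it ignores term positions and document structure, which is exactly the limitation raised above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class TermVectorDistance {

    // Builds a term -> frequency map from whitespace-separated text.
    // This is a stand-in for a term vector read back from an index.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String term : text.toLowerCase().trim().split("\\s+")) {
            if (!term.isEmpty()) {
                freqs.merge(term, 1, Integer::sum);
            }
        }
        return freqs;
    }

    // Euclidean distance between two frequency vectors; a term that is
    // missing from one document counts as frequency 0 there.
    static double euclideanDistance(Map<String, Integer> a, Map<String, Integer> b) {
        Set<String> terms = new HashSet<>(a.keySet());
        terms.addAll(b.keySet());
        double sumOfSquares = 0.0;
        for (String term : terms) {
            int diff = a.getOrDefault(term, 0) - b.getOrDefault(term, 0);
            sumOfSquares += (double) diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Treats two documents as near duplicates when their distance is at or
    // below the threshold. The threshold is an assumption to be tuned.
    static boolean isNearDuplicate(String docA, String docB, double threshold) {
        return euclideanDistance(termFrequencies(docA), termFrequencies(docB)) <= threshold;
    }

    public static void main(String[] args) {
        String a = "the quick brown fox";
        String b = "the quick brown dog";
        System.out.println(euclideanDistance(termFrequencies(a), termFrequencies(b)));
        System.out.println(isNearDuplicate(a, b, 1.5));
    }
}
```

Comparing every pair this way is quadratic in the number of documents, so for an index of any size it would likely need to be restricted to candidate pairs first.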
For:
BooleanQuery bQuery=new BooleanQuery();
bQuery.add(messageQuery, true, false);
Use:
BooleanQuery bQuery=new BooleanQuery();
bQuery.add(messageQuery, BooleanClause.Occur.MUST);
Mapping is as follows:
For add(query, true, false) use add(query, BooleanClause.Occur.MUST)
For add(query, false, fal
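
For reference, the relationship between the old (required, prohibited) flag pair and the three Occur values can be sketched as a small lookup in plain Java. This is only an illustration: the class LegacyBooleanFlags and the method toOccur are hypothetical names, and the nested enum is a stand-in for BooleanClause.Occur so that the sketch compiles without Lucene on the classpath. Rejecting the (true, true) combination with an exception is a choice made for this sketch.

```java
// Hypothetical helper, not part of Lucene.
final class LegacyBooleanFlags {

    // Stand-in for Lucene's BooleanClause.Occur.
    enum Occur { MUST, SHOULD, MUST_NOT }

    // Maps the old (required, prohibited) flag pair to an Occur value.
    static Occur toOccur(boolean required, boolean prohibited) {
        if (required && prohibited) {
            // A clause cannot sensibly be both required and prohibited.
            throw new IllegalArgumentException(
                "a clause cannot be both required and prohibited");
        }
        if (required) {
            return Occur.MUST;
        }
        if (prohibited) {
            return Occur.MUST_NOT;
        }
        return Occur.SHOULD;
    }

    public static void main(String[] args) {
        System.out.println(toOccur(true, false));   // prints MUST
        System.out.println(toOccur(false, false));  // prints SHOULD
        System.out.println(toOccur(false, true));   // prints MUST_NOT
    }
}
```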
);
if(hits == null) return;
for(int i = 0; i < hits.length(); i++){
System.out.println("Hit " + i + ": " + hits.score(i) +
"\n" + searcher.explain(disjunctQuery, i).toString());
}
}
Find Me wrote:
public void explai
public void explainSearchScore(String indexLocation, DisjunctionMaxQuery
disjunctQuery){
IndexSearcher searcher = new IndexSearcher(IndexReader.open
(indexLocation));
Hits hits = searcher.search(disjunctQuery);
if(hits == null) return;
for(int i = 0; i < hits.leng
I was trying to print out the score explanation for a DisjunctionMaxQuery.
Though there is a hit score > 0 for the results, there is no detailed
explanation. Am I doing something wrong?
In the following output, each hit has two lines. The first line is the hit
score and the second line is the expl