On Feb 1, 2005, at 9:01 AM, Jerry Jalenak wrote:
Is there a way to eliminate duplicate hits being returned from the
index?
Sure, don't put duplicate documents in the index :)
Erik
Renner Blvd.
Lenexa, KS 66219
(913) 577-1496
[EMAIL PROTECTED]
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 01, 2005 8:35 AM
To: Lucene Users List
Subject: Re: Duplicate Hits
On Feb 1, 2005, at 9:01 AM, Jerry Jalenak wrote:
Is there a way
Jerry Jalenak wrote:
Given Erik's response of 'don't put duplicate documents in the index', how
can I accomplish this in the IndexWriter?
I was dealing with a similar requirement recently. I eventually
decided on storing the MD5 checksum of the document as a keyword. It
means reading it
-Original Message-
From: John Haxby [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 01, 2005 9:06 AM
To: Lucene Users List
Subject: Re: Duplicate Hits
Jerry Jalenak wrote:
Given Erik's response of 'don't put duplicate documents in the index', how
can I
On Feb 1, 2005, at 9:49 AM, Jerry Jalenak wrote:
Given Erik's response of 'don't put duplicate documents in the index', how
can I accomplish this in the IndexWriter?
As John said - you'll have to come up with some way of knowing whether
you should index or not. For example, when dealing with
Jerry Jalenak wrote:
Nice idea John - one I hadn't considered. Once you have the checksum, do
you 'check' in the index first before storing the second document? Or do
you filter on the query side?
I do a quick search for the md5 checksum before indexing.
Although I suspect not applicable in
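A minimal sketch of the checksum-before-indexing idea described above, not anyone's actual code from this thread. It computes an MD5 hex string with the JDK's `MessageDigest` and refuses a document whose checksum has been seen before. The class name `Md5Dedup`, the method `addIfAbsent`, and the in-memory `Set` are all assumptions for illustration: the set stands in for the "quick search for the md5 checksum" against the index, and no Lucene calls are shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

class Md5Dedup {
    // Hypothetical stand-in for the index lookup: checksums already "indexed".
    private final Set<String> indexedChecksums = new HashSet<>();

    /** Returns the MD5 digest of the content as 32 lowercase hex characters. */
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the document was new and recorded, false if it is a duplicate. */
    boolean addIfAbsent(String content) {
        String checksum = md5Hex(content);
        // Set.add returns false when the checksum is already present.
        return indexedChecksums.add(checksum);
    }

    public static void main(String[] args) {
        Md5Dedup index = new Md5Dedup();
        System.out.println(index.addIfAbsent("first message body"));  // true
        System.out.println(index.addIfAbsent("second message body")); // true
        System.out.println(index.addIfAbsent("first message body"));  // false
    }
}
```

In a real indexer the checksum would be stored with each document as an untokenized keyword field, and the set lookup would become a search on that field before adding.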
Programmer / Analyst, Web Publishing
LabOne, Inc.
10101 Renner Blvd.
Lenexa, KS 66219
(913) 577-1496
[EMAIL PROTECTED]
-Original Message-
From: John Haxby [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 01, 2005 9:39 AM
To: Lucene Users List
Subject: Re: Duplicate Hits
Jerry Jalenak
Sent: Tuesday, February 01, 2005 9:48 AM
To: Lucene Users List
Subject: Re: Duplicate Hits
Jerry Jalenak wrote:
Just to make sure I understand
Do you keep an IndexReader open at the same time you are running the
IndexWriter? From what I can see in the JavaDocs, it looks like only
Jerry Jalenak wrote:
OK - but I'm dealing with indexing between 1.5 and 2 million documents, so I
really don't want to 'batch' them up if I can avoid it. And I also don't
think I can keep an IndexReader open to the index at the same time I have an
IndexWriter open. I may have to try and deal with
On Feb 1, 2005, at 10:51 AM, Jerry Jalenak wrote:
OK - but I'm dealing with indexing between 1.5 and 2 million documents, so I
really don't want to 'batch' them up if I can avoid it. And I also don't
think I can keep an IndexReader open to the index at the same time I have an
IndexWriter open.
Erik Hatcher wrote:
On Feb 1, 2005, at 10:51 AM, Jerry Jalenak wrote:
OK - but I'm dealing with indexing between 1.5 and 2 million documents, so I
really don't want to 'batch' them up if I can avoid it. And I also don't
think I can keep an IndexReader open to the index at the same time I have an
On Jan 24, 2005, at 09:14, Jason Polites wrote:
I am aware of the Filter object however the unique identifier of my
document is a field within the lucene document itself (messageid); and
I am reluctant to access this field using the public API for every Hit
as I fear it will have drastic
there are several hundred or several thousand distinct indexes.
Thanks,
- JP
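One way to drop duplicates on the query side, once the messageid values are in hand, is to keep only the first (highest-ranked) hit per messageid. What follows is a minimal sketch under stated assumptions: `HitDedup` and its nested `Hit` holder are hypothetical names for illustration, not Lucene types, and the input list is assumed to be sorted best-first already. It does not address the cost of loading the field for every hit, which is the concern raised above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HitDedup {
    /** Hypothetical stand-in for a search hit; not a Lucene class. */
    static final class Hit {
        final String messageId;
        final float score;

        Hit(String messageId, float score) {
            this.messageId = messageId;
            this.score = score;
        }
    }

    /**
     * Keeps the first hit seen for each messageId. Because the input is
     * assumed to be ranked best-first, the survivor is the best-scoring copy.
     */
    static List<Hit> dedupe(List<Hit> rankedHits) {
        // LinkedHashMap preserves insertion order, so ranking order survives.
        Map<String, Hit> firstById = new LinkedHashMap<>();
        for (Hit hit : rankedHits) {
            firstById.putIfAbsent(hit.messageId, hit);
        }
        return new ArrayList<>(firstById.values());
    }

    public static void main(String[] args) {
        List<Hit> merged = Arrays.asList(
                new Hit("msg-1", 0.9f),
                new Hit("msg-2", 0.8f),
                new Hit("msg-1", 0.5f)); // same message found in a second index
        for (Hit hit : dedupe(merged)) {
            System.out.println(hit.messageId + " " + hit.score);
        }
    }
}
```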
- Original Message -
From: PA [EMAIL PROTECTED]
To: Lucene Users List lucene-user@jakarta.apache.org
Sent: Monday, January 24, 2005 10:43 PM
Subject: Re: Duplicate hits using ParallelMultiSearcher
On Jan 24, 2005