Re: Lucene 1.9 release date?

2005-10-14 Thread Ray Tsang
Ah I see it. Should we start assigning and voting for issues that should make it into the 1.9 release? Maybe set a due date for submitting these 1.9-bound issues, a week for voting and review, and then allocate time for applying the patches. I can't wait to see an officially released 1.9! Also,

Fwd: Lucene 1.9 release date?

2005-10-14 Thread Ray Tsang
-- Forwarded message -- From: Ray Tsang <[EMAIL PROTECTED]> Date: Oct 15, 2005 11:06 AM Subject: Re: Lucene 1.9 release date? To: java-user@lucene.apache.org Can we add a 1.9 release to the roadmap, or start a 1.9 release tracker issue? ray, On 10/15/05, Erik Hatcher <[EMAIL PRO

Re: Document Duplication for Multiple Segment Merge

2005-10-14 Thread Michael Ji
Sorry, I think I pointed out the wrong Java class name. I want to confirm whether SegmentMerger.java in Lucene does dedup or not. I traced down a couple of Java classes from SegmentMerger.java, such as SegmentReader.java, IndexWriter.java, SegmentReader.java. I didn't see a dedup mechanism yet. than

Re: Document Duplication for Multiple Segment Merge

2005-10-14 Thread Yonik Seeley
Sorry, I've only briefly looked at Nutch, so you should ask on that mailing list. Lucene doesn't do deduping. -Yonik Now hiring -- http://tinyurl.com/7m67g On 10/14/05, Michael Ji <[EMAIL PROTECTED]> wrote: > > hi Yonik: > > Does that mean when two documents has same MD5 content > in two differe

Re: Document Duplication for Multiple Segment Merge

2005-10-14 Thread Michael Ji
hi Yonik: Does that mean that when two documents have the same MD5 content in two different segments, IndexMerger.java will keep both of them? Looking at the code of IndexSegment.java, it handles MD5 deduplication by keeping the one with the higher document ID. So, when refetching happens, the old segment sho
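Since Lucene itself does not deduplicate, the rule Michael describes for Nutch (among documents with the same MD5 content digest, keep the one with the higher document ID) has to live in application code. The following is a minimal sketch in plain Java, not Nutch's or Lucene's actual implementation; the `Doc` class and the `idsToDelete` helper are hypothetical names chosen for illustration. It groups documents by the MD5 digest of their content and returns the IDs of every copy except the highest-ID one.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Md5Dedup {

    /** Hypothetical stand-in for an indexed page: a document ID plus its content. */
    static final class Doc {
        final int id;
        final String content;

        Doc(int id, String content) {
            this.id = id;
            this.content = content;
        }
    }

    /** Hex-encoded MD5 digest of the content, used as the duplicate key. */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /**
     * Returns the IDs to delete: for each MD5 digest, every copy except
     * the one with the highest document ID (assumed to be the newest).
     */
    static List<Integer> idsToDelete(List<Doc> docs) {
        Map<String, Integer> keeper = new HashMap<>(); // digest -> highest ID seen so far
        List<Integer> deletions = new ArrayList<>();
        for (Doc d : docs) {
            String key = md5Hex(d.content);
            Integer current = keeper.get(key);
            if (current == null) {
                keeper.put(key, d.id);      // first copy of this content
            } else if (d.id > current) {
                deletions.add(current);     // previous keeper is older: drop it
                keeper.put(key, d.id);
            } else {
                deletions.add(d.id);        // this copy is older: drop it
            }
        }
        return deletions;
    }

    public static void main(String[] args) {
        List<Doc> merged = Arrays.asList(
                new Doc(3, "hello lucene"),
                new Doc(7, "other page"),
                new Doc(12, "hello lucene"));
        System.out.println(idsToDelete(merged)); // prints [3]
    }
}
```

Lucene document numbers can change when segments are merged, so in a real index a pass like this would be run against the merged index, with the resulting IDs handed to the reader's deletion method.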

Re: Document Duplication for Multiple Segment Merge

2005-10-14 Thread Yonik Seeley
There is no concept in Lucene of document identity linked to any fields of a document. You need to handle removal of duplicates yourself. -Yonik On 10/14/05, Michael Ji <[EMAIL PROTECTED]> wrote: > > hi, > > When Nutch's IndexMerger.java is called, the inde

Document Duplication for Multiple Segment Merge

2005-10-14 Thread Michael Ji
hi, When Nutch's IndexMerger.java is called, the indexes from multiple segment directories are merged into one target directory. I wonder how Lucene deals with the case where identical documents exist in two segments. Is the older document (lower time stamp) deleted? thanks, Michael Ji,

Re: next score usage

2005-10-14 Thread Otis Gospodnetic
I think this is for [EMAIL PROTECTED]; please remove java-dev@ when replying. --- Michael Ji <[EMAIL PROTECTED]> wrote: > hi, > > I saw several discussions about Distributed Link > Analysis Tool before. And I still have question about > the usage of the field "next score" in Page data > structure

[jira] Created: (LUCENE-455) FieldsReader does not regard offset and position flags

2005-10-14 Thread Frank Steinmann (JIRA)
FieldsReader does not regard offset and position flags -- Key: LUCENE-455 URL: http://issues.apache.org/jira/browse/LUCENE-455 Project: Lucene - Java Type: Bug Components: Index Versions: 1.9 Reporte

next score usage

2005-10-14 Thread Michael Ji
hi, I have seen several discussions about the Distributed Link Analysis Tool before, but I still have a question about the usage of the field "next score" in the Page data structure. It seems the Distributed Link Analysis Tool will update this field via OutlinkWithTarget (as I understand it, that means the link has target