Re: bytecount as prefix

2006-05-07 Thread Marvin Humphrey
Got it. This was the problem, in TermInfosWriter.writeTerm():

    -lastTerm = term;
    +lastBytes = bytes;
    }

Without lastTerm being updated, the auxiliary term dictionary got screwed up. This problem only manifested on large tests because small tests never moved past the first entry, which
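The fix above hints at why the bug stayed hidden on small inputs. Here is a minimal sketch, assuming a writer that copies the previous term into a sparse auxiliary index once every `indexInterval` terms. The class `TermIndexSketch`, every member other than `lastTerm` and `lastBytes`, the interval of 2, and the use of standard UTF-8 are hypothetical and are not taken from Lucene's TermInfosWriter.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy writer: prefix-compresses terms and keeps a sparse index.
final class TermIndexSketch {
    private final int indexInterval;                         // assumed knob, not Lucene's value
    private final List<String> indexEntries = new ArrayList<>();
    private String lastTerm = "";                            // previous term, fed to the index
    private byte[] lastBytes = new byte[0];                  // previous term's bytes, used for the prefix
    private int size = 0;

    TermIndexSketch(int indexInterval) {
        this.indexInterval = indexInterval;
    }

    // Count the leading bytes that two encoded terms have in common.
    static int sharedPrefix(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        int i = 0;
        while (i < limit && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Adds a term and returns how many leading bytes it shares with the previous one.
    int add(String term) {
        byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
        if (size % indexInterval == 0) {
            indexEntries.add(lastTerm);                      // sparse index records the previous term
        }
        int shared = sharedPrefix(lastBytes, bytes);
        lastBytes = bytes;                                   // needed for the next prefix computation
        lastTerm = term;                                     // needed for the next index entry
        size++;
        return shared;
    }

    List<String> indexEntries() {
        return indexEntries;
    }

    public static void main(String[] args) {
        TermIndexSketch writer = new TermIndexSketch(2);
        for (String term : new String[] { "apple", "apply", "banana", "band", "bandit" }) {
            System.out.println(term + " shares " + writer.add(term) + " bytes with the previous term");
        }
        System.out.println("index entries: " + writer.indexEntries());
    }
}
```

In this sketch, deleting the `lastTerm = term;` line leaves the prefix lengths correct but pins every index entry to the initial empty term, so only inputs long enough to reach a second index entry would expose the mistake.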

Re: bytecount as prefix

2006-05-06 Thread Marvin Humphrey
No progress yet. I think my next move is to do what I did when trying to get KinoSearch to write Lucene-compatible indexes:

1) Generate an optimized split-file format Lucene index from a pathological test corpus.
2) Hack KinoSearch so that it ought to produce an index which is identical

Re: bytecount as prefix

2006-05-06 Thread Marvin Humphrey
On Sat, May 06, 2006 at 05:11:02PM +0900, David Balmain wrote:
> Hi Marvin,
>
> Where are you with this? I also have a vested interest in seeing
> Lucene move to using byte counts. I was wondering if I could help out.
> Is the patch you pasted here the latest you have?

All I've added since then i

Re: bytecount as prefix

2006-05-06 Thread David Balmain
Hi Marvin, Where are you with this? I also have a vested interest in seeing Lucene move to using byte counts. I was wondering if I could help out. Is the patch you pasted here the latest you have? Cheers, Dave On 4/12/06, Marvin Humphrey <[EMAIL PROTECTED]> wrote: Greets, I'm back working on

Re: bytecount as prefix

2006-04-12 Thread Doug Cutting
Marvin Humphrey wrote: A phantom blank Term shows up out of nowhere in the middle of the merge process. When you stick a System.err.println into TermInfosWriter's writeTerm... Did you try putting a print statement in SegmentMergeInfo.next(), to see where this blank term comes from? Doug

Re: bytecount as prefix

2006-04-11 Thread Chris Hostetter
: On Apr 11, 2006, at 12:05 PM, Marvin Humphrey wrote:
:
: > TestRangeFilter.
:
: A phantom blank Term shows up out of nowhere in the middle of the
: merge process.
:
: When you stick a System.err.println into TermInfosW

Re: bytecount as prefix

2006-04-11 Thread Marvin Humphrey
On Apr 11, 2006, at 12:05 PM, Marvin Humphrey wrote: TestRangeFilter. A phantom blank Term shows up out of nowhere in the middle of the merge process. When you stick a System.err.println into TermInfosWriter's writeTerm, you ordinarily see it adding Terms in proper sort order: [j

Re: bytecount as prefix

2006-04-11 Thread Marvin Humphrey
On Apr 11, 2006, at 2:27 PM, Marvin Humphrey wrote: "all but last", "all but first" and "all but ends" pass! Scratch that, it's totally untrue. I'd forgotten that these compound test cases bail as soon as there's a single failure. "all but last" also fails to return any docs at all. M

Re: bytecount as prefix

2006-04-11 Thread Marvin Humphrey
On Apr 11, 2006, at 2:08 PM, Yonik Seeley wrote: On 4/11/06, Marvin Humphrey <[EMAIL PROTECTED]> wrote: What do the failing tests have in common? On TestIndexModifier, only a small portion of the deletions fail, and they're all for fairly high values of delId -- sometimes the highest, but not

Re: bytecount as prefix

2006-04-11 Thread Yonik Seeley
On 4/11/06, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
> What do the failing tests have in common?
>
> On TestIndexModifier, only a small portion of the deletions fail, and
> they're all for fairly high values of delId -- sometimes the highest,
> but not always. For RangeFilter and ConstantScoreRa

Re: bytecount as prefix

2006-04-11 Thread Marvin Humphrey
On Apr 11, 2006, at 12:18 PM, Doug Cutting wrote: Marvin Humphrey wrote: I'm back working on converting Lucene to using a byte count instead of a char count as a prefix at the head of each String. Three tests are failing: TestIndexModifier, TestConstantScoreRangeQuery, and TestRang

Re: bytecount as prefix

2006-04-11 Thread Doug Cutting
Marvin Humphrey wrote: I'm back working on converting Lucene to using a byte count instead of a char count as a prefix at the head of each String. Three tests are failing: TestIndexModifier, TestConstantScoreRangeQuery, and TestRangeFilter. Why those and not others? - private static f
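
For readers following the thread, the distinction between a char-count prefix and a byte-count prefix can be illustrated with a small sketch. It assumes standard UTF-8 via `StandardCharsets.UTF_8`, which may differ from the encoding Lucene actually wrote on disk at the time. The class `PrefixLengthDemo` and its method names are hypothetical and do not come from the patch under discussion.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper comparing the two candidate length prefixes for a String.
final class PrefixLengthDemo {
    // Length in Java chars (UTF-16 code units): what a char-count prefix would store.
    static int charCount(String s) {
        return s.length();
    }

    // Length in bytes of the standard UTF-8 encoding: what a byte-count prefix would store.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // ASCII, Latin-1 accent, and CJK samples, written with escapes to stay source-encoding neutral.
        String[] samples = { "lucene", "caf\u00e9", "\u65e5\u672c\u8a9e" };
        for (String s : samples) {
            System.out.println(s + ": chars=" + charCount(s) + ", utf8Bytes=" + utf8ByteCount(s));
        }
    }
}
```

The two counts agree for pure ASCII and diverge as soon as a character needs more than one byte. With a byte count, a reader can skip a string without decoding it.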