Re: Ferret's changes

2006-10-10 Thread David Balmain
On 10/10/06, Grant Ingersoll [EMAIL PROTECTED] wrote: I would be interested in another survey, this time about how many people use a fixed set of Fields in their applications. The large majority of mine do. I know SOLR supports dynamic fields, but I wonder how much they are used. If there

Re: Ferret's changes

2006-10-10 Thread David Balmain
On 10/11/06, Yonik Seeley [EMAIL PROTECTED] wrote: On 10/10/06, David Balmain [EMAIL PROTECTED] wrote: Given these factors and the fact that benchmarks can be a very touchy subject, particularly in the Java community, OK, I'll bite! (but I'm always too aggravated at many of the Java design

Re: Ferret's changes

2006-10-10 Thread David Balmain
On 10/11/06, Doug Cutting [EMAIL PROTECTED] wrote: David Balmain wrote: The start of my benchmarks is here: http://ferret.davebalmain.com/trac/wiki/FerretVsLucene Ferret looks fast! Nice work. A big knee in indexing performance occurs when indexes get much larger than memory, when merging

Re: Ferret's changes

2006-10-10 Thread David Balmain
On 10/11/06, Yonik Seeley [EMAIL PROTECTED] wrote: On 10/10/06, David Balmain [EMAIL PROTECTED] wrote: The start of my benchmarks is here: http://ferret.davebalmain.com/trac/wiki/FerretVsLucene I did set maxBufferedDocs to 1000 and optimized both indices at the end Ah, I had missed

Re: Ferret's changes

2006-10-10 Thread David Balmain
On 10/11/06, Ning Li [EMAIL PROTECTED] wrote: On 10/10/06, Yonik Seeley [EMAIL PROTECTED] wrote: On 10/10/06, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi, Maybe I missed it, but I was surprised that nobody here wondered about the algorithm and data structure changes that Dave Balmain

Re: undefined primitive types

2006-09-24 Thread David Balmain
Hi Greg, I don't know which documentation of the Lucene FileFormat you are looking at, but you can see UInt32 (Int), UInt64 (Long), and VInt defined here: http://lucene.apache.org/java/docs/fileformats.html Are you at liberty to tell us what you are working on? You may also like to take a look
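
For readers unfamiliar with the VInt type mentioned above, here is a minimal sketch of the variable-length integer encoding as the Lucene file-format documentation describes it: seven data bits per byte, low-order bits first, with the high bit set on every byte except the last. The function names `encode_vint` and `decode_vint` are hypothetical and are not part of Lucene or Ferret.

```python
from typing import Tuple


def encode_vint(value: int) -> bytes:
    """Encode a non-negative integer as a VInt (hypothetical helper)."""
    if value < 0:
        raise ValueError("VInt encodes non-negative integers only")
    out = bytearray()
    # Emit 7 bits at a time, low-order first; the high bit marks "more bytes follow".
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)  # final byte has the high bit clear
    return bytes(out)


def decode_vint(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one VInt starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7
```

Small values fit in a single byte, which is why the format suits term and document deltas that are usually small.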

Re: Global field semantics

2006-07-10 Thread David Balmain
On 7/10/06, Doug Cutting [EMAIL PROTECTED] wrote: Chuck Williams wrote: Lucene today allows many field properties to vary at the Field level. E.g., the same field name might be tokenized in one Field on a Document while it is untokenized in another Field on the same or different Document.

Re: Global field semantics

2006-07-10 Thread David Balmain
On 7/11/06, Chuck Williams [EMAIL PROTECTED] wrote: David Balmain wrote on 07/10/2006 01:04 AM: The only problem I could find with this solution is that fields are no longer in alphabetical order in the term dictionary but I couldn't think of a use-case where this is necessary although I'm

Re: Global field semantics

2006-07-10 Thread David Balmain
On 7/11/06, Yonik Seeley [EMAIL PROTECTED] wrote: On 7/10/06, David Balmain [EMAIL PROTECTED] wrote: I don't think declaring all fields up front is necessary for substantial optimizations. I've found that the key to some really good optimizations is having constant field numbers

Re: Global field semantics

2006-07-09 Thread David Balmain
On 7/10/06, Chuck Williams [EMAIL PROTECTED] wrote: David Balmain wrote on 07/09/2006 06:44 PM: On 7/10/06, Chuck Williams [EMAIL PROTECTED] wrote: Marvin Humphrey wrote on 07/08/2006 11:13 PM: On Jul 8, 2006, at 9:46 AM, Chuck Williams wrote: Many things would be cleaner in Lucene

Re: bytecount as prefix

2006-05-06 Thread David Balmain
Hi Marvin, Where are you with this? I also have a vested interest in seeing Lucene move to using byte counts. I was wondering if I could help out. Is the patch you pasted here the latest you have? Cheers, Dave On 4/12/06, Marvin Humphrey [EMAIL PROTECTED] wrote: Greets, I'm back working on

Re: NearSpans issue

2006-01-27 Thread David Balmain
Hi Erik, The only way I can see this exception being thrown is when you have two SpanCells with the same start in a particular document. In this case matchIsOrdered will return false even though the SpanCells may still be ordered in the priority queue. The current code for matchIsOrdered is:

Re: Implementation in C Some Questions

2005-11-11 Thread David Balmain
Hi Robert, I'm very interested in this. I've ported the indexing part of Lucene to C myself. Currently it's not portable (runs on *nix), but it does implement file locking. I'm mostly curious to see how you solved some of the problems I came across and how your performance is compared to the Java

Re: Faking index merge by modifying segments file?

2005-11-02 Thread David Balmain
This sounds like it should be possible, except for docId clashes - if index A had a document with Id 100 and index B also has a document with Id 100, after my index file copying, index C will end up having 2 documents with Id 100, and that won't work. So, documents in C would have to be
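
The docId clash described above can be illustrated with a small sketch. It assumes, purely for illustration, that each index numbers its documents densely from 0, so that shifting every id from index B by the number of documents in index A yields unique ids in C. The function `remap_doc_ids` is hypothetical and is not a Lucene API.

```python
from typing import Dict, Iterable


def remap_doc_ids(size_a: int, ids_b: Iterable[int]) -> Dict[int, int]:
    """Map each doc id in index B to a non-clashing id in merged index C.

    Assumption (not from the thread): ids in A occupy 0..size_a-1, so
    offsetting B's ids by size_a cannot collide with any id from A.
    """
    return {doc_id: doc_id + size_a for doc_id in ids_b}


# Index A holds 150 documents (ids 0..149); index B also has a document 100.
mapping = remap_doc_ids(150, [0, 100])
# Document 100 from B becomes 250 in C, so it no longer clashes with A's 100.
```

Simply copying index files skips this renumbering step, which is where the duplicate ids would come from.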