Re: Ferret's changes

2006-10-10 Thread David Balmain
On 10/11/06, Chuck Williams <[EMAIL PROTECTED]> wrote: David Balmain wrote on 10/10/2006 03:56 PM: > Actually not using single doc segments was only possible due to the > fact that I have constant field numbers so both optimizations stem > from this one change. So I'm not sure if it is worth a

search binning support

2006-10-10 Thread Yu-Hui Jin
Say I have N categories, and each item is assigned to one or more categories. I want the search results to be counted against each of the categories. I checked the Lucene in Action book, and there doesn't seem to be such a feature. So is there any plan to add binning to Lucene? It looks like this

Re: Ferret's changes

2006-10-10 Thread Chuck Williams
David Balmain wrote on 10/10/2006 03:56 PM: > Actually not using single doc segments was only possible due to the > fact that I have constant field numbers so both optimizations stem > from this one change. So I'm not sure if it is worth answering your > question but I'll try anyway. It obviousl

Re: Ferret's changes

2006-10-10 Thread Doug Cutting
David Balmain wrote: Although I would like Ferret working well for very large indexes, I don't see it being used to build the next Google. Aim high! And Google's likely a bunch of small indexes anyway. Doug

Re: Ferret's changes

2006-10-10 Thread Marvin Humphrey
On Oct 10, 2006, at 8:03 AM, Yonik Seeley wrote: Looking forward to progress on Lucy. What is done there could potentially be a future Lucene index format. There has actually been a good bit of progress on Lucy, just nobody can see it. :\ My first priority right now is finishing KinoSe

Re: Ferret's changes

2006-10-10 Thread David Balmain
On 10/11/06, Ning Li <[EMAIL PROTECTED]> wrote: On 10/10/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On 10/10/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > > Hi, > > > > Maybe I missed it, but I was surprised that nobody here wondered about the algorithm and data structure changes that Dav

Re: Ferret's changes

2006-10-10 Thread David Balmain
On 10/11/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 10/10/06, David Balmain <[EMAIL PROTECTED]> wrote: > The start of my benchmarks is here: > > http://ferret.davebalmain.com/trac/wiki/FerretVsLucene > > I did set maxBufferedDocs to 1000 and optimized both indices at the > end Ah, I had mis

Re: Ferret's changes

2006-10-10 Thread David Balmain
On 10/11/06, Doug Cutting <[EMAIL PROTECTED]> wrote: David Balmain wrote: > The start of my benchmarks is here: > > http://ferret.davebalmain.com/trac/wiki/FerretVsLucene Ferret looks fast! Nice work. A big knee in indexing performance occurs when indexes get much larger than memory, when mer

Re: Ferret's changes

2006-10-10 Thread Yonik Seeley
On 10/10/06, David Balmain <[EMAIL PROTECTED]> wrote: The start of my benchmarks is here: http://ferret.davebalmain.com/trac/wiki/FerretVsLucene I did set maxBufferedDocs to 1000 and optimized both indices at the end Ah, I had missed that link last time. Is the current code up-to-date? Th

[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2006-10-10 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-675?page=comments#action_12441281 ] Doug Cutting commented on LUCENE-675: As Marvin points out, quick micro-benchmarks are great to have. But other effects only show up when things get very lar

[jira] Commented: (LUCENE-664) [PATCH] small fixes to the new scoring.html doc

2006-10-10 Thread Doron Cohen (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-664?page=comments#action_12441280 ] Doron Cohen commented on LUCENE-664: I just noticed that the link to "TermScorer" in "Understanding the Scoring Formula" is broken b/c TermScorer has package v

Re: Ferret's changes

2006-10-10 Thread Doug Cutting
David Balmain wrote: The start of my benchmarks is here: http://ferret.davebalmain.com/trac/wiki/FerretVsLucene Ferret looks fast! Nice work. A big knee in indexing performance occurs when indexes get much larger than memory, when merging requires a lot of disk i/o. In these cases the al

Re: Ferret's changes

2006-10-10 Thread Yonik Seeley
On 10/10/06, David Balmain <[EMAIL PROTECTED]> wrote: I did set maxBufferedDocs to 1000 and optimized both indices at the end but I didn't use non-compound format. I think it is better to use compound file format as it is the default in both libraries and the penalty will be similar in both cases.

[jira] Commented: (LUCENE-664) [PATCH] small fixes to the new scoring.html doc

2006-10-10 Thread Doron Cohen (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-664?page=comments#action_12441194 ] Doron Cohen commented on LUCENE-664: One comment for Scoring.html: The last sentence in the "Score Boosting" paragraph says: "At scoring (search) time,

Re: Ferret's changes

2006-10-10 Thread Ning Li
On 10/10/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 10/10/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > Hi, > > Maybe I missed it, but I was surprised that nobody here wondered about the algorithm and data structure changes that Dave Balmain made in Ferret, to make it go faster (than Ja

Re: Ferret's changes

2006-10-10 Thread David Balmain
On 10/11/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 10/10/06, David Balmain <[EMAIL PROTECTED]> wrote: > Given these factors and the fact that benchmarks can be a very touchy > subject, particularly in the Java community, OK, I'll bite! (but I'm always too aggravated at many of the Java des

[jira] Commented: (LUCENE-664) [PATCH] small fixes to the new scoring.html doc

2006-10-10 Thread Grant Ingersoll (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-664?page=comments#action_12441164 ] Grant Ingersoll commented on LUCENE-664: OK, I have committed the changes and updated the site javadocs and scoring. Changes always take 15-30 mins. to pr

Re: Ferret's changes

2006-10-10 Thread Yonik Seeley
On 10/10/06, David Balmain <[EMAIL PROTECTED]> wrote: Given these factors and the fact that benchmarks can be a very touchy subject, particularly in the Java community, OK, I'll bite! (but I'm always too aggravated at many of the Java design decisions to consider myself part of that community

Re: Ferret's changes

2006-10-10 Thread Yonik Seeley
On 10/10/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Hi, Maybe I missed it, but I was surprised that nobody here wondered about the algorithm and data structure changes that Dave Balmain made in Ferret, to make it go faster (than Java Lucene). Not using single doc segments for buffered d

Re: Ferret's changes

2006-10-10 Thread Grant Ingersoll
On Oct 10, 2006, at 9:31 AM, David Balmain wrote: When you do get your benchmark stuff in place, I'd be happy to port it to Ruby/Ferret. Do you have anything currently available? Soon. See http://issues.apache.org/jira/browse/LUCENE-675 for the gist of what is coming. I have been asking

Re: Ferret's changes

2006-10-10 Thread Grant Ingersoll
Yep, this, too, is how we support dynamic fields at the Center. The field name is dynamic, but the semantics are fixed. On Oct 10, 2006, at 9:42 AM, Yonik Seeley wrote: On 10/10/06, David Balmain <[EMAIL PROTECTED]> wrote: On 10/10/06, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > I would

Re: Ferret's changes

2006-10-10 Thread Yonik Seeley
On 10/10/06, David Balmain <[EMAIL PROTECTED]> wrote: On 10/10/06, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > I would be interested in another survey, this time about how many > people use a fixed set of Fields in their applications. The large > majority of mine do. I know SOLR supports dynam

Re: Ferret's changes

2006-10-10 Thread David Balmain
On 10/10/06, Grant Ingersoll <[EMAIL PROTECTED]> wrote: I would be interested in another survey, this time about how many people use a fixed set of Fields in their applications. The large majority of mine do. I know SOLR supports dynamic fields, but I wonder how much they are used. If there tr

Re: Ferret's changes

2006-10-10 Thread Grant Ingersoll
I would be interested in another survey, this time about how many people use a fixed set of Fields in their applications. The large majority of mine do. I know SOLR supports dynamic fields, but I wonder how much they are used. If there truly is a benefit to it, then perhaps we can have a

Re: Ferret's changes

2006-10-10 Thread David Balmain
On 10/10/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Hi, Maybe I missed it, but I was surprised that nobody here wondered about the algorithm and data structure changes that Dave Balmain made in Ferret, to make it go faster (than Java Lucene). I know I've been wondering whether/when Dave

Ferret's changes

2006-10-10 Thread Otis Gospodnetic
Hi, Maybe I missed it, but I was surprised that nobody here wondered about the algorithm and data structure changes that Dave Balmain made in Ferret, to make it go faster (than Java Lucene). I know I've been wondering whether/when Dave will bring those up, and what the chances of those changes