Re: Lots of results

2009-12-07 Thread Grant Ingersoll
On Dec 6, 2009, at 6:40 AM, Michael McCandless wrote:
> I think a hybrid approach may be worth exploring as well. It'd let
> you trade off how much transient RAM you're willing to spend...
>
> Ie, rather than insisting on heapifying after every insertion, allow
> many insertions to arrive and t

Re: Lots of results

2009-12-07 Thread Grant Ingersoll
On Dec 7, 2009, at 1:56 AM, Earwin Burrfoot wrote:
> On Sun, Dec 6, 2009 at 02:01, Grant Ingersoll wrote:
>>
>> On Dec 5, 2009, at 10:47 PM, Earwin Burrfoot wrote:
>>
>>> If someone needs all results, they know it beforehand. Why can't they
>>> write this collector themselves? It's trivial, ju

Re: Lots of results

2009-12-06 Thread Earwin Burrfoot
On Sun, Dec 6, 2009 at 02:01, Grant Ingersoll wrote:
>
> On Dec 5, 2009, at 10:47 PM, Earwin Burrfoot wrote:
>
>> If someone needs all results, they know it beforehand. Why can't they
>> write this collector themselves? It's trivial, just like you said.
>
> I'm not following your comment. Of cour

Re: Lots of results

2009-12-06 Thread Michael McCandless
I think a hybrid approach may be worth exploring as well. It'd let you trade off how much transient RAM you're willing to spend...

Ie, rather than insisting on heapifying after every insertion, allow many insertions to arrive and then use a selection/partition algorithm to periodically prune. So
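The hybrid idea described in this message (buffer many insertions, then prune periodically with a selection/partition step) might look roughly like the following. This is only a sketch in Python, not Lucene code: the names `BufferedTopK`, `partition_top_k`, and the `slack` parameter are hypothetical, and the quickselect-style partition is one possible choice of selection algorithm. The `slack` factor is what trades transient RAM against how often pruning runs.

```python
def partition_top_k(items, k):
    """Rearrange items in place so the k largest end up in items[:k] (unordered).

    Quickselect with a Hoare-style partition, in descending order.
    """
    if k <= 0 or k >= len(items):
        return
    lo, hi = 0, len(items) - 1
    while lo < hi:
        pivot = items[(lo + hi) // 2]
        i, j = lo, hi
        while i <= j:
            while items[i] > pivot:
                i += 1
            while items[j] < pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        # Now items[lo..j] >= pivot and items[i..hi] <= pivot.
        if k - 1 <= j:
            hi = j
        elif k - 1 >= i:
            lo = i
        else:
            break


class BufferedTopK:
    """Hypothetical collector: append hits freely, prune only when the buffer fills."""

    def __init__(self, k, slack=2):
        self.k = k
        self.limit = k * slack  # assumed knob: more slack = more RAM, fewer prunes
        self.buf = []

    def collect(self, score, doc_id):
        self.buf.append((score, doc_id))  # no heap maintenance per insertion
        if len(self.buf) >= self.limit:
            self._prune()

    def _prune(self):
        partition_top_k(self.buf, self.k)
        del self.buf[self.k:]

    def top_docs(self):
        if len(self.buf) > self.k:
            self._prune()
        return sorted(self.buf, reverse=True)  # final ordering, best score first
```

With `slack=2` the buffer never holds more than `2 * k` entries, and each prune is a linear-time pass on average rather than a per-hit heap adjustment.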

Re: Lots of results

2009-12-05 Thread DM Smith
a Collector
> to do it, minus the post processing step, which would be relatively trivial
> to add.

I'm not sure what constitutes "lots of results". For my application, most searches are for all matching documents. We typically don't exclude stop words in building the

Re: Lots of results

2009-12-05 Thread Paul Elschot
Could one get the best of both worlds by not heapifying the PQ until it is full?

Regards,
Paul Elschot

On Sunday 06 December 2009 00:01:49, Grant Ingersoll wrote:
>
> On Dec 5, 2009, at 10:47 PM, Earwin Burrfoot wrote:
>
> > If someone needs all results, they know it beforehand. Why can't they
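The suggestion in this message, deferring heapification until the priority queue is full, could be sketched as below. This is an illustration in Python using the standard `heapq` module, not Lucene's `PriorityQueue`; the class name `LazyHeapTopK` is hypothetical and `k >= 1` is assumed.

```python
import heapq


class LazyHeapTopK:
    """Hypothetical collector: plain appends until k hits, one heapify, then a min-heap."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.items = []
        self.heapified = False

    def collect(self, score, doc_id):
        entry = (score, doc_id)
        if len(self.items) < self.k:
            # Queue not full yet: append without any sift work.
            self.items.append(entry)
            return
        if not self.heapified:
            # First hit beyond capacity: build the heap once, in O(k).
            heapq.heapify(self.items)
            self.heapified = True
        if entry > self.items[0]:
            # New hit beats the current worst: replace the heap root.
            heapq.heapreplace(self.items, entry)

    def top_docs(self):
        return sorted(self.items, reverse=True)
```

If a query produces fewer than `k` hits, the heap is never built at all, so the cost reduces to appends plus one final sort.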

Re: Lots of results

2009-12-05 Thread Grant Ingersoll
On Dec 5, 2009, at 10:47 PM, Earwin Burrfoot wrote:
> If someone needs all results, they know it beforehand. Why can't they
> write this collector themselves? It's trivial, just like you said.

I'm not following your comment. Of course they can write it. But that's true for all the implementat

Re: Lots of results

2009-12-05 Thread Earwin Burrfoot
If someone needs all results, they know it beforehand. Why can't they write this collector themselves? It's trivial, just like you said.

On Sun, Dec 6, 2009 at 01:22, Grant Ingersoll wrote:
> At ScaleCamp yesterday in the UK, I was listening to a talk on Xapian and the
> speaker said one of the

Lots of results

2009-12-05 Thread Grant Ingersoll
At ScaleCamp yesterday in the UK, I was listening to a talk on Xapian and the speaker said one of the optimizations they do when retrieving a large result set is that instead of managing a Priority Queue, they just allocate a large array to hold all of the results and then sort afterward. Seem
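For reference, the approach attributed to the Xapian talk (hold every hit in one large array and sort afterward, instead of maintaining a priority queue) can be sketched in a few lines. This is a Python illustration only, not Xapian or Lucene code; the class name `CollectAllThenSort` is hypothetical, and breaking score ties by ascending document id is an assumption.

```python
class CollectAllThenSort:
    """Hypothetical collector: store every hit, sort once at the end."""

    def __init__(self):
        self.hits = []

    def collect(self, score, doc_id):
        # Constant-time append; no ordering is maintained while collecting.
        self.hits.append((score, doc_id))

    def top_docs(self, n=None):
        # Single sort after collection: descending score, then ascending doc id.
        self.hits.sort(key=lambda hit: (-hit[0], hit[1]))
        return list(self.hits) if n is None else self.hits[:n]
```

Memory grows with the number of matching documents, which is the cost this approach accepts when nearly all results are wanted anyway.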