Re: lucene 2.9 sorting algorithm

2009-10-14 Thread Yonik Seeley
Interesting idea... though one further piece of info in the mix is that large segments are typically processed first, and tend to fill up the priority queue. Conversion from one segment to another is only done as needed... only the bottom slot is converted automatically when the segment is switche
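To make the bottom-slot conversion concrete, here is a minimal sketch, assuming each segment exposes its own sorted term array so that ordinals are only comparable within one segment. The class and method names (`BottomSlot`, `convert`, `competes`) are hypothetical and are not Lucene's API; the sketch only illustrates re-expressing the queue's bottom value as an ordinal in the next segment.

```java
import java.util.Arrays;

// Hypothetical sketch, not Lucene code: the queue's bottom entry keeps its
// real value, and its ordinal is recomputed whenever the segment changes.
final class BottomSlot {
    final String value; // the actual term, valid across segments
    int ord;            // position of `value` relative to the current segment
    boolean exact;      // true if `value` occurs in the current segment

    BottomSlot(String value) { this.value = value; }

    // Called on a segment switch: binary-search the bottom value in the new
    // segment's sorted term array.
    void convert(String[] sortedTerms) {
        int idx = Arrays.binarySearch(sortedTerms, value);
        if (idx >= 0) {
            ord = idx;
            exact = true;
        } else {
            // Not present: remember the index of the largest term below it
            // (this is -1 when every term in the segment is larger).
            ord = -idx - 2;
            exact = false;
        }
    }

    // Ascending sort: a doc competes only if its term sorts strictly before
    // the bottom value, so the test is a plain int comparison.
    boolean competes(int docOrd) {
        return exact ? docOrd < ord : docOrd <= ord;
    }
}
```

Only the bottom entry is converted here; under this scheme any other queue entry would be converted only when a comparison actually needs it.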

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Jake Mannix
I had to dig through the source code (actually, walk through a unit test, because that was simpler to see what was going on in the 2.9 sorting), but I think John's way has slightly lower complexity in the balanced segment size case. On Wed, Oct 14, 2009 at 8:57 PM, Yonik Seeley wrote: > Interesti

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Michael McCandless
If I'm remembering it right... this (matching MultiSearcher's approach) was nearly the first thing we tried with LUCENE-1483. But the CPU cost was higher in our tests. I think we had tested unbalanced and balanced segments, but memory is definitely somewhat hazy at this point... I suspect even in

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Yonik Seeley
On Thu, Oct 15, 2009 at 4:31 AM, Jake Mannix wrote: >> Conversion from one segment to another is only >> done as needed... only the bottom slot is converted automatically when >> the segment is switched. > > That's not what it looks like, actually: you convert the bottom slot, and > as soon as you

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Yonik Seeley
On Thu, Oct 15, 2009 at 11:53 AM, Yonik Seeley wrote: > And it seems like a PQ per segment simply delays many of the slow > lookups until the end where the PQs must be merged. Actually, I'm wrong about that part - one can simply merge on values... there will be lots of string comparisons (and a n
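The "merge on values" idea can be sketched roughly as follows, assuming one bounded queue per segment and a final merge that compares the sort values directly. All names here (`Hit`, `MultiQueueMerge`, `topOfSegment`, `merge`) are hypothetical, and this is not the code from John's sandbox or from any Lucene patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of "one queue per segment, merge at the end".
final class Hit {
    final int globalDoc; // segment-local doc id + segment doc base
    final String value;  // the sort value itself, comparable across segments
    Hit(int globalDoc, String value) { this.globalDoc = globalDoc; this.value = value; }
}

final class MultiQueueMerge {
    // Order by value, then by global doc id so that ties resolve deterministically.
    static final Comparator<Hit> ORDER =
        Comparator.comparing((Hit h) -> h.value).thenComparingInt(h -> h.globalDoc);

    // Keep the best n hits of one segment in a bounded heap whose head is the worst hit.
    static List<Hit> topOfSegment(String[] valuesByDoc, int docBase, int n) {
        PriorityQueue<Hit> pq = new PriorityQueue<>(ORDER.reversed());
        for (int doc = 0; doc < valuesByDoc.length; doc++) {
            pq.add(new Hit(docBase + doc, valuesByDoc[doc]));
            if (pq.size() > n) pq.poll(); // drop the current worst
        }
        return new ArrayList<>(pq);
    }

    // Merge the per-segment survivors; this is where the cross-segment
    // string comparisons happen.
    static List<Hit> merge(List<List<Hit>> perSegment, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> seg : perSegment) all.addAll(seg);
        all.sort(ORDER);
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```

A real collector would compare ordinals inside each segment and touch the string values only during the merge; the sketch compares values everywhere to stay short.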

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Jake Mannix
On Thu, Oct 15, 2009 at 9:12 AM, Yonik Seeley wrote: > On Thu, Oct 15, 2009 at 11:53 AM, Yonik Seeley > wrote: > > And it seems like a PQ per segment simply delays many of the slow > > lookups until the end where the PQs must be merged. > > Actually, I'm wrong about that part - one can simply mer

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Jake Mannix
On Thu, Oct 15, 2009 at 3:12 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > If I remembering it right... this (matching MultiSearcher's approach) > was nearly the first thing we tried with LUCENE-1483. But the CPU > cost was higher in our tests. I think we had tested unbalanced and

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
Hi guys: I did some Big O math a few times and reached the same conclusion Jake had. I was not sure about the code tuning opportunities we could have done with the MergeAtTheEnd method as Yonik mentioned and the internal behavior with PQ Mike suggested, so I went ahead and implemented the

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Michael McCandless
Nice results! Comments below... On Thu, Oct 15, 2009 at 3:58 PM, John Wang wrote: > Hi guys: > >     I did some Big O math a few times and reached the same conclusion Jake > had. > >     I was not sure about the code tuning opportunities we could have done > with the MergeAtTheEnd method as Yoni

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Michael McCandless
On Thu, Oct 15, 2009 at 3:52 PM, Jake Mannix wrote: > > On Thu, Oct 15, 2009 at 3:12 AM, Michael McCandless > wrote: >> >> If I remembering it right... this (matching MultiSearcher's approach) >> was nearly the first thing we tried with LUCENE-1483.  But the CPU >> cost was higher in our tests.  

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Jake Mannix
On Thu, Oct 15, 2009 at 2:12 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > Nice results! Comments below... > > > Here are the numbers (times are measured in nanoseconds): > > > > numHits = 50: > > > > Lucene 2.9/OneComparatorNonScoringCollector: > > num string compares: 251 > > num

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Michael McCandless
On Thu, Oct 15, 2009 at 5:51 PM, Jake Mannix wrote: > > > On Thu, Oct 15, 2009 at 2:12 PM, Michael McCandless > wrote: >> >> Nice results!  Comments below... >> >> > Here are the numbers (times are measured in nanoseconds): >> > >> > numHits = 50: >> > >> > Lucene 2.9/OneComparatorNonScoringColle

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Jake Mannix
On Thu, Oct 15, 2009 at 2:33 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > > I don't think we do any branch tuning on the PQ insertion -- the ifs > involved in re-heapifying the PQ are simply hard for the CPU to > predict (though, apparently, not as hard as comparing strings ;). >

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
Hi Mike: Here are the results for numHits = 10: Lucene 2.9: num string compares: 86 num conversions: 21 num inserts: 115 time: 15069705 cpu: 174294 my test sort: num string compares: 49 num conversions: 0 num inserts: 778 time: 14665375 cpu: 156442 This is how the test data is indexed

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Yonik Seeley
On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless wrote: > Though it'd be odd if the switch to searching by segment > really was most of the gains here. I had assumed that much of the improvement was due to ditching MultiTermEnum/MultiTermDocs. Note that LUCENE-1483 was before LUCENE-1596... bu

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
Numbers Mike requested for Int types: only the time/cputime are posted, others are all the same since the algorithm is the same. Lucene 2.9: numhits: 10 time: 14619495 cpu: 146126 numhits: 20 time: 14550568 cpu: 163242 numhits: 100 time: 16467647 cpu: 178379 my test: numHits: 10 time: 1410109

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
BTW, we have a little sandbox for these experiments, and all my test code is at the URL below. It is not very polished. https://lucene-book.googlecode.com/svn/trunk -John On Thu, Oct 15, 2009 at 3:29 PM, John Wang wrote: > Numbers Mike requested for Int types: > > only the time/cputime are posted, ot

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Michael McCandless
On Thu, Oct 15, 2009 at 5:59 PM, Jake Mannix wrote: >> I don't think we do any branch tuning on the PQ insertion -- the ifs >> involved in re-heapifying the PQ are simply hard for the CPU to >> predict (though, apparently, not as hard as comparing strings ;). > > But it does look like you do some

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Michael McCandless
On Thu, Oct 15, 2009 at 6:04 PM, Yonik Seeley wrote: > On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless > wrote: >> Though it'd be odd if the switch to searching by segment >> really was most of the gains here. > > I had assumed that much of the improvement was due to ditching > MultiTermEnum/

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Michael McCandless
OK, thanks for running these. It looks like the gains are holding up across smaller queue sizes, and for ints. Though, it's odd that sorting w/ ints is also faster; I'd expect the single PQ to win there. Mike On Thu, Oct 15, 2009 at 6:29 PM, John Wang wrote: > Numbers Mike requested for Int ty

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread Michael McCandless
John, looks like this requires login -- any plans to open that up, or, post the code on an issue? How self-contained is your Multi PQ sorting? EG is it a standalone Collector impl that I can test? Mike On Thu, Oct 15, 2009 at 6:33 PM, John Wang wrote: > BTW, we are have a little sandbox for th

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
Hi Michael: It is open, http://code.google.com/p/lucene-book/source/checkout I think I sent the https url instead, sorry. The multi PQ sorting is fairly self-contained, I have 2 versions, 1 for string and 1 for int, each are Collector impls. I shouldn't say the Multi Q is fast

Re: lucene 2.9 sorting algorithm

2009-10-15 Thread John Wang
Hi Michael: I added classes: ScoreDocComparatorQueue and OneSortNoScoreCollector as a more general case. I think keeping the old api for ScoreDocComparator and SortComparatorSource would work. Please take a look. Thanks -John On Thu, Oct 15, 2009 at 6:52 PM, John Wang wrote: > Hi Michae

Re: lucene 2.9 sorting algorithm

2009-10-16 Thread Michael McCandless
Thanks John; I'll have a look. Mike On Fri, Oct 16, 2009 at 12:57 AM, John Wang wrote: > Hi Michael: >     I added classes: ScoreDocComparatorQueue and OneSortNoScoreCollector as > a more general case. I think keeping the old api for ScoreDocComparator and > SortComparatorSource would work. >   

Re: lucene 2.9 sorting algorithm

2009-10-16 Thread John Wang
Mike, just a clarification on my first perf report email. The first section, numHits is incorrectly labeled, it should be 20 instead of 50. Sorry about the possible confusion. Thanks -John On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > Thanks John; I'l

Re: lucene 2.9 sorting algorithm

2009-10-16 Thread Michael McCandless
Oh, no problem... Mike On Fri, Oct 16, 2009 at 12:33 PM, John Wang wrote: > Mike, just a clarification on my first perf report email. > The first section, numHits is incorrectly labeled, it should be 20 instead > of 50. Sorry about the possible confusion. > Thanks > -John > > On Fri, Oct 16, 200

Re: lucene 2.9 sorting algorithm

2009-10-19 Thread John Wang
Hi Michael: Was wondering if you got a chance to take a look at this. Since deprecated APIs are being removed in 3.0, I was wondering if/when we would decide on keeping the ScoreDocComparator API, so that it would be kept for Lucene 3.0. Thanks -John On Fri, Oct 16, 2009 at 9:53 AM, Mich

RE: lucene 2.9 sorting algorithm

2009-10-19 Thread Uwe Schindler
Hi Michael: Was wondering if you got a chance to take a look at this. Since deprecated APIs are being removed in 3.0, I was wondering if/when we would decide on keeping the ScoreDocComparator API and thus would be kept for Lucene 3.0

Re: lucene 2.9 sorting algorithm

2009-10-19 Thread Jake Mannix
> > - > Uwe Schindler > H.-H.-Meier-Allee 63, D-28213 Bremen > http://www.thetaphi.de > eMail: u...@thetaphi.de > -- > > *From:* John Wang [mailto:john.w...@gmail.com] > *Sent:* Tuesday, October 20, 2009 3:28 AM > *To:* java-dev@lucen

RE: lucene 2.9 sorting algorithm

2009-10-20 Thread Uwe Schindler
From: Jake Mannix [mailto:jake.man...@gmail.com] Sent: Tuesday, October 20, 2009 8:37 AM To: java-dev@lucene.apache.org Subject: Re: lucene 2.9 sorting algorithm Given that this new API is pretty unwieldy, and seems to not actually perform any better than the old one... are we goi

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
Sorry, I have been digging into it, just didn't get far enough to post patch/results. I'll try to do so today. I did find one bug in OneSortNoScoreCollector, in the getTop() method: in the inner compare() method, to break ties it should be: if (v==0) { v = o1.doc + o1.comparatorQueue._base -

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
Sent: Tuesday, October 20, 2009 8:37 AM To: java-dev@lucene.apache.org Subject: Re: lucene 2.9 sorting algorithm Given that this new API is pretty unwieldy, and seems to not actually perform any better than the old one... are we going to consider revisiting that? -jake On Mon, Oct

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
On Tue, Oct 20, 2009 at 6:51 AM, Mark Miller wrote: > I didn't really follow that thread either - but we didn't move to the new > Comp Api because of its performance vs the old. We did (LUCENE-1483), but those perf tests mixed in a number of other improvements (eg, searching by segment avoids the

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
Hmm - perhaps I'm not remembering right. Or perhaps we had different motivations ;) I never did anything in 1483 based on search perf - and I took your tests as testing that we didn't lose perf, not that we gained any. The fact that there were some wins was just a nice surprise from my perspective.

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
Ahhh - I see - way at the top. Man that was early. Had forgotten about that stuff even before the issue was finished. Mark Miller wrote: > Hmm - perhaps I'm not remembering right. Or perhaps we had different > motivations ;) I never did anything in 1483 based on search perf - and I > took your tes

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
On Tue, Oct 20, 2009 at 8:08 AM, Mark Miller wrote: > Hmm - perhaps I'm not remembering right. Or perhaps we had different > motivations ;) I never did anything in 1483 based on search perf - and I > took your tests as testing that we didn't lose perf, not that we gained > any. The fact that there

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
On Tue, Oct 20, 2009 at 8:21 AM, Mark Miller wrote: > Ahhh - I see - way at the top. Man that was early. Had forgotten about > that stuff even before the issue was finished. Tell me about it -- impossible to remember these things :) I wish I could upgrade the RAM in my brain the way I can in my

RE: lucene 2.9 sorting algorithm

2009-10-20 Thread Uwe Schindler
> On Tue, Oct 20, 2009 at 8:08 AM, Mark Miller > wrote: > > Hmm - perhaps I'm not remembering right. Or perhaps we had different > > motivations ;) I never did anything in 1483 based on search perf - and I > > took your tests as testing that we didn't lose perf, not that we gained > > any. The fac

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
Uwe Schindler wrote: >> On Tue, Oct 20, 2009 at 8:08 AM, Mark Miller >> wrote: >> >>> Hmm - perhaps I'm not remembering right. Or perhaps we had different >>> motivations ;) I never did anything in 1483 based on search perf - and I >>> took your tests as testing that we didn't lose perf, not

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
Actually though - how are we supposed to get back there? I don't think it's as simple as just not removing the deprecated APIs. Doesn't even seem close to that simple. It's another nightmare. It would have to be some serious wins to go through that pain starting at a 3.0 release, wouldn't it? We just

RE: lucene 2.9 sorting algorithm

2009-10-20 Thread Uwe Schindler
> Actually though - how are we supposed to get back there? I don't think > its as simple as just not removing the deprecated API's. Doesn't even > seem close to that simple. Its another nightmare. It would have to be > some serious wins to go through that pain starting at a 3.0 release > wouldn't i

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Yonik Seeley
On Tue, Oct 20, 2009 at 9:31 AM, Uwe Schindler wrote: > It is not bad, only harder to understand (for some people). The Javadoc is much improved since I made the switch. One trivial thing that could be improved is to perhaps move all of the methods to the top of the class? Right now, if I go and

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Earwin Burrfoot
There are some advanced things that are plain impossible with stock new API. Like having more than one HitQueue in your Collector, and stashing overflowing values from one of them into another. Once you cross the segment border - BOOM! Otherwise it may look intimidating, but is pretty simple in fa

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Mark Miller
bq. One trivial thing that could be improved is to perhaps move all of the methods to the top of the class? +1 - I think Mike and I silently fought on that one once in the patches :) Though I don't know how conscious it was. I prefer the methods at the top myself. Yonik Seeley wrote: > On Tue, Oct

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
On Tue, Oct 20, 2009 at 10:49 AM, Mark Miller wrote: > bq. One trivial thing that could be improved is to perhaps move all of > the methods to the top of the class? > > +1 - I think Mike and silently fought on that one once in the patches :) > Though I don't know how conscious it was. I prefer the

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread John Wang
Hi guys: I am not suggesting just simply changing the deprecated signatures. There is some work to be done, of course. In the beginning of the thread, we discussed two algorithms (both handling per-segment field loading), and at the conclusion, (to be still verified by Mike) that both algorithm

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread John Wang
Sorry, mistyped again, we have a multivalued field of STRINGS, no integers. -John On Tue, Oct 20, 2009 at 8:55 AM, John Wang wrote: > Hi guys: > I am not suggesting just simply changing the deprecated signatures. > There are some work to be done of course. In the beginning of the thread, we

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
On Tue, Oct 20, 2009 at 11:47 AM, Michael McCandless wrote: > On Tue, Oct 20, 2009 at 10:49 AM, Mark Miller wrote: >> bq. One trivial thing that could be improved is to perhaps move all of >> the methods to the top of the class? >> >> +1 - I think Mike and silently fought on that one once in the

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Michael McCandless
OK I posted a patch that folds the MultiPQ approach into contrib/benchmark, plus a simple python wrapper to run old/new tests across different queries, sort, topN, etc. But I got different results... MultiPQ looks generally slower than SinglePQ. So I think we now need to reconcile what's differen

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread TomS
Hi, I can confirm the below mentioned problems trying to migrate to 2.9. Our Lucene-based (2.4) app uses custom multi-level sorting on a lot of different fields and pretty large indexes (> 100m docs). Most of the fields that we sort on are strings, some with up to 400 characters in length. A lot

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread Earwin Burrfoot
That's quite possible to reimplement, I believe. You can have your docid->ordinal map bound to the top-level reader, as it was before, and then your FieldComparator rebases incoming compare() docids based on what the last setNextReader() was called with. On Wed, Oct 21, 2009 at 02:07, TomS wrote: > Hi, > >
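The rebasing described above might look something like this minimal sketch, assuming the comparator is told the starting doc id of each segment. `RebasingComparator` and its methods are hypothetical stand-ins; the real `FieldComparator.setNextReader()` callback takes a reader and a doc base, which is simplified here to just the doc base.

```java
// Hypothetical sketch: one ordinal array for the whole (top-level) index,
// with segment-local doc ids rebased into it.
final class RebasingComparator {
    private final int[] topLevelOrds; // indexed by top-level doc id
    private int docBase;              // first top-level doc id of the current segment

    RebasingComparator(int[] topLevelOrds) { this.topLevelOrds = topLevelOrds; }

    // Simplified stand-in for the setNextReader() callback.
    void setNextReader(int docBase) { this.docBase = docBase; }

    // Convert a segment-local doc id into an ordinal that stays valid across
    // segments, so no per-segment conversion of queue entries is needed.
    int ordOf(int segmentDoc) { return topLevelOrds[docBase + segmentDoc]; }

    int compareOrds(int ordA, int ordB) { return Integer.compare(ordA, ordB); }
}
```

The trade-off is that the ordinal map must be built against the top-level reader, so it cannot be reused segment by segment after a reopen.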

Re: lucene 2.9 sorting algorithm

2009-10-20 Thread John Wang
Hi Mike: That's weird. Let me take a look at the patch. Need to brush up on python though :) Thanks -John On Tue, Oct 20, 2009 at 10:25 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > OK I posted a patch that folds the MultiPQ approach into > contrib/benchmark, plus a simple pyth

Re: lucene 2.9 sorting algorithm

2009-10-21 Thread Michael McCandless
OK, thanks. I can help out if you've got questions on the python code... it's rather straightforward: it just iterates over each set of params to test, writes an alg file, runs it, opens the resulting output & parses it for the best run, confirms both single & multi PQ gave precisely the same doc

Re: lucene 2.9 sorting algorithm

2009-10-21 Thread Michael McCandless
On Tue, Oct 20, 2009 at 11:55 AM, John Wang wrote: > the simpler api places less restriction on the type of custom > sorting that can be done. Just to verify: this is not a back-compat break, right? Because, in 2.4, such an interesting custom sort must've been operating at the top-level index r

Re: lucene 2.9 sorting algorithm

2009-10-21 Thread John Wang
Hi Mike: I have been playing with the patch, and I think I have some information that you might like. Let me spend some time and gather some more numbers and update in jira. Thanks btw: About the conversion on multi-valued fields, I am not sure I get it (sorry for being ignorant):

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Michael McCandless
On Thu, Oct 22, 2009 at 2:17 AM, John Wang wrote: > I have been playing with the patch, and I think I have some information > that you might like. > Let me spend sometime and gather some more numbers and update in jira. Excellent! > say bottom has ords 23, 45, 76, each correspond

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread John Wang
Hey Michael: Would you mind rerunning the test you have with jdk1.5? Also, if you would, change the comparator method to avoid branching for int and string comparators, e.g. return index.order[i.doc] - index.order[j.doc]; Thanks -John On Thu, Oct 22, 2009 at 2:38 AM, Mic
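For reference, the two comparator styles being contrasted can be sketched like this. `OrdCompare` and its method names are hypothetical; the `order` array stands in for the per-document ordinals in the example above. The subtraction form is assumed safe only because ordinals are non-negative array positions, so the difference cannot overflow an int. For arbitrary int field values it can overflow and give the wrong sign.

```java
// Hypothetical sketch contrasting a branching compare with a subtraction compare.
final class OrdCompare {
    // Branching form: three-way compare with explicit ifs.
    static int branching(int[] order, int i, int j) {
        if (order[i] < order[j]) return -1;
        if (order[i] > order[j]) return 1;
        return 0;
    }

    // Subtraction form: no ifs. Correct only while both operands are
    // non-negative (as ordinals are), so the difference cannot overflow.
    static int subtracting(int[] order, int i, int j) {
        return order[i] - order[j];
    }
}
```

Both forms agree on the sign of the result for ordinals, which is all a priority queue needs.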

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Mark Miller
Why? What might he find? What's with the cryptic request? Why would Java 1.5 perform better than 1.6? It erases 20 and 40% gains? I know point 2 certainly doesn't. Cards on the table? John Wang wrote: > Hey Michael: > >Would you mind rerunning the test you have with jdk1.5? > >Als

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread John Wang
Mark: Please be patient with me. I am seeing a difference and was wondering if Mike would see the same thing. I thought Michael would be willing to because he expressed interest in understanding what the performance discrepancies are. Again, it is only a request. It is perfectly fine

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Jake Mannix
Mark, We're not seeing exactly the numbers that Mike is seeing in his tests, running with jdk 1.5 on intel macs, so we're trying to eliminate factors of difference. Point 2 does indeed make a difference, we've seen it, and it's only fair: the single pq comparator does this branch optimization

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Mark Miller
I am patient :) And I'm not speaking for Mike, I'm speaking for me. I'm wondering what you're seeing. Asking Mike to rerun the tests without giving any further info (you didn't even say that you're seeing something different) is unfair to the rest of us ;) Giving 0 info along with your request just ma

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Mark Miller
Thanks - that's all I'm asking for. A simple explanation of why you'd ask for a retest with those two things changed. It just seems like holding your cards a little too close to say - please do this with 0 explanation. As to point 2, that's fine - I'm sure it helps - I was just saying I didn't buy it helps

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread John Wang
Mike: I did just post with what I saw, feel free to read and comment on it. I am simply trying to work with Michael on this and trying to understand the code. As I have expressed previously, I have seen a difference between 1.5 and 1.6 that is significant. Since Mike has post

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread John Wang
For some reason I guess this didn't go thru and caused all the confusion.
||Seg size||Query||Tot hits||Sort||Top N||QPS old||QPS new||Pct change||
|log||100|rand string|10|91.76|108.63|{color:green}18.4%{color}|
|log||100|rand string|25|92.39|106.79|{color:green}15.6%{color}|
|log||100

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Mark Miller
>> I guess I should be more clear in the email. No - If you mentioned before the other info and I missed it, just say: Mark, you don't know what you're talking about and you missed the info. That's what I'd do. You just caught me at a time when I'm trying to get these tests going myself, and a l

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread John Wang
Mark: There is no reason for me to withhold information. I just want to understand and share my findings. My bad for not being clear. Mike's test is actually very well written, I just followed instructions in the jira and got it running. I think the tests have good coverage

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Mark Miller
John Wang wrote: > Mark: > >There is no reason for me to withhold information. I just want > to understand and share my findings. Right, I didn't mean to accuse you of that ;) Not that you were doing it on purpose. I was just trying to string out more :) Which I've managed to do - in my usu

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Mark Miller
bq. I just followed instructions in the jira and got it running. Heh - I didn't read down far enough - first comment says 2.9 branch. Thanks ; ) I've been flipping through revisions for a while now, wondering how the heck the revs in the patch match up with trunk. John Wang wrote: > Mark: > >

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Yonik Seeley
On Thu, Oct 22, 2009 at 10:35 PM, John Wang wrote: >        Please be patient with me. I am seeing a difference and was wondering > if Mike would see the same thing. Some differences are bound to be seen... with your changes (JVM changes, branch optimizations), are you seeing better average perfo

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Jake Mannix
It's hard to read the column format, but if you look up above in the thread from tonight, you can see that yes, for PQ sizes less than 100 elements, multiPQ is better, and only starts to be worse at around 100 for strings, and 50 for ints. -jake On Thu, Oct 22, 2009 at 8:06 PM, Yonik Seeley wro

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Jake Mannix
Of course, John's running on his mac laptop, which also may be a factor, which is another reason why he wanted to see if these carried over onto a linux desktop (for example). -jake On Thu, Oct 22, 2009 at 8:11 PM, Jake Mannix wrote: > It's hard to read the column format, but if you look up a

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread John Wang
Hi Yonik I am, but I don't think I should. Even with branching etc., I should see that much of a consistent difference. I am traveling with my macbook pro, I wanted to eliminate all variables. It really does not make sense to me... -John On Thu, Oct 22, 2009 at 8:06 PM, Yonik Seeley wrote

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Yonik Seeley
On Thu, Oct 22, 2009 at 11:11 PM, Jake Mannix wrote: > It's hard to read the column format, but if you look up above in the thread > from tonight, > you can see that yes, for PQ sizes less than 100 elements, multiPQ is > better, and only > starts to be worse at around 100 for strings, and 50 for i

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Jake Mannix
On Thu, Oct 22, 2009 at 8:30 PM, Yonik Seeley wrote: > On Thu, Oct 22, 2009 at 11:11 PM, Jake Mannix > wrote: > > It's hard to read the column format, but if you look up above in the > thread > > from tonight, > > you can see that yes, for PQ sizes less than 100 elements, multiPQ is > > better, a

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Mark Miller
>> The new API is much harder for the >> average user to use, and even for the experienced user, it's not terribly fun, >> and more importantly: Do we have enough info to support that though? All the cases I have seen on the list, people have figured it out pretty easily - haven't really seen any co

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Jake Mannix
On Thu, Oct 22, 2009 at 9:25 PM, Mark Miller wrote: > >> he new API is much harder for the > >> average user to use, and even for the experienced user, it's not > terribly fun, > >> and more importantly: > > Do we have enough info to support that though? All the cases I have seen > on the list, p

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Mark Miller
Jake Mannix wrote: > > > On Thu, Oct 22, 2009 at 9:25 PM, Mark Miller > wrote: > > >> he new API is much harder for the > >> average user to use, and even for the experienced user, it's not > terribly fun, > >> and more importantly: > > Do we have

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread Jake Mannix
On Thu, Oct 22, 2009 at 9:58 PM, Mark Miller wrote: > Yes - I've seen a handful of non core devs report back that they > upgraded with no complaints on the difficulty. Its in the mailing list > archives. The only core dev I've seen say its easy is Uwe. He's super > sharp though, so I wasn't banki

Re: lucene 2.9 sorting algorithm

2009-10-22 Thread John Wang
Hi Yonik: I have been head deep in this trying to find out a good solution for the better part of the past two days, it's been hard because there are so many variables: 1) how optimized is the code from either of the implementations 2) VM difference 3) HW etc. Also, there are quite a few dim

RE: lucene 2.9 sorting algorithm

2009-10-22 Thread Uwe Schindler
> Yes - I've seen a handful of non core devs report back that they > upgraded with no complaints on the difficulty. Its in the mailing list > archives. The only core dev I've seen say its easy is Uwe. He's super > sharp though, so I wasn't banking my comment on him ;) I didn't say it's easy -- for

Re: lucene 2.9 sorting algorithm

2009-10-23 Thread Earwin Burrfoot
I did. On Fri, Oct 23, 2009 at 09:05, Jake Mannix wrote: > > On Thu, Oct 22, 2009 at 9:58 PM, Mark Miller wrote: >> >> Yes - I've seen a handful of non core devs report back that they >> upgraded with no complaints on the difficulty. Its in the mailing list >> archives. The only core dev I've se

Re: lucene 2.9 sorting algorithm

2009-10-23 Thread Michael McCandless
Sheesh I go to bed and so much all of a sudden happens!! Sorry Mark; I should've called out "PATCH IS ON 2.9 BRANCH" more clearly ;) There's no question in my mind that the new comparator API is more complex than the old one, and I really don't like that. I had to rewrite the section of LIA that

Re: lucene 2.9 sorting algorithm

2009-10-23 Thread Mark Miller
Yup, I'm not against the testing or the thought - and it is clearly more complicated - I'm not saying it's not. But I haven't seen anyone that's come and said they haven't grokked it yet or that they had a hard time with it (though they have run into limitations in what they have tried to do). John a

Re: lucene 2.9 sorting algorithm

2009-10-23 Thread Mark Miller
>>I still think we should if performance is no >>better with the new one. Where is there any indication performance is not better with the new one? The benchmarks are clearly against switching back. At best they could argue for two APIs - even then it depends - a loss of 10% on Java 1.5 with th

Re: lucene 2.9 sorting algorithm

2009-10-23 Thread Mark Miller
Mark Miller wrote: > bq. removing that if from the Multi PQ patch makes sense > > I didn't have a problem with that either - or other code changes - but > jeeze, mention what you are seeing with the switch. I'll tell you what I > saw it - not that much - a bit of improvement, but take a look at the

Re: lucene 2.9 sorting algorithm

2009-10-23 Thread Michael McCandless
Agreed: so far I'm seeing serious performance loss with MultiPQ, especially as topN gets larger, and for int sorting. For small queue, String sort, it sometimes wins. So if I were forced to decide now based on the current results, I think we should keep the single PQ API. But: I am right now opt

Re: lucene 2.9 sorting algorithm

2009-10-23 Thread John Wang
Hi Mike: Thank you! It would be really nice to get the optimizations you have done. -John 2009/10/23 Michael McCandless > Agreed: so far I'm seeing serious performance loss with MultiPQ, > especially as topN gets larger, and for int sorting. > > For small queue, String sort, it sometimes wi

Re: lucene 2.9 sorting algorithm

2009-10-23 Thread Michael McCandless
They are included in my last patch on LUCENE-1997. It's somewhat hacked up though :) We'd have to redo it "for real" if we go forward with this... Mike 2009/10/23 John Wang : > Hi Mike: >     Thank you! It would be really nice to get the optimizations you have > done. > -John > > 2009/10/23 Mic