Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Yonik Seeley
Ya know, it turned out to be embarrassingly simple - I think I just had a mental block from thinking about how Solr's warming worked for so long. Actually, it was so simple, yet I still got it wrong at first glance, that it reminded me of this: http://www.marilynvossavant.com/forum/viewtopic.p

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread John Wang
Jason: I am not sure which "parameters" you are referring to either. Are you responding to the right email? Anyhow, I used the defaults for everything in both MergePolicies. LogMergePolicy.setCalibrateSizeByDeletes was a contribution by us from ZMP for normalizing segment size using deleted doc co
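
For readers skimming the thread, here is a minimal sketch of the arithmetic behind "normalizing segment size using deleted doc count". It is not the LogMergePolicy source; the class and method names are hypothetical, and it only assumes that a segment's size is scaled by the fraction of its documents that are still live, so that a segment full of deletes looks smaller to the merge policy.

```java
// Hypothetical illustration only; not Lucene code.
final class CalibratedSegmentSize {
    private CalibratedSegmentSize() {}

    /**
     * Scales a segment's byte size by the fraction of documents that are
     * not deleted. An empty segment is returned unscaled to avoid dividing
     * by zero.
     */
    static long calibrate(long byteSize, int docCount, int delCount) {
        if (docCount <= 0) {
            return byteSize;
        }
        double liveRatio = 1.0 - ((double) delCount / docCount);
        return (long) (byteSize * liveRatio);
    }

    public static void main(String[] args) {
        // 1000 bytes, 100 docs, 25 of them deleted
        System.out.println(calibrate(1000L, 100, 25)); // prints 750
    }
}
```

With a rule like this, two segments of equal on-disk size are treated differently when one of them carries many deletes, which is the kind of balancing the merge policy discussion in this thread is about.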

Re: IndexWriter.getReader() was Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Michael McCandless
On Tue, Sep 22, 2009 at 3:53 PM, Grant Ingersoll wrote: >> But, the returned reader is read-only, so you can't use it to change >> norms, do deletes, etc. > > Yeah, but an IW can do deletes, and if this IR is coupled to it > anyway... True, but IW's deletes are still buffered, and you can't

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Jason Rutherglen
Yeah it's all package private, I think it should be protected. One would use OneMerge.info to then obtain the newly merged SR via IW.getReader(). There's no reason not to include the newly merged SR in OneMerge, there wasn't a need when 1516 was written. On Tue, Sep 22, 2009 at 12:00 PM, Tim Smit

Re: IndexWriter.getReader() was Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Grant Ingersoll
On Sep 22, 2009, at 3:44 PM, Michael McCandless wrote: On Tue, Sep 22, 2009 at 2:53 PM, Grant Ingersoll wrote: One of the pieces I still am missing from all of this is why isn't IW.getReader() now just the preferred way of getting an IndexReader for all applications other than those that are

Re: IndexWriter.getReader() was Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Michael McCandless
On Tue, Sep 22, 2009 at 2:53 PM, Grant Ingersoll wrote: > One of the pieces I still am missing from all of this is why isn't > IW.getReader() now just the preferred way of getting an IndexReader > for all applications other than those that are completely batch > oriented? > > Why bother with IndexR

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Tim Smith
Jason Rutherglen wrote: > For that you can subclass IW.mergeSuccess. > > Looks like that's package private :( It also doesn't look like it has the merged output SegmentReader, which could be used for cache loading/cache key (since it may not have been opened yet, but with NRT it should be available?)

Re: IndexWriter.getReader() was Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Jason Rutherglen
> which one is better Better for what? What use case are you thinking of? The merge reasons were covered well in the previous thread. Another gain is the carry over of deletes in RAM. I'm getting the feeling the Realtime wiki needs a lot of work. http://wiki.apache.org/lucene-java/NearRealtimeSe

Re: IndexWriter.getReader() was Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Grant Ingersoll
On Sep 22, 2009, at 2:47 PM, Grant Ingersoll wrote: And yet, at the first SF Meetup, I recall having a discussion with Michael B. about this approach versus IR.reopen() that left me wondering which one is better, since Lucene has, in fact, always been about incremental updates (since th

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Jason Rutherglen
For that you can subclass IW.mergeSuccess. On Tue, Sep 22, 2009 at 11:43 AM, Tim Smith wrote: > Jason Rutherglen wrote: > > I have a working version of Simple FieldCache Merging LUCENE-1785 that > should go in real soon. > > > > Will this contain a callback mechanism I can register with to know w

IndexWriter.getReader() was Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Grant Ingersoll
Slight divergence from the topic... On Sep 22, 2009, at 10:48 AM, Michael McCandless wrote: John, are you using IndexWriter.setMergedSegmentWarmer, so that a newly merged segment is warmed before it's "put into production" (returned by getReader)? One of the pieces I still am missing from all

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Tim Smith
Jason Rutherglen wrote: > I have a working version of Simple FieldCache Merging LUCENE-1785 that > should go in real soon. > > Will this contain a callback mechanism I can register with to know what segments are being merged? That way I can merge my own caches as well at the application layer, p

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Jason Rutherglen
I have a working version of Simple FieldCache Merging LUCENE-1785 that should go in real soon. On Tue, Sep 22, 2009 at 11:14 AM, Mark Miller wrote: > 1. see IndexWriter and the method/class that Mike pointed out earlier > for the warming. > > 2. See Lucene-831 - I think we will get some form of t

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Mark Miller
1. see IndexWriter and the method/class that Mike pointed out earlier for the warming. 2. See Lucene-831 - I think we will get some form of that in someday. Tim Smith wrote: > This sounds pretty interesting > > is there a proposed API for doing this warming yet? > Is there a ticket tracking this?

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Michael McCandless
On Tue, Sep 22, 2009 at 2:01 PM, Tim Smith wrote: > is there a proposed API for doing this warming yet? It's already committed and available in 2.9 (see IndexWriter.setMergedSegmentWarmer). > for my use cases, it would be really nice for applications to be able to > associate a custom "IndexCac

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Tim Smith
This sounds pretty interesting. Is there a proposed API for doing this warming yet? Is there a ticket tracking this? For my use cases, it would be really nice for applications to be able to associate a custom "IndexCache" object with an index reader, then this pluggable "AutoWarmer" would be in ch

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Jason Rutherglen
John, I have a few questions in order to better understand, as the wiki does not reflect the entirety of what you're trying to describe. > But it is required to set up several parameters carefully to get desired behavior. Which parameters are you referring to? What were the ZMP parameters used

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Michael McCandless
Well described, that's exactly it! I like the concrete example :) Thanks Yonik. Mike On Tue, Sep 22, 2009 at 1:38 PM, Yonik Seeley wrote: > OK Mike, thanks for your patience - I understand now :-) > > Here's an example that helped me understand - hopefully it will add to > others' understanding

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Yonik Seeley
OK Mike, thanks for your patience - I understand now :-) Here's an example that helped me understand - hopefully it will add to others' understanding more than it confuses ;-) IW.getReader() => segments={A, B} // something causes a merge of A,B into AB to start addDoc(doc1) // doc1 goes into s

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Ted Dunning
Actually, I strongly disagree. If you optimize for this case, you are pessimizing for the real world. It would be much better to fit a realistic life cycle or just record a trace of profile updates (no need for content, just an abstract id for each profile that got updated). On Mon, Sep 21, 2009

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Yonik Seeley
On Tue, Sep 22, 2009 at 11:37 AM, Michael McCandless wrote: > The whole point of putting optional warming into IndexWriter was so > the segment could be warmed *before* the merge commits the change to > the writer's SegmentInfos. But... doesn't this add the same amount of latency in a different p

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Jason Rutherglen
Right, it allows warming without interrupting obtaining new readers. I'll update the realtime wiki with this. Thanks Mike. On Tue, Sep 22, 2009 at 8:53 AM, Michael McCandless wrote: > It's not only that the newly merged segments are quickly searchable > (you could do that with warming outside of

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Michael McCandless
It's not only that the newly merged segments are quickly searchable (you could do that with warming outside of IW). It's more importantly so that you can continue to add/delete docs, flush the segment, open a new NRT reader, and search those changes, without waiting for the warming to complete. Y
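
The point above can be illustrated with a toy model in plain Java. This is a sketch under stated assumptions, not Lucene code: `ToyNrtWriter` and all of its methods are hypothetical, segments are just names, and "warming" is reduced to a comment. It only shows the ordering being described: while a merge of A and B is still being warmed, a newly flushed segment is already visible to a fresh reader, and the merged segment replaces its sources only once warming is done.

```java
// Hypothetical toy model; segment names stand in for real segments.
final class ToyNrtWriter {
    // Segments a newly opened reader would see right now.
    private final java.util.List<String> live = new java.util.ArrayList<String>();
    // Sources and target of the single in-flight merge, if any.
    private java.util.List<String> mergeSources;
    private String mergeTarget;

    ToyNrtWriter(String... initialSegments) {
        live.addAll(java.util.Arrays.asList(initialSegments));
    }

    /** Starts a merge; the source segments stay visible until it commits. */
    void startMerge(String target, String... sources) {
        mergeTarget = target;
        mergeSources = java.util.Arrays.asList(sources);
    }

    /** Flushes buffered docs as a new segment, visible to the next reader. */
    void flush(String newSegment) {
        live.add(newSegment);
    }

    /** Returns a snapshot of the currently visible segments. */
    java.util.List<String> getReader() {
        return new java.util.ArrayList<String>(live);
    }

    /** Warms the merged segment first, then swaps it in for its sources. */
    void warmThenCommitMerge() {
        if (mergeTarget == null) {
            throw new IllegalStateException("no merge in flight");
        }
        // Warming of mergeTarget (e.g. loading caches) would happen here,
        // before the visible segment list changes.
        int position = live.indexOf(mergeSources.get(0));
        live.removeAll(mergeSources);
        live.add(position, mergeTarget);
        mergeTarget = null;
        mergeSources = null;
    }
}
```

In this model, starting with segments A and B, calling `startMerge("AB", "A", "B")` and then `flush("C")` gives a reader over A, B and C straight away; only after `warmThenCommitMerge()` does a reader see AB and C.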

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Jason Rutherglen
Adding segment warming to IW is the only way to ensure newly merged segments are quickly searchable without the impact brought up by John W regarding queries on new segments being slow when they load field caches. On Tue, Sep 22, 2009 at 8:37 AM, Michael McCandless wrote: > On Tue, Sep 22, 2009 a

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Michael McCandless
On Tue, Sep 22, 2009 at 11:08 AM, Yonik Seeley wrote: > I'm still not sure I see the reason for complicating the IndexWriter > with warming... can't this be done just as efficiently (if not more > efficiently) in user/application space? It will be less efficient when you warm outside of IndexWri

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Earwin Burrfoot
On Tue, Sep 22, 2009 at 19:08, Yonik Seeley wrote: > On Tue, Sep 22, 2009 at 10:48 AM, Michael McCandless > wrote: >> John are you using IndexWriter.setMergedSegmentWarmer, so that a newly >> merged segment is warmed before it's "put into production" (returned >> by getReader)? > > I'm still not

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Yonik Seeley
On Tue, Sep 22, 2009 at 10:48 AM, Michael McCandless wrote: > John are you using IndexWriter.setMergedSegmentWarmer, so that a newly > merged segment is warmed before it's "put into production" (returned > by getReader)? I'm still not sure I see the reason for complicating the IndexWriter with wa

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Michael McCandless
John, are you using IndexWriter.setMergedSegmentWarmer, so that a newly merged segment is warmed before it's "put into production" (returned by getReader)? Mike On Mon, Sep 21, 2009 at 9:35 PM, John Wang wrote: > Jason: > > You are missing the point. > > The idea is to avoid merging of la

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-22 Thread Grant Ingersoll
On Sep 21, 2009, at 9:35 PM, John Wang wrote: Jason: You are missing the point. The idea is to avoid merging of large segments. The point of this MergePolicy is to balance segment merges across the index. The aim is not to have 1 large segment, it is to have n segments with bala

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-21 Thread John Wang
Jason: You are missing the point. The idea is to avoid merging of large segments. The point of this MergePolicy is to balance segment merges across the index. The aim is not to have 1 large segment, it is to have n segments with balanced sizes. When the large segment is out of the IO

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-21 Thread John Wang
Hi Ted: In our case it is profile updates. Each profile -> 1 document keyed on member id. We do experience people updating their profile and the assumption is every member is likely to update their profile (that is a bit aggressive I'd agree, but it is nevertheless a safe upper bound)

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-21 Thread Jason Rutherglen
I'm not sure I communicated the idea properly. If CMS is set to 1 thread, no matter how CPU-intensive a merge is, it's limited to 1 core of what is in many cases a 4 or 8 core server. That leaves the other 3 or 7 cores for queries, which, if slow, indicates that it isn't the merging that's slow

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-21 Thread Ted Dunning
John, I think that inherent in your test is a uniform distribution of updates. This seems unrealistic to me, not least because any distribution of updates caused by a population of objects interacting with each other should be translation invariant in time which is something a uniform distributio

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-21 Thread John Wang
Jason: Before jumping to any conclusions, let me describe the test setup. It is rather different from Lucene benchmark as we are testing high updates in a realtime environment: We took a public corpus: medline, indexed to approximately 3 million docs. And update all the docs over and over

Re: How to leverage the LogMergePolicy "calibrateSizeByDeletes" patch in Solr ?

2009-09-21 Thread Jason Rutherglen
John, it would be great if Lucene's benchmark were used so everyone could execute the test in their own environment and verify. The settings and code used to generate the results aren't clear, so it's difficult to draw any reliable conclusions. The steep spike shows greater evidence for the IO ca