Ya know, it turned out to be embarrassingly simple - I think I just
had a mental block from thinking about how Solr's warming worked for
so long.
Actually, it was so simple (yet I still got it wrong at first
glance) that it reminded me of this:
http://www.marilynvossavant.com/forum/viewtopic.p
Jason:
I am not sure what "parameters" you are referring to either. Are you
responding to the right email?
Anyhoot, I used the defaults for everything in both MergePolicies.
LogMergePolicy.setCalibrateSizeByDeletes was a contribution by us from ZMP
for normalizing segment size using deleted doc co
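The idea behind calibrating segment size by deletes can be sketched as a minimal model, assuming a segment's effective size is its byte size scaled by the fraction of its documents that are still live. `DeleteCalibratedSize` and `calibratedSize` are hypothetical names; this is not Lucene's source for `LogMergePolicy`.

```java
/** Hypothetical model of calibrating a segment's size by its deleted docs (not Lucene source). */
class DeleteCalibratedSize {

    /**
     * Scales a segment's byte size by the fraction of its documents that are
     * still live. Assumption: this captures the intent of calibrating size by
     * deletes, not Lucene's exact implementation.
     */
    static long calibratedSize(long sizeInBytes, int docCount, int delCount) {
        if (docCount <= 0) {
            return sizeInBytes; // no documents to calibrate against
        }
        double delRatio = (double) delCount / docCount;
        return (long) (sizeInBytes * (1.0 - delRatio));
    }

    public static void main(String[] args) {
        // 100 docs, 25 deleted: the segment is treated as 75% of its byte size.
        System.out.println(calibratedSize(1_000_000L, 100, 25)); // 750000
    }
}
```

Under this model a heavily-deleted segment looks smaller to the merge policy, so it is likely to be picked for merging sooner, which reclaims its deleted docs.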
On Tue, Sep 22, 2009 at 3:53 PM, Grant Ingersoll wrote:
>> But, the returned reader is read-only, so you can't use it to change
>> norms, do deletes, etc.
>
> Yeah, but an IW can do deletes, and if this IR is coupled to it
> anyway...
True, but IW's deletes are still buffered, and you can't
Yeah it's all package private, I think it should be protected.
One would use OneMerge.info to then obtain the newly merged SR
via IW.getReader(). There's no reason not to include the newly
merged SR in OneMerge, there wasn't a need when 1516 was written.
On Tue, Sep 22, 2009 at 12:00 PM, Tim Smit
On Sep 22, 2009, at 3:44 PM, Michael McCandless wrote:
On Tue, Sep 22, 2009 at 2:53 PM, Grant Ingersoll
wrote:
One of the pieces I still am missing from all of this is why isn't
IW.getReader() now just the preferred way of getting an IndexReader
for all applications other than those that are
On Tue, Sep 22, 2009 at 2:53 PM, Grant Ingersoll wrote:
> One of the pieces I still am missing from all of this is why isn't
> IW.getReader() now just the preferred way of getting an IndexReader
> for all applications other than those that are completely batch
> oriented?
>
> Why bother with IndexR
Jason Rutherglen wrote:
> For that you can subclass IW.mergeSuccess.
>
>
Looks like that's package private :(
It also doesn't look like it has the merged output SegmentReader, which
could be used for cache loading/cache key (since it may not have been
opened yet, but with NRT it should be available?)
> which one is better
Better for what? What use case are you thinking of?
The merge reasons were covered well in the previous thread.
Another gain is the carry over of deletes in RAM.
I'm getting the feeling the Realtime wiki needs a lot of work.
http://wiki.apache.org/lucene-java/NearRealtimeSe
On Sep 22, 2009, at 2:47 PM, Grant Ingersoll wrote:
And yet, at the first SF Meetup, I recall having a discussion with
Michael B. about this approach versus IR.reopen() that left me
wondering which one is better, since Lucene has, in fact, always
been about incremental updates (since th
For that you can subclass IW.mergeSuccess.
On Tue, Sep 22, 2009 at 11:43 AM, Tim Smith wrote:
> Jason Rutherglen wrote:
>
> I have a working version of Simple FieldCache Merging LUCENE-1785 that
> should go in real soon.
>
>
>
> Will this contain a callback mechanism I can register with to know w
Slight divergence from the topic...
On Sep 22, 2009, at 10:48 AM, Michael McCandless wrote:
John are you using IndexWriter.setMergedSegmentWarmer, so that a newly
merged segment is warmed before it's "put into production" (returned
by getReader)?
One of the pieces I still am missing from all
Jason Rutherglen wrote:
> I have a working version of Simple FieldCache Merging LUCENE-1785 that
> should go in real soon.
>
>
Will this contain a callback mechanism I can register with to know what
segments are being merged?
That way I can merge my own caches as well at the application layer,
p
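Tim's request (a callback telling the application which segments were merged, so it can merge its own caches) can be sketched as follows, assuming the application keeps one `int[]` per segment indexed by docID plus a `BitSet` of deleted docs. A Lucene merge appends the source segments' documents in order and drops deleted ones, so the merged cache is the live values concatenated. `CacheMerger` and `mergeCaches` are hypothetical names, and since the thread notes `IW.mergeSuccess` is package private, the hook that would call this is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical application-level merge of per-segment int caches (not a Lucene API). */
class CacheMerger {

    /**
     * Builds the cache for a merged segment: source segments are appended in
     * order and deleted docs are dropped, matching how a merge renumbers docs.
     */
    static int[] mergeCaches(List<int[]> caches, List<BitSet> deletes) {
        List<Integer> values = new ArrayList<>();
        for (int s = 0; s < caches.size(); s++) {
            int[] cache = caches.get(s);
            BitSet deleted = deletes.get(s);
            for (int doc = 0; doc < cache.length; doc++) {
                if (!deleted.get(doc)) {
                    values.add(cache[doc]);
                }
            }
        }
        int[] merged = new int[values.size()];
        for (int i = 0; i < merged.length; i++) {
            merged[i] = values.get(i);
        }
        return merged;
    }

    public static void main(String[] args) {
        BitSet firstDeletes = new BitSet();
        firstDeletes.set(1); // doc 1 of the first segment was deleted
        int[] merged = mergeCaches(
                List.of(new int[] {10, 20, 30}, new int[] {40, 50}),
                List.of(firstDeletes, new BitSet()));
        System.out.println(Arrays.toString(merged)); // [10, 30, 40, 50]
    }
}
```

Because the merged cache is derived from the existing per-segment caches, nothing has to be re-read from the index for the new segment.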
I have a working version of Simple FieldCache Merging LUCENE-1785 that
should go in real soon.
On Tue, Sep 22, 2009 at 11:14 AM, Mark Miller wrote:
> 1. See IndexWriter and the method/class that Mike pointed out earlier
> for the warming.
>
> 2. See LUCENE-831 - I think we will get some form of t
1. See IndexWriter and the method/class that Mike pointed out earlier
for the warming.
2. See LUCENE-831 - I think we will get some form of that in someday.
Tim Smith wrote:
> This sounds pretty interesting
>
> is there a proposed API for doing this warming yet?
> Is there a ticket tracking this?
On Tue, Sep 22, 2009 at 2:01 PM, Tim Smith wrote:
> is there a proposed API for doing this warming yet?
It's already committed and available in 2.9 (see
IndexWriter.setMergedSegmentWarmer).
> For my use cases, it would be really nice for applications to be able to
> associate a custom "IndexCac
This sounds pretty interesting
is there a proposed API for doing this warming yet?
Is there a ticket tracking this?
For my use cases, it would be really nice for applications to be able to
associate a custom "IndexCache" object with an index reader, then this
pluggable "AutoWarmer" would be in ch
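Tim's "IndexCache plus pluggable AutoWarmer" idea can be sketched as a cache keyed by segment name, assuming a segment's cached values stay valid for the segment's whole lifetime, so after a reopen only segments that were not seen before need warming. `SegmentCacheRegistry`, `onNewReader`, and the warmer function are hypothetical, not Lucene APIs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-segment cache with a pluggable warmer (not a Lucene API). */
class SegmentCacheRegistry<V> {
    private final Map<String, V> cache = new HashMap<>();
    private final Function<String, V> warmer; // builds the cache entry for one segment
    private int warmCount = 0;

    SegmentCacheRegistry(Function<String, V> warmer) {
        this.warmer = warmer;
    }

    /**
     * Call with the segment names of each newly opened reader: entries for
     * segments that disappeared (e.g. were merged away) are dropped, and only
     * segments not seen before are warmed.
     */
    void onNewReader(List<String> segments) {
        cache.keySet().retainAll(segments);
        for (String segment : segments) {
            if (!cache.containsKey(segment)) {
                cache.put(segment, warmer.apply(segment));
                warmCount++;
            }
        }
    }

    V get(String segment) {
        return cache.get(segment);
    }

    int warmCount() {
        return warmCount;
    }

    public static void main(String[] args) {
        // Toy warmer: the "cache" for a segment is just the length of its name.
        SegmentCacheRegistry<Integer> registry = new SegmentCacheRegistry<>(String::length);
        registry.onNewReader(List.of("_a", "_b"));       // warms _a and _b
        registry.onNewReader(List.of("_a", "_b", "_c")); // only _c is new
        System.out.println(registry.warmCount());       // 3
    }
}
```

Keying by segment rather than by top-level reader is what keeps reopen cheap: the cost of warming is proportional to the new segments, not the whole index.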
John,
I have a few questions in order to better understand, as the
wiki does not reflect the entirety of what you're trying to
describe.
> But it is required to set up several parameters carefully to
> get desired behavior.
Which parameters are you referring to?
What were the ZMP parameters used
Well described, that's exactly it! I like the concrete example :)
Thanks Yonik.
Mike
On Tue, Sep 22, 2009 at 1:38 PM, Yonik Seeley
wrote:
> OK Mike, thanks for your patience - I understand now :-)
>
> Here's an example that helped me understand - hopefully it will add to
> others' understanding
OK Mike, thanks for your patience - I understand now :-)
Here's an example that helped me understand - hopefully it will add to
others' understanding more than it confuses ;-)
IW.getReader() => segments={A, B}
// something causes a merge of A,B into AB to start
addDoc(doc1)
// doc1 goes into s
Actually, I strongly disagree. If you optimize for this case, you are
pessimizing for the real world.
It would be much better to fit a realistic life cycle or just record a trace
of profile updates (no need for content, just an abstract id for each
profile that got updated).
On Mon, Sep 21, 2009
On Tue, Sep 22, 2009 at 11:37 AM, Michael McCandless
wrote:
> The whole point of putting optional warming into IndexWriter was so
> the segment could be warmed *before* the merge commits the change to
> the writer's SegmentInfos.
But... doesn't this add the same amount of latency in a different
p
Right, it allows warming without interrupting obtaining new readers.
I'll update the realtime wiki with this.
Thanks Mike.
On Tue, Sep 22, 2009 at 8:53 AM, Michael McCandless
wrote:
> It's not only that the newly merged segments are quickly searchable
> (you could do that with warming outside of
It's not only that the newly merged segments are quickly searchable
(you could do that with warming outside of IW).
It's more importantly so that you can continue to add/delete docs,
flush the segment, open a new NRT reader, and search those changes,
without waiting for the warming to complete. Y
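Mike's point, that warming inside IW lets you keep adding docs, flushing, and opening NRT readers while a merged segment warms, can be illustrated with a toy model (not Lucene's implementation). Segments are plain names, the warming step is simulated by waiting on a `CountDownLatch`, and `ToyNrtWriter` and its methods are hypothetical; the A/B/AB labels are borrowed from Yonik's example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Toy model (not Lucene) of a writer that warms a merged segment before publishing it. */
class ToyNrtWriter {
    private final List<String> segments = new ArrayList<>();

    ToyNrtWriter(List<String> initialSegments) {
        segments.addAll(initialSegments);
    }

    /** Snapshot of the published segments, standing in for an NRT reader. */
    synchronized List<String> getReader() {
        return List.copyOf(segments);
    }

    /** Stands in for addDoc + flush: publishes a new segment without waiting on any merge. */
    synchronized void flushNewSegment(String name) {
        segments.add(name);
    }

    /** Replaces the merge's source segments with the merged one. */
    private synchronized void commitMerge(List<String> sources, String merged) {
        int position = segments.indexOf(sources.get(0));
        segments.removeAll(sources);
        segments.add(position, merged);
    }

    /** Starts a background merge; the merged segment is published only after warming. */
    Thread startMerge(List<String> sources, String merged, CountDownLatch warmingDone) {
        Thread mergeThread = new Thread(() -> {
            try {
                warmingDone.await(); // stands in for slow warming, e.g. loading field caches
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            commitMerge(sources, merged);
        });
        mergeThread.start();
        return mergeThread;
    }

    public static void main(String[] args) throws InterruptedException {
        ToyNrtWriter writer = new ToyNrtWriter(List.of("A", "B"));
        CountDownLatch warmingDone = new CountDownLatch(1);
        Thread merge = writer.startMerge(List.of("A", "B"), "AB", warmingDone);

        writer.flushNewSegment("C");            // new docs arrive while AB is warming
        System.out.println(writer.getReader()); // [A, B, C]

        warmingDone.countDown();                // warming finishes
        merge.join();
        System.out.println(writer.getReader()); // [AB, C]
    }
}
```

Because the merged segment is swapped in only after warming, readers never see a cold AB, yet flushing and reopening never wait for it.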
Adding segment warming to IW is the only way to ensure newly
merged segments are quickly searchable without the impact
brought up by John W regarding queries on new segments being
slow when they load field caches.
On Tue, Sep 22, 2009 at 8:37 AM, Michael McCandless
wrote:
> On Tue, Sep 22, 2009 a
On Tue, Sep 22, 2009 at 11:08 AM, Yonik Seeley
wrote:
> I'm still not sure I see the reason for complicating the IndexWriter
> with warming... can't this be done just as efficiently (if not more
> efficiently) in user/application space?
It will be less efficient when you warm outside of IndexWri
On Tue, Sep 22, 2009 at 19:08, Yonik Seeley wrote:
> On Tue, Sep 22, 2009 at 10:48 AM, Michael McCandless
> wrote:
>> John are you using IndexWriter.setMergedSegmentWarmer, so that a newly
>> merged segment is warmed before it's "put into production" (returned
>> by getReader)?
>
> I'm still not
On Tue, Sep 22, 2009 at 10:48 AM, Michael McCandless
wrote:
> John are you using IndexWriter.setMergedSegmentWarmer, so that a newly
> merged segment is warmed before it's "put into production" (returned
> by getReader)?
I'm still not sure I see the reason for complicating the IndexWriter
with wa
John are you using IndexWriter.setMergedSegmentWarmer, so that a newly
merged segment is warmed before it's "put into production" (returned
by getReader)?
Mike
On Mon, Sep 21, 2009 at 9:35 PM, John Wang wrote:
> Jason:
>
> You are missing the point.
>
> The idea is to avoid merging of la
On Sep 21, 2009, at 9:35 PM, John Wang wrote:
Jason:
You are missing the point.
The idea is to avoid merging of large segments. The point of
this MergePolicy is to balance segment merges across the index. The
aim is not to have 1 large segment, it is to have n segments with
bala
Jason:
You are missing the point.
The idea is to avoid merging of large segments. The point of this
MergePolicy is to balance segment merges across the index. The aim is not to
have 1 large segment, it is to have n segments with balanced sizes.
When the large segment is out of the IO
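John's goal, n segments of balanced size rather than one large segment, can be illustrated with a simple selection rule, assuming segment sizes are known and merges combine adjacent segments (as Lucene's log-based merge policies do): merge the adjacent pair with the smallest combined size that stays under a cap. This is a hypothetical rule for illustration, not the actual ZMP algorithm; `BalancedMergeSketch` and `maxMergedSize` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical size-capped merge selection that keeps segment sizes balanced. */
class BalancedMergeSketch {

    /**
     * Returns i such that segments i and i+1 form the adjacent pair with the
     * smallest combined size not exceeding maxMergedSize, or -1 if none fits.
     */
    static int pickMerge(List<Long> sizes, long maxMergedSize) {
        int best = -1;
        long bestSize = Long.MAX_VALUE;
        for (int i = 0; i + 1 < sizes.size(); i++) {
            long combined = sizes.get(i) + sizes.get(i + 1);
            if (combined <= maxMergedSize && combined < bestSize) {
                best = i;
                bestSize = combined;
            }
        }
        return best;
    }

    /**
     * Performs one merge step. A segment already larger than the cap is never
     * merged again, since any pair containing it exceeds the cap.
     */
    static List<Long> mergeOnce(List<Long> sizes, long maxMergedSize) {
        List<Long> result = new ArrayList<>(sizes);
        int i = pickMerge(sizes, maxMergedSize);
        if (i >= 0) {
            result.set(i, sizes.get(i) + sizes.get(i + 1));
            result.remove(i + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> sizes = List.of(100L, 10L, 12L, 50L);
        System.out.println(mergeOnce(sizes, 60L)); // [100, 22, 50]
    }
}
```

The cap is what stops one segment from growing without bound and pulling large merges through the IO system; the trade-off is that queries visit more segments.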
Hi Ted:
In our case it is profile updates. Each profile -> 1 document keyed on
member id.
We do see people updating their profiles, and the assumption is that
every member is likely to update their profile (that is a bit aggressive, I'd
agree, but it is nevertheless a safe upper bound).
I'm not sure I communicated the idea properly. If CMS (ConcurrentMergeScheduler) is set to
1 thread, no matter how CPU-intensive a merge is, it's
limited to 1 core of what is in many cases a 4- or 8-core server.
That leaves the other 3 or 7 cores for queries, so if queries are still slow,
that indicates it isn't the merging that's slow
John,
I think that inherent in your test is a uniform distribution of updates.
This seems unrealistic to me, not least because any distribution of updates
caused by a population of objects interacting with each other should be
translation invariant in time, which is something a uniform distributio
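Ted's objection is that uniformly random updates are unrealistic. A recorded trace of real update ids, as suggested in the thread, is the better input; as a synthetic stand-in, a skewed stream can be drawn from a Zipf-like distribution where id k is picked with probability proportional to 1/(k+1). The choice of Zipf and the `SkewedUpdates` class are assumptions for illustration, not something proposed in the thread.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical generator of a skewed (Zipf-like) stream of profile-update ids. */
class SkewedUpdates {
    private final double[] cumulative; // cumulative probability for ids 0..n-1
    private final Random random;

    SkewedUpdates(int numProfiles, long seed) {
        cumulative = new double[numProfiles];
        double total = 0;
        for (int k = 0; k < numProfiles; k++) {
            total += 1.0 / (k + 1); // weight of id k is proportional to 1/(k+1)
            cumulative[k] = total;
        }
        for (int k = 0; k < numProfiles; k++) {
            cumulative[k] /= total; // normalize so the last entry is 1.0
        }
        random = new Random(seed);
    }

    /** Draws the next profile id to update by inverting the cumulative distribution. */
    int nextId() {
        double u = random.nextDouble(); // uniform in [0, 1)
        int idx = Arrays.binarySearch(cumulative, u);
        return idx >= 0 ? idx : -idx - 1; // insertion point = first id whose cumulative exceeds u
    }

    public static void main(String[] args) {
        SkewedUpdates updates = new SkewedUpdates(1000, 42L);
        int[] counts = new int[1000];
        for (int i = 0; i < 10_000; i++) {
            counts[updates.nextId()]++;
        }
        // Popular profiles dominate: id 0 is expected to get roughly 13% of updates.
        System.out.println("id 0: " + counts[0] + ", id 999: " + counts[999]);
    }
}
```

Replaying a recorded trace would replace `nextId()` with reading the next id from the trace, leaving the rest of the test harness unchanged.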
Jason:
Before jumping to any conclusions, let me describe the test setup. It
is rather different from the Lucene benchmark, as we are testing high update rates in
a realtime environment:
We took a public corpus, MEDLINE, and indexed approximately 3 million
docs, then updated all the docs over and over
John,
It would be great if Lucene's benchmark were used so everyone
could execute the test in their own environment and verify. It's
not clear what settings or code were used to generate the results, so
it's difficult to draw any reliable conclusions.
The steep spike shows greater evidence for the IO ca