Thanks Erick!

Rahul

On Mon, Dec 21, 2015 at 10:07 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Rahul:
>
> bq: we don't want the index sizes to grow too large and auto optimize to
> kick in
>
> Not quite what's going on. There is no "auto optimize". What
> there is is background merging that will take _some_ segments and
> merge them together. Very occasionally this will be the same as a full
> optimize if it just happens that "some" means all the segments.
>
> bq: recovery takes a bit more time when it is not optimized
>
> I'd be interested in formal measurements here. A recovery that copied
> the _entire_ index down from the leader shouldn't really be that
> much different between an optimized and non-optimized index, but
> all things are possible. If the recovery is a "peer sync" it shouldn't
> matter at all.
>
> If you're continually adding documents that _replace_ older documents,
> optimizing will recover any "holes" left by the old updated docs. An
> update is really a mark-as-deleted for the old version and a re-index
> of the new. Since segments are write-once, the old data is left there
> until the segment is merged. Now, one of the bits of information that
> goes into deciding whether to merge a segment or not is the size.
> Another is the percentage of deleted docs. When you optimize, you get
> one huge segment. Now you have to update a lot of docs for that
> segment to have a large percentage of deleted documents and be merged,
> thus wasting space and memory.
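
To make the point above about deleted-document percentage concrete, here is a minimal arithmetic sketch. The 20% threshold is an assumed figure for illustration only, not a value taken from this thread or from TieredMergePolicy, and `updates_needed` is a hypothetical helper name.

```python
def updates_needed(segment_docs: int, threshold_pct: int) -> int:
    """Smallest number of replaced (mark-as-deleted) docs that pushes a
    segment's deleted fraction to at least threshold_pct percent."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-segment_docs * threshold_pct // 100)


# Hypothetical sizes: one huge optimized segment vs. a typical small one.
ASSUMED_THRESHOLD_PCT = 20  # assumption, not a Solr/Lucene default
big_segment = updates_needed(10_000_000, ASSUMED_THRESHOLD_PCT)
small_segment = updates_needed(100_000, ASSUMED_THRESHOLD_PCT)
print(big_segment, small_segment)
```

Under these assumed numbers the single optimized segment has to accumulate a hundred times as many replaced documents as the small one before it looks equally "dirty", which is why the dead space tends to linger after an optimize.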
>
> So it's a tradeoff. But if you're getting satisfactory performance
> from what you have now, there's no reason to change.
>
> Here's a wonderful video about the process. You want the third one
> down (TieredMergePolicy) as that's the default.
>
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> Best,
> Erick
>
> On Sun, Dec 20, 2015 at 8:26 PM, Rahul Ramesh <rr.ii...@gmail.com> wrote:
> > Hi Erick,
> > We index several million documents per day and we optimize every day
> > when the relative load is low. The reason we optimize is that we don't
> > want the index sizes to grow too large and auto optimize to kick in.
> > When auto optimize kicks in, it results in unpredictable performance as
> > it is CPU and IO intensive.
> >
> > In older Solr (4.2), when the segment size grew too large, insertion
> > used to fail. Is this problem also seen in SolrCloud?
> >
> > Also, we have observed that recovery takes a bit more time when the
> > index is not optimized. We don't have any quantitative measurement of
> > this; it's just an observation. Is this a correct observation?
> >
> > If we optimize it every day, the indexes will not be skewed, right?
> >
> > Please let me know if my understanding is correct.
> >
> > Regards,
> > Rahul
> >
> > On Mon, Dec 21, 2015 at 9:54 AM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> You'll probably have to shard before you get to the TB range. At that
> >> point, all the optimization is done individually on each shard so it
> >> really doesn't matter how many shards you have.
> >>
> >> Just issuing
> >> http://solr:port/solr/collection/update?optimize=true
> >>
> >> is sufficient, that'll forward the optimize command to all the shards
> >> in the collection.
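
For anyone scripting this, a minimal Python sketch of the request above follows. The host, port, and collection name are placeholders, and `build_optimize_url` and `send_optimize` are hypothetical helper names; only the `/update?optimize=true` form comes from the message above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_optimize_url(host: str, port: int, collection: str) -> str:
    """Build the optimize URL in the form quoted above."""
    query = urlencode({"optimize": "true"})
    return f"http://{host}:{port}/solr/{collection}/update?{query}"


def send_optimize(url: str, timeout_s: float = 3600.0) -> int:
    """Issue the request and return the HTTP status code.

    An optimize can run for a long time on a large index, so the timeout
    here is deliberately generous (an assumed value, tune as needed).
    """
    with urlopen(url, timeout=timeout_s) as response:
        return response.status


# Placeholder values; substitute your own host, port, and collection.
url = build_optimize_url("localhost", 8983, "mycollection")
print(url)
```

Calling `send_optimize(url)` from a nightly job would be one way to run this during a low-load window; it is not invoked here so the sketch has no side effects.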
> >>
> >> Best,
> >> Erick
> >>
> >> On Sun, Dec 20, 2015 at 8:19 PM, Zheng Lin Edwin Yeo
> >> <edwinye...@gmail.com> wrote:
> >> > Thanks for your information Erick.
> >> >
> >> > We have yet to decide how often we will update the index to include
> >> > new documents that come in. Let's say we update the index once a day;
> >> > then, when the index is updated, we do the optimization (this will be
> >> > done at night when there are not many users using the system).
> >> > But my index size will probably grow quite big (potentially more than
> >> > 1TB in the future), so does that have to be taken into
> >> > consideration too?
> >> >
> >> > Regards,
> >> > Edwin
> >> >
> >> >
> >> > On 21 December 2015 at 12:12, Erick Erickson <erickerick...@gmail.com>
> >> > wrote:
> >> >
> >> >> Much depends on how often the index is updated. If your index only
> >> >> changes, say, once a day then it's probably a good idea. If you're
> >> >> constantly updating your index, then I'd recommend that you do _not_
> >> >> optimize.
> >> >>
> >> >> Optimizing will create one large segment. That segment will be
> >> >> unlikely to be merged since it is so large relative to other segments
> >> >> for quite a while, resulting in significant wasted space. So if
> >> >> you're regularly indexing documents that _replace_ existing
> >> >> documents, this will skew your index.
> >> >>
> >> >> Bottom line:
> >> >> If you have a relatively static index that you can build and then use
> >> >> for an extended time (as in 12 hours plus) it can be worth the time
> >> >> to optimize. Otherwise I wouldn't bother.
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Sun, Dec 20, 2015 at 7:57 PM, Zheng Lin Edwin Yeo
> >> >> <edwinye...@gmail.com> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I would like to find out: would it be good to write a script to
> >> >> > do an auto-optimization of the indexes at a certain time every day?
> >> >> > Is there any advantage to doing so?
> >> >> >
> >> >> > I found that optimization can reduce the index size by quite a
> >> >> > significant amount, and allow searches of the index to run faster.
> >> >> > But will there be any advantage if we do the optimization every day?
> >> >> >
> >> >> > I'm using Solr 5.3.0
> >> >> >
> >> >> > Regards,
> >> >> > Edwin
> >> >>
> >>
>
