Thanks Shawn and Erick.

So far I haven't noticed any performance issues, either before or after the
change.

My concern all along has been COST. We could have left the configuration as
is - keeping the deleted documents in the index - but then we would have to
scale up our Solr cluster, which would double its cost. That additional
COST is what we are trying to avoid.

I will test the expungeDeletes and revert the max segment size back to 5G.
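
For reference, here's a small sketch of how I plan to trigger it. The base
URL and collection name are illustrative placeholders, not our real ones;
this just builds the /update request that passes expungeDeletes as a commit
parameter:

```python
# Sketch: build the Solr /update URL that triggers a commit with
# expungeDeletes=true. Base URL and collection name are illustrative.
from urllib.parse import urlencode

def expunge_deletes_url(base="http://localhost:8983/solr",
                        collection="mycollection"):
    """URL asking Solr to commit and physically remove deleted docs."""
    params = {"commit": "true", "expungeDeletes": "true"}
    return f"{base}/{collection}/update?{urlencode(params)}"

print(expunge_deletes_url())
```

Hitting that URL (e.g. with curl) should be equivalent to posting
`<commit expungeDeletes="true"/>` to the update handler.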

Thanks again,

Moulay

On Mon, Oct 26, 2020 at 5:49 AM Erick Erickson <erickerick...@gmail.com>
wrote:

> "Some large segments were merged into 12GB segments and
> deleted documents were physically removed.”
> and
> “So with the current natural merge strategy, I need to update
> solrconfig.xml
> and increase the maxMergedSegmentMB often"
>
> I strongly recommend you do not continue down this path. You’re making a
> mountain out of a mole-hill. You have offered no proof that removing the
> deleted documents is noticeably improving performance. If you replace
> docs randomly, deleted docs will be removed eventually with the default
> merge policy without you doing _anything_ special at all.
>
> The fact that you think you need to continuously bump up the size of
> your segments indicates your understanding is incomplete. When
> you start changing settings basically at random in order to “fix” a
> problem,
> especially one that you haven’t demonstrated _is_ a problem, you
> invariably make the problem worse.
>
> By making segments larger, you’ve increased the work Solr (well Lucene) has
> to do in order to merge them since the merge process has to handle these
> larger segments. That’ll take longer. There are a fixed number of threads
> that do merging. If they’re all tied up, incoming updates will block until
> a thread frees up. I predict that if you continue down this path,
> eventually your updates will start to misbehave and you’ll spend a week
> trying to figure out why.
>
> If you insist on worrying about deleted documents, just expungeDeletes
> occasionally. I’d also set the max segment size back to the default 5G.
> I can’t emphasize strongly enough that the way you’re approaching this
> will lead to problems, not to mention maintenance that is harder than it
> needs to be. If you do set the max segment size back to 5G, your 12G
> segments will _not_ merge until they have lots of deletes, making your
> problem worse. Then you’ll spend time trying to figure out why.
>
> Recovering from what you’ve done already comes with its own problems.
> Those large segments _will_ get rewritten (we call it a “singleton
> merge”) when they’ve accumulated a lot of deletes, but meanwhile you’ll
> think that your problem is getting worse and worse.
>
> When those large segments have more than 10% deleted documents,
> expungeDeletes will singleton merge them and they’ll gradually shrink.
>
> So my prescription is:
>
> 1> set the max segment size back to 5G
>
> 2> monitor your segments. When you see large segments (> 5G) with more
> than 10% deleted documents, issue an expungeDeletes command (not
> optimize). This will recover your index from the changes you’ve already
> made.
>
> 3> eventually, all your segments will be under 5G. When that happens, stop
> issuing expungeDeletes.
>
> 4> gather some performance statistics and prove one way or another that
> accumulating deleted docs impacts performance over time. NOTE: after
> your last expungeDeletes, deleted docs will accumulate until they reach
> a plateau and shouldn’t continue increasing after that. If you can
> _prove_ that accumulating deleted documents affects performance,
> institute a regular expungeDeletes. You could optimize instead, but
> expungeDeletes is less expensive, and on a changing index it is
> sufficient. Optimize is only really useful for a static index, so I’d
> avoid it in your situation.
>
> Best,
> Erick
>
> > On Oct 26, 2020, at 1:22 AM, Moulay Hicham <maratusa.t...@gmail.com>
> wrote:
> >
> > Some large segments were merged into 12GB segments and
> > deleted documents were physically removed.
>
>
