"Some large segments were merged into 12GB segments and
deleted documents were physically removed.”
and
“So with the current natural merge strategy, I need to update solrconfig.xml
and increase the maxMergedSegmentMB often"

I strongly recommend you do not continue down this path. You’re making a
mountain out of a molehill. You have offered no proof that removing the
deleted documents noticeably improves performance. If you replace
docs randomly, deleted docs will eventually be removed by the default
merge policy without you doing _anything_ special at all.

The fact that you think you need to continuously bump up the size of
your segments indicates your understanding is incomplete. When
you start changing settings basically at random in order to “fix” a problem,
especially one that you haven’t demonstrated _is_ a problem, you 
invariably make the problem worse.

By making segments larger, you’ve increased the work Solr (well, Lucene) has
to do to merge them, since the merge process now has to rewrite these
larger segments. That takes longer. There is a fixed number of threads
that do merging. If they’re all tied up, incoming updates will block until
a thread frees up. I predict that if you continue down this path, eventually
your updates will start to misbehave and you’ll spend a week trying to figure
out why.
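
For reference, the number of merge threads comes from the merge scheduler in
solrconfig.xml’s <indexConfig> section. You almost never need to touch it; I’m
showing it only so you know where that limit lives. The numbers here are
illustrative, not a recommendation:

    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <!-- how many merges may be queued up at once -->
      <int name="maxMergeCount">6</int>
      <!-- how many merge threads may run concurrently -->
      <int name="maxThreadCount">4</int>
    </mergeScheduler>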

If you insist on worrying about deleted documents, just run expungeDeletes
occasionally. I’d also set the max segment size back to the default 5G. I can’t
emphasize strongly enough that the way you’re approaching this will lead
to problems, not to mention maintenance that is harder than it needs to
be. Be aware that if you do set the max segment size back to 5G, your 12G
segments will _not_ merge until they have lots of deletes, which will make
your problem look worse for a while. Then you’ll spend time trying to figure
out why.
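
Concretely, putting the max segment size back means something like this inside
<indexConfig> in solrconfig.xml (a sketch; simply deleting your override has
the same effect, since roughly 5G is Lucene’s built-in default):

    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <!-- back to (roughly) the default max merged segment size -->
      <double name="maxMergedSegmentMB">5000</double>
    </mergePolicyFactory>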

Recovering from what you’ve already done has its own problems. Those large
segments _will_ get rewritten (we call it a “singleton merge”) when they’ve
accumulated a lot of deletes, but meanwhile you’ll think that your problem is
getting worse and worse.

When those large segments have more than 10% deleted documents, expungeDeletes
will singleton merge them and they’ll gradually shrink.

So my prescription is:

1> set the max segment size back to 5G

2> monitor your segments. When you see that your large (> 5G) segments have
more than 10% deleted documents, issue an expungeDeletes command (not optimize).
This will recover your index from the changes you’ve already made. There’s a
command-line sketch after this list.

3> eventually, all your segments will be under 5G. When that happens, stop
issuing expungeDeletes.

4> gather some performance statistics and prove one way or the other whether
deleted docs accumulating over time actually impact performance. NOTE: after
your last expungeDeletes, deleted docs will accumulate until they reach a
plateau and shouldn’t continue increasing after that. If you can _prove_ that
accumulating deleted documents affects performance, institute a regular
expungeDeletes. Optimize would also work, but expungeDeletes is less expensive,
and on a changing index expungeDeletes is sufficient. Optimize is only really
useful for a static index, so I’d avoid it in your situation.
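
If it helps, 2> looks roughly like this from the command line (“yourCore” is a
placeholder; adjust host/port and core name). The first call reports
per-segment sizes and deleted-doc counts, the second issues the expungeDeletes
as part of a commit:

    curl "http://localhost:8983/solr/yourCore/admin/segments"

    curl "http://localhost:8983/solr/yourCore/update" \
         -H "Content-Type: text/xml" \
         -d '<commit expungeDeletes="true"/>'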

Best,
Erick

> On Oct 26, 2020, at 1:22 AM, Moulay Hicham <maratusa.t...@gmail.com> wrote:
> 
> Some large segments were merged into 12GB segments and
> deleted documents were physically removed.
