Thanks Shawn and Erick. So far I haven't noticed any performance issues, either before or after the change.
My concern all along has been COST. We could have left the configuration as is, keeping the deleted documents in the index, but then we would have to scale up our Solr cluster. That would double our Solr cluster cost, and that additional COST is what we are trying to avoid.

I will test expungeDeletes and revert the max segment size back to 5G.

Thanks again,
Moulay

On Mon, Oct 26, 2020 at 5:49 AM Erick Erickson <erickerick...@gmail.com> wrote:

> “Some large segments were merged into 12GB segments and
> deleted documents were physically removed.”
> and
> “So with the current natural merge strategy, I need to update
> solrconfig.xml and increase the maxMergedSegmentMB often”
>
> I strongly recommend you do not continue down this path. You’re making a
> mountain out of a molehill. You have offered no proof that removing the
> deleted documents is noticeably improving performance. If you replace
> docs randomly, deleted docs will be removed eventually with the default
> merge policy without you doing _anything_ special at all.
>
> The fact that you think you need to continuously bump up the size of
> your segments indicates your understanding is incomplete. When
> you start changing settings basically at random in order to “fix” a
> problem, especially one that you haven’t demonstrated _is_ a problem,
> you invariably make the problem worse.
>
> By making segments larger, you’ve increased the work Solr (well, Lucene)
> has to do in order to merge them, since the merge process has to handle
> these larger segments. That’ll take longer. There are a fixed number of
> threads that do merging. If they’re all tied up, incoming updates will
> block until a thread frees up. I predict that if you continue down this
> path, eventually your updates will start to misbehave and you’ll spend a
> week trying to figure out why.
>
> If you insist on worrying about deleted documents, just expungeDeletes
> occasionally. I’d also set the segment size back to the default 5G. I
> can’t emphasize strongly enough that the way you’re approaching this will
> lead to problems, not to mention maintenance that is harder than it needs
> to be. If you do set the max segment size back to 5G, your 12G segments
> will _not_ merge until they have lots of deletes, making your problem
> worse. Then you’ll spend time trying to figure out why.
>
> Recovering from what you’ve done already has problems. Those large
> segments _will_ get rewritten (we call it “singleton merge”) when they’ve
> accumulated a lot of deletes, but meanwhile you’ll think that your
> problem is getting worse and worse.
>
> When those large segments have more than 10% deleted documents,
> expungeDeletes will singleton merge them and they’ll gradually shrink.
>
> So my prescription is:
>
> 1> set the max segment size back to 5G
>
> 2> monitor your segments. When you see that your large segments (> 5G)
> have more than 10% deleted documents, issue an expungeDeletes command
> (not optimize). This will recover your index from the changes you’ve
> already made.
>
> 3> eventually, all your segments will be under 5G. When that happens,
> stop issuing expungeDeletes.
>
> 4> gather some performance statistics and prove one way or another
> whether deleted docs accumulating over time impact performance. NOTE:
> after your last expungeDeletes, deleted docs will accumulate over time
> until they reach a plateau and shouldn’t continue increasing after that.
> If you can _prove_ that accumulating deleted documents affects
> performance, institute a regular expungeDeletes. Optimize would also
> work, but expungeDeletes is less expensive and on a changing index
> expungeDeletes is sufficient. Optimize is only really useful for a
> static index, so I’d avoid it in your situation.
>
> Best,
> Erick
>
> > On Oct 26, 2020, at 1:22 AM, Moulay Hicham <maratusa.t...@gmail.com>
> > wrote:
> >
> > Some large segments were merged into 12GB segments and
> > deleted documents were physically removed.
> >
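
For anyone following the thread, steps 2 and 3 of the prescription above could be scripted roughly as below. This is only a sketch under stated assumptions: the per-segment dictionary keys (`name`, `size_bytes`, `max_doc`, `deleted_docs`) are hypothetical names chosen for illustration, so whatever segment statistics you collect would need to be mapped onto them, and the host and collection names are placeholders. The URL it builds is the standard update-handler commit with `expungeDeletes=true`.

```python
from urllib.parse import urlencode

FIVE_GB = 5 * 1024**3        # max segment size recommended in the thread
DELETED_RATIO_LIMIT = 0.10   # "more than 10% deleted documents"


def segments_needing_expunge(segments):
    """Return names of segments larger than 5G with more than 10% deleted docs.

    Each item is a dict with HYPOTHETICAL keys: name, size_bytes,
    max_doc (docs including deleted ones) and deleted_docs.
    """
    flagged = []
    for seg in segments:
        if seg["max_doc"] == 0:
            continue  # empty segment, nothing to measure
        ratio = seg["deleted_docs"] / seg["max_doc"]
        if seg["size_bytes"] > FIVE_GB and ratio > DELETED_RATIO_LIMIT:
            flagged.append(seg["name"])
    return flagged


def all_segments_under_limit(segments):
    """True once no segment exceeds 5G, i.e. when to stop issuing expungeDeletes."""
    return all(seg["size_bytes"] <= FIVE_GB for seg in segments)


def expunge_deletes_url(base_url, collection):
    """Build the URL for a commit with expungeDeletes=true (not an optimize)."""
    query = urlencode({"commit": "true", "expungeDeletes": "true"})
    return f"{base_url}/solr/{collection}/update?{query}"
```

If `segments_needing_expunge(...)` returns anything, requesting the URL from `expunge_deletes_url(...)` (with curl, for example) would trigger the expungeDeletes commit; once `all_segments_under_limit(...)` is true, step 3 says to stop.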