Hello Shawn, Erick,

I thought about that too but dismissed it, because other similar batched processes don't show this problem. Nonetheless, I reset cumulativeAdds and watched a batch being indexed: it got indexed twice!
Thanks!
Markus

-----Original message-----
> From:Erick Erickson <erickerick...@gmail.com>
> Sent: Wednesday 28th November 2018 2:59
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Delete all, index all, end up with 1 segment with 50% deletes
>
> Shawn's comment seems likely, somehow you're adding all the docs twice
> and only committing at the end. In that case there'd be only 1
> segment. That's about the only way I can imagine your index has
> exactly one segment with exactly half the docs deleted.
>
> It'd be interesting for you to look at the admin UI>>schema browser
> for your <uniqueKey> field. It'll report the most frequent entries and
> if every <uniqueKey> has exactly 2 entries, then you're indexing the
> same docs twice in one go.
>
> Plus, the default TieredMergePolicy doesn't necessarily kick in unless
> there are multiple segments of roughly the same size. With an index
> this small it's perfectly possible that TMP is getting triggered and
> saying, in essence, "there's not enough work to do here to bother".
>
> In Solr 7.5, you can optimize/forceMerge without any danger of
> creating massive segments, see:
> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> (pre Solr 7.5)
> and
> https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
> (Solr 7.5+).
>
> Best,
> Erick
> On Tue, Nov 27, 2018 at 4:29 AM Markus Jelsma
> <markus.jel...@openindex.io> wrote:
> >
> > Hello,
> >
> > A background batch process compiles a data set, when finished, it sends a
> > delete all to its target collection, then everything gets sent by SolrJ,
> > followed by a regular commit. When inspecting the core i notice it has one
> > segment with 9578 documents, of which exactly half are deleted.
> >
> > That Solr node is on 7.5, how can i encourage the merge scheduler to do its
> > job and merge away all those deletes?
> >
> > Thanks,
> > Markus
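
For anyone who runs into the same symptom: a rough client-side check for the "every uniqueKey indexed twice" case described above might look like the sketch below. It is plain Java with no SolrJ calls, and it assumes the batch job can hand over the list of uniqueKey values it is about to send. The class and method names (DuplicateIdCheck, findDuplicates) are made up for illustration and are not taken from the actual batch process.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr or SolrJ.
public class DuplicateIdCheck {

    /**
     * Returns every id that occurs more than once in the batch,
     * mapped to the number of times it occurs.
     */
    public static Map<String, Integer> findDuplicates(List<String> ids) {
        // Count occurrences, keeping first-seen order for readable output.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String id : ids) {
            counts.merge(id, 1, Integer::sum);
        }
        // Drop ids that were seen only once; what remains are the duplicates.
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        // Made-up ids standing in for the uniqueKey values of one batch.
        List<String> batch = List.of("doc-1", "doc-2", "doc-1", "doc-3", "doc-2");
        System.out.println(findDuplicates(batch));
    }
}
```

If the returned map contains every id with a count of 2, the batch itself is being built or sent twice, which would match the single segment with exactly 50% deletes.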