Hello Shawn, Erick,

I thought about that too, but dismissed it because other similar batched
processes don't show this problem. Nonetheless, I reset cumulativeAdds and
watched a batch being indexed: it got indexed twice!

Thanks!
Markus
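
For readers following the thread, here is a minimal sketch of why indexing the same batch twice before a single commit leaves exactly half the documents deleted. It is a toy in-memory model, not Lucene or Solr code: the class DoubleAddModel and its methods are hypothetical names, and it assumes the 9578 documents reported below is the segment's maxDoc (live plus deleted).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a single segment; hypothetical, NOT Lucene or Solr code. */
final class DoubleAddModel {
    // One slot per document ever written; false means "marked deleted".
    private final List<Boolean> live = new ArrayList<>();
    // uniqueKey value -> slot of the currently live copy.
    private final Map<String, Integer> latest = new HashMap<>();

    /** Re-adding an existing id marks the old copy deleted and appends a new copy. */
    void add(String id) {
        Integer old = latest.get(id);
        if (old != null) {
            live.set(old, Boolean.FALSE);
        }
        live.add(Boolean.TRUE);
        latest.put(id, live.size() - 1);
    }

    int maxDoc() { return live.size(); }               // live + deleted
    int numDocs() { return latest.size(); }            // live only
    int deletedDocs() { return maxDoc() - numDocs(); }

    /** Sends the same batch of ids the given number of times, with no commit in between. */
    static DoubleAddModel indexBatch(int batchSize, int passes) {
        DoubleAddModel model = new DoubleAddModel();
        for (int pass = 0; pass < passes; pass++) {
            for (int i = 0; i < batchSize; i++) {
                model.add("doc-" + i);
            }
        }
        return model;
    }

    public static void main(String[] args) {
        DoubleAddModel m = indexBatch(4789, 2);
        System.out.println(m.maxDoc() + " docs, " + m.deletedDocs() + " deleted");
    }
}
```

Under that assumption, a batch of 4789 documents sent twice gives 9578 slots with 4789 marked deleted, which matches the numbers in the original question.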
 
-----Original message-----
> From:Erick Erickson <erickerick...@gmail.com>
> Sent: Wednesday 28th November 2018 2:59
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Delete all, index all, end up with 1 segment with 50% deletes
> 
> Shawn's comment seems likely: somehow you're adding all the docs twice
> and only committing at the end. In that case there'd be only 1
> segment. That's about the only way I can imagine your index has
> exactly one segment with exactly half the docs deleted.
> 
> It'd be interesting for you to look at the admin UI>>schema browser
> for your <uniqueKey> field. It'll report the most frequent entries and
> if every <uniqueKey> has exactly 2 entries, then you're indexing the
> same docs twice in one go.
> 
> Plus, the default TieredMergePolicy doesn't necessarily kick in unless
> there are multiple segments of roughly the same size. With an index
> this small it's perfectly possible that TMP is getting triggered and
> saying, in essence, "there's not enough work to do here to bother".
> 
> In Solr 7.5, you can optimize/forceMerge without any danger of
> creating massive segments, see:
> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
> (pre Solr 7.5)
> and
> https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/
> (Solr 7.5+).
> 
> Best,
> Erick
> On Tue, Nov 27, 2018 at 4:29 AM Markus Jelsma
> <markus.jel...@openindex.io> wrote:
> >
> > Hello,
> >
> > A background batch process compiles a data set. When finished, it sends a
> > delete-all to its target collection, then everything gets sent by SolrJ,
> > followed by a regular commit. When inspecting the core I notice it has one
> > segment with 9578 documents, of which exactly half are deleted.
> >
> > That Solr node is on 7.5. How can I encourage the merge scheduler to do its
> > job and merge away all those deletes?
> >
> > Thanks,
> > Markus
> 
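
As a rough sketch of the original question (merging away deletes on request), the snippet below only builds the update-handler URLs for an expungeDeletes commit and for an optimize/forceMerge. The base URL, the collection name and the class MergeAwayDeletes are assumptions made for illustration. Nothing is sent to Solr here, and, as the thread concludes, the real fix in this case is to stop indexing the batch twice.

```java
import java.net.URI;

/** Hypothetical helper: builds Solr update-handler URLs, does not send them. */
final class MergeAwayDeletes {

    /** Commit that asks Solr to merge away segments containing deleted docs. */
    static URI expungeDeletes(String baseUrl, String collection) {
        return URI.create(baseUrl + "/" + collection
                + "/update?commit=true&expungeDeletes=true");
    }

    /** Optimize (forceMerge) down to at most maxSegments segments. */
    static URI forceMerge(String baseUrl, String collection, int maxSegments) {
        return URI.create(baseUrl + "/" + collection
                + "/update?optimize=true&maxSegments=" + maxSegments);
    }

    public static void main(String[] args) {
        // Assumed values; replace with the real node and collection.
        String base = "http://localhost:8983/solr";
        String collection = "mycollection";
        System.out.println(expungeDeletes(base, collection));
        System.out.println(forceMerge(base, collection, 1));
    }
}
```

Either URL can then be requested with any HTTP client. The linked articles above discuss how optimize behaves before and after Solr 7.5.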
