Re: Solr indexing performance tips

Marius Grigaitis Thu, 16 Jun 2022 12:21:35 -0700

I think there are, or were, technical reasons behind it, and that's something
to figure out. It's also more complicated than that; I just simplified it.
E.g. the uniqueKey is actually a composition of two ids, and the relationship
between them is important for grouping purposes.


I agree with you that switching to sku might make sense, though.

On Thu, Jun 16, 2022, 20:07 Vincenzo D'Amore <[email protected]> wrote:

> May I ask why you haven't used the sku as the primary key? Do you need to
> have multiple versions of the same sku?
> As I understand it, if you can use the sku as the primary key, almost all
> deleteByQuery calls become unnecessary.
>
> On Thu, Jun 16, 2022 at 4:38 PM Shawn Heisey <[email protected]> wrote:
>
> > On 6/16/22 02:59, Marius Grigaitis wrote:
> > > In the end what caught our eye is a few deleteByQuery lines in stacks
> > > of running threads while Solr is overloaded. We temporarily removed
> > > deleteByQuery and it had around 10x performance improvement on indexing
> > > speed.
> >
> > I do not understand all the low-level interactions.  But I have seen
> > deleteByQuery cause some major problems.  It seems to create a blocking
> > situation where Lucene waits for things to complete before it actually
> > does the delete, and anything sent AFTER the delete waits for the
> > delete.  Imagine this situation:
> >
> > 1) Ongoing indexing begins a segment merge, one that will take 15
> > minutes to complete.
> > 2) A deleteByQuery is sent.
> > 3) More index changes are sent.
> >
> > What happens in this situation is that step 2 will wait for the merge to
> > complete, and step 3 will wait for step 2 to complete.  I have seen
> > automatic segment merges that take a lot longer than 15 minutes.
> >
> > If step 2 is changed to query for ID and then use deleteById, then steps
> > 2 and 3 will run concurrently with the merge.
> >
> > It took a lot of headscratching to figure out why my indexing process
> > sometimes stalled for LONG time spans.
> >
> > Thanks,
> > Shawn
> >
> >
>
> --
> Vincenzo D'Amore
>
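
For anyone who finds this thread later, the workaround Shawn describes (query
for the IDs first, then delete by ID) could look roughly like the sketch below.
It is only an illustration under stated assumptions, not code from anyone's
system: the core URL, the "id" field name and the helper names are all
hypothetical. It uses Solr's standard /select and /update JSON endpoints, and
it fetches a single page of results, so a large result set would need paging.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Solr core named "products"; adjust for your setup.
SOLR_URL = "http://localhost:8983/solr/products"


def _get_json(url):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def _post_json(url, payload):
    """POST a JSON payload and decode the JSON response body."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def find_ids(query, id_field="id", rows=1000, get_json=_get_json):
    """Return the uniqueKey values of documents matching the query.

    Only the first `rows` matches are returned (no paging in this sketch).
    """
    params = urllib.parse.urlencode(
        {"q": query, "fl": id_field, "rows": rows, "wt": "json"}
    )
    data = get_json(f"{SOLR_URL}/select?{params}")
    return [doc[id_field] for doc in data["response"]["docs"]]


def delete_by_id_payload(ids):
    """Build the JSON update body for a delete-by-ID request."""
    return {"delete": list(ids)}


def delete_matching(query, get_json=_get_json, post_json=_post_json):
    """Replace a deleteByQuery with a lookup followed by delete-by-ID."""
    ids = find_ids(query, get_json=get_json)
    if ids:
        post_json(f"{SOLR_URL}/update", delete_by_id_payload(ids))
    return ids
```

Calling delete_matching("sku:12345") would then stand in for a deleteByQuery on
the same query. The lookup only sees committed documents, so anything indexed
but not yet visible to searches would be missed.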
