I think there are, or were, technical reasons behind it, and that's something to figure out. It's also more complicated than that; I just simplified it. E.g. uniqueKey is actually a composition of two ids, and the relationship between them is important for grouping purposes.
I agree with you that switching to sku might make sense, though.

On Thu, Jun 16, 2022, 20:07 Vincenzo D'Amore <[email protected]> wrote:

> May I ask why you haven't used the sku as (primary key)? Do you need to
> have more versions of the same sku?
> For my understanding, if you can have the sku as primary key, almost all
> deleteByQuery are useless.
>
> On Thu, Jun 16, 2022 at 4:38 PM Shawn Heisey <[email protected]> wrote:
>
> > On 6/16/22 02:59, Marius Grigaitis wrote:
> > > In the end what caught our eye is a few deleteByQuery lines in stacks of
> > > running threads while Solr is overloaded. We temporarily removed
> > > deleteByQuery and it had around 10x performance improvement on indexing
> > > speed.
> >
> > I do not understand all the low-level interactions. But I have seen
> > deleteByQuery cause some major problems. It seems to create a blocking
> > situation where Lucene waits for things to complete before it actually
> > does the delete, and anything sent AFTER the delete waits for the
> > delete. Imagine this situation:
> >
> > 1) Ongoing indexing begins a segment merge, one that will take 15
> > minutes to complete.
> > 2) A deleteByQuery is sent.
> > 3) More index changes are sent.
> >
> > What happens in this situation is that step 2 will wait for the merge to
> > complete, and step 3 will wait for step 2 to complete. I have seen
> > automatic segment merges that take a lot longer than 15 minutes.
> >
> > If step 2 is changed to query for ID and then use deleteById, then steps
> > 2 and 3 will run concurrently with the merge.
> >
> > It took a lot of headscratching to figure out why my indexing process
> > sometimes stalled for LONG time spans.
> >
> > Thanks,
> > Shawn
> >
>
> --
> Vincenzo D'Amore
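For reference, a minimal sketch of Shawn's suggestion (query for ids, then deleteById instead of deleteByQuery), using only the Python standard library and Solr's JSON request and update syntax. The base URL, the collection name `products`, the `id` field, and the batch sizes are assumptions for illustration. `id_field` must be the collection's uniqueKey field, because cursorMark paging requires sorting on it. Deletes only become visible after a commit, and documents added between the query and the delete are not covered.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL and collection name; adjust for your setup.
SOLR_URL = "http://localhost:8983/solr/products"


def ids_from_select_response(resp: dict, id_field: str = "id") -> list[str]:
    """Extract uniqueKey values from a parsed /select JSON response."""
    return [str(doc[id_field]) for doc in resp["response"]["docs"]]


def delete_by_id_payload(ids: list[str]) -> bytes:
    """Build a JSON update body that deletes documents by uniqueKey."""
    return json.dumps({"delete": ids}).encode("utf-8")


def fetch_ids(query: str, id_field: str = "id", rows: int = 1000) -> list[str]:
    """Collect the uniqueKey of every match, paging with cursorMark."""
    ids: list[str] = []
    cursor = "*"
    while True:
        params = urllib.parse.urlencode({
            "q": query,
            "fl": id_field,
            "rows": rows,
            "sort": f"{id_field} asc",  # cursorMark requires a uniqueKey sort
            "cursorMark": cursor,
            "wt": "json",
        })
        with urllib.request.urlopen(f"{SOLR_URL}/select?{params}") as r:
            resp = json.load(r)
        ids.extend(ids_from_select_response(resp, id_field))
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor means no more results
            break
        cursor = next_cursor
    return ids


def delete_ids(ids: list[str], batch_size: int = 1000) -> None:
    """Send deleteById requests in batches (no commit is issued here)."""
    for i in range(0, len(ids), batch_size):
        req = urllib.request.Request(
            f"{SOLR_URL}/update",
            data=delete_by_id_payload(ids[i:i + batch_size]),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as r:
            r.read()
```

Usage would look like `delete_ids(fetch_ids("category:discontinued"))`, where the query is hypothetical. The point of the design is that each delete targets specific ids, so, per Shawn's description, it should not have to wait for an in-progress merge the way a deleteByQuery does.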
