Hi Vincenzo,

Yes.
On Thu, Jun 16, 2022 at 12:39 PM Vincenzo D'Amore <[email protected]> wrote:

> Hi Marius, if I have understood correctly, you have a deleteByQuery for
> each document, am I right?
>
> On Thu, 16 Jun 2022 at 11:04, Marius Grigaitis <[email protected]> wrote:
>
> > Just a follow-up on the topic.
> >
> > * We checked the settings on Solr; they seem quite default (especially
> >   the merge and commit strategies).
> > * We commit every 10 minutes.
> > * We added New Relic to the Solr instance to gather more data and
> >   graphs.
> >
> > In the end, what caught our eye was a few deleteByQuery lines in the
> > stacks of running threads while Solr was overloaded. We temporarily
> > removed deleteByQuery and saw roughly a 10x improvement in indexing
> > speed.
> >
> > How are we using deleteByQuery?
> >
> > update(add=[{uid: foo-123, sku: 123, ...}, {uid: bar-124, sku: 124}, ...],
> >        deleteByQuery=["sku: 123 AND uid != foo-123",
> >                       "sku: 124 AND uid != bar-124"])
> >
> > UID is the uniqueKey for the index. We do this because "foo" or "bar"
> > could change, and we no longer want the previous document present.
> >
> > Ideally we should probably change our uniqueKey to `sku` in this case,
> > and we would no longer need deleteByQuery. Still, it would be
> > interesting to know why deleteByQuery causes such a performance
> > bottleneck, and how we could optimize it if we wanted to keep it.
> >
> > Marius
> >
> > On Wed, Jun 8, 2022 at 8:41 PM David Hastings <[email protected]> wrote:
> >
> > > > * Do NOT commit after each batch of 1000 docs. Instead, commit as
> > > > seldom as your requirements allow, e.g. try commitWithin=60000 to
> > > > commit every minute.
> > >
> > > This is the big one. Commit after the entire process is done, or on a
> > > timer if you don't need NRT searching (rarely does anyone need that).
> > > The commit is a heavy operation and takes about the same time whether
> > > you are committing 1000 documents or 100k documents.
> > >
> > > On Wed, Jun 8, 2022 at 10:40 AM Jan Høydahl <[email protected]> wrote:
> > >
> > > > * Go multi-threaded for each core, as Shawn says. Try e.g. 2, 3
> > > >   and 4 threads.
> > > > * Experiment with different batch sizes, e.g. try 500 and 2000;
> > > >   what is optimal depends on your docs.
> > > > * Do NOT commit after each batch of 1000 docs. Instead, commit as
> > > >   seldom as your requirements allow, e.g. try commitWithin=60000
> > > >   to commit every minute.
> > > >
> > > > Tip: Try to push Solr metrics to DataDog or some other service
> > > > where you can see a dashboard with stats on requests/sec, RAM, CPU,
> > > > threads, GC, etc., which may answer your last question.
> > > >
> > > > Jan
> > > >
> > > > > On 8 Jun 2022, at 14:06, Shawn Heisey <[email protected]> wrote:
> > > > >
> > > > > On 6/8/2022 3:35 AM, Marius Grigaitis wrote:
> > > > >> * 9 different cores. Each weighs around 100 MB on disk and has
> > > > >>    approximately 90k documents.
> > > > >> * Updating is performed using the update method in batches of
> > > > >>    1000, with around 9 processes in parallel (split by core).
> > > > >
> > > > > This means that indexing within each Solr core is
> > > > > single-threaded. The way to increase indexing speed is to index
> > > > > in parallel with multiple threads or processes per index. If you
> > > > > can increase the CPU power available on the Solr server when you
> > > > > increase the number of processes/threads sending data to Solr,
> > > > > that might help.
> > > > >
> > > > > Thanks,
> > > > > Shawn
>
> --
> Vincenzo D'Amore
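The pattern Marius describes can usually be replaced with a delete-by-id plus commitWithin, which avoids the query execution that makes a per-document deleteByQuery expensive. Below is a minimal sketch of how the two JSON request bodies might be built, assuming the indexer tracks the previous uid for each sku; the URL, function name, and document values are illustrative, not from the thread:

```python
import json

# Hypothetical endpoint; adjust host/core to your setup. POST both bodies
# here with Content-Type: application/json and ?commitWithin=600000 so Solr
# commits on its own schedule instead of per batch.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"

def build_requests(docs, stale_uids):
    """Build JSON bodies for two POSTs to Solr's /update handler.

    Instead of one deleteByQuery per document (which forces Solr to run
    a query against the index while indexing), delete the stale documents
    by their uniqueKey (uid), which is a cheap term-level delete.
    """
    add_body = json.dumps(docs)                       # a top-level JSON array is treated as adds
    delete_body = json.dumps({"delete": stale_uids})  # delete-by-id list
    return add_body, delete_body

docs = [{"uid": "foo-123", "sku": 123}, {"uid": "bar-124", "sku": 124}]
stale_uids = ["old-foo-123", "old-bar-124"]  # previous uids, tracked by the indexer
add_body, delete_body = build_requests(docs, stale_uids)
```

If the previous uid cannot be tracked, switching the uniqueKey to sku (as Marius suggests) removes the need for any delete at all: re-adding a document with the same sku simply overwrites the old one.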

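Shawn's and Jan's advice — multiple indexing threads per core, tuned batch sizes, and no per-batch commits — can be sketched as follows. The helper names and defaults are illustrative, not from the thread; `post_batch` stands in for whatever HTTP client POSTs one batch to the core's /update handler with a commitWithin parameter:

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(docs, size):
    """Split a list of docs into batches of at most `size`."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def index_core(docs, post_batch, batch_size=1000, threads=4):
    """Send batches of documents to one Solr core from several threads.

    post_batch(batch) is expected to POST the batch to the core's /update
    handler with commitWithin set, so no explicit commits happen here.
    Jan's suggestion: experiment with batch_size (500-2000) and threads
    (2-4) to find what is optimal for your documents and hardware.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Each batch becomes one POST; the threads keep the core busy in
        # parallel instead of indexing single-threaded.
        list(pool.map(post_batch, chunked(docs, batch_size)))
```

With 9 cores, running one such loop per core (as Marius already does) times 2-4 threads each multiplies the concurrent indexing streams, so watch that the Solr server has the CPU headroom to absorb them.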