Re: question about updates to shard leaders only

Mark Miller Tue, 15 May 2018 09:06:07 -0700

Yeah, basically ConcurrentUpdateSolrClient is a shortcut to getting
multi-threaded bulk API updates out of the single-threaded, single-update
API. The downsides are: it is not cloud aware (you have to point it at a
server), you have to add special code to see whether there were any errors,
you don't get any fine-grained error information back, and you still
basically have to break up updates into batches of success/fail units, but
with fewer guard rails.


If you want to bulk load, it usually makes much more sense to use the bulk
API on CloudSolrClient and treat the whole group of updates as a single
success/fail unit.
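
A minimal sketch of that idea, not code from this thread: documents are
grouped into batches, each batch is sent from a worker thread through one
shared thread-safe client, and each batch is reported as a single
success/fail unit. BatchSender, BatchResult and indexAll are hypothetical
names made up for illustration; with SolrJ, the send() implementation would
presumably just call client.add(batch) on a shared CloudSolrClient. It also
covers the two pitfalls quoted below: the final partial batch is flushed only
if it is non-empty, and a fresh list is started after every send.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BatchIndexingSketch {

    /** Hypothetical stand-in for a thread-safe client (assumption, not a SolrJ type). */
    interface BatchSender<T> {
        void send(List<T> batch) throws Exception;
    }

    /** Outcome of one batch: the documents it held and the error, if any. */
    static final class BatchResult<T> {
        final List<T> docs;
        final Exception error; // null means the whole batch succeeded

        BatchResult(List<T> docs, Exception error) {
            this.docs = docs;
            this.error = error;
        }

        boolean ok() {
            return error == null;
        }
    }

    /** Sends docs in batches on a fixed thread pool; results come back in submission order. */
    static <T> List<BatchResult<T>> indexAll(Iterator<T> docs, int batchSize, int threads,
                                             BatchSender<T> sender) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<BatchResult<T>>> pending = new ArrayList<>();
        try {
            List<T> batch = new ArrayList<>(batchSize);
            while (docs.hasNext()) {
                batch.add(docs.next());
                if (batch.size() == batchSize) {
                    pending.add(pool.submit(task(batch, sender)));
                    // Start a fresh list instead of clear(): the submitted task still holds the old one.
                    batch = new ArrayList<>(batchSize);
                }
            }
            // Final flush after the loop, but only if something is left over.
            if (!batch.isEmpty()) {
                pending.add(pool.submit(task(batch, sender)));
            }
            List<BatchResult<T>> results = new ArrayList<>();
            for (Future<BatchResult<T>> f : pending) {
                try {
                    results.add(f.get());
                } catch (ExecutionException e) {
                    // Only reached for errors the task itself did not catch.
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    /** Wraps one batch so that any exception marks the whole batch as failed. */
    private static <T> Callable<BatchResult<T>> task(List<T> batch, BatchSender<T> sender) {
        return () -> {
            try {
                sender.send(batch);
                return new BatchResult<>(batch, null);
            } catch (Exception e) {
                return new BatchResult<>(batch, e);
            }
        };
    }
}
```

A failed BatchResult can then be retried or logged as a whole. Note that this
sketch queues every batch up front, so a very large load would need some
back-pressure (for example a bounded queue) that is left out here.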

- Mark

On Tue, May 15, 2018 at 9:25 AM Erick Erickson <erickerick...@gmail.com>
wrote:

> bq. But don't forget a final client.add(list) after the while-loop ;-)
>
> Ha! But only "if (list.size() > 0)"
>
> And then there was the memorable time I forgot the "list.clear()" when
> I sent the batch and wondered why my indexing progress got slower and
> slower...
>
> Not to mention the time I re-used the same SolrInputDocument that got
> bigger and bigger and bigger.....
>
> Not to mention the other zillion screw-ups I've managed to perpetrate
> in my career.... "Who wrote this stupid code? Oh, wait, it was me.
> DON'T LOOK!!!"...
>
> Astronomy anecdote....
>
> Dale Vrabeck...was at a party with [Rudolph] Minkowski and Dale said
> he’d heard about the astronomer who had exposed a plate all night and
> then put it in the hypo first. Minkowski said, “It was three nights,
> and it was me.”
>
> On Tue, May 15, 2018 at 10:10 AM, Shawn Heisey <apa...@elyograg.org>
> wrote:
> > On 5/15/2018 12:12 AM, Bernd Fehling wrote:
> >>
> >> OK, I have the CloudSolrClient with SolrJ now running but it seems
> >> a bit slower compared to ConcurrentUpdateSolrClient.
> >> This was not expected.
> >> The logs show that CloudSolrClient sends the docs only to the leaders.
> >>
> >> So the only advantage of CloudSolrClient is that it is "Cloud aware"?
> >>
> >> With ConcurrentUpdateSolrClient I get about 1600 docs/sec for loading.
> >> With CloudSolrClient I get only about 1200 docs/sec.
> >
> >
> > ConcurrentUpdateSolrClient internally puts all indexing requests on a
> > queue and then can use multiple threads to do parallel indexing in the
> > background. The design of the client has one big disadvantage -- it
> > returns control to your program immediately (before indexing actually
> > begins) and always indicates success.  All indexing errors are
> > swallowed.  They are logged, but the calling program is never informed
> > that any errors have occurred.
> >
> > Like all other SolrClient implementations, CloudSolrClient is
> > thread-safe, but it is not multi-threaded unless YOU create multiple
> > threads that all use the same client object.  Full error handling is
> > possible with this client. It is also fully cloud aware, adding and
> > removing Solr servers as the SolrCloud changes, without needing to be
> > reconfigured or recreated.
> >
> > Thanks,
> > Shawn
> >
>
-- 
- Mark
about.me/markrmiller
