That's an interesting scaling scheme you mention. I have been trying to devise a good scheme of my own for our scale.
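To make sure I have read it right, below is a rough SolrJ sketch of the "two client threads per CPU" batching approach as I understand it. The Solr URL, collection name, batch size, and field names are placeholders for our setup, not values from this thread.

    // Rough sketch of "two indexing client threads per CPU": while Solr processes
    // one batch, the other thread on that CPU is already building/sending the next.
    // URL, collection, batch size, and fields below are illustrative placeholders.
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.*;

    public class BatchIndexer {
        public static void main(String[] args) throws Exception {
            int threads = 2 * Runtime.getRuntime().availableProcessors();
            int batchSize = 1000; // tune so batches stay reasonably small
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            BlockingQueue<List<SolrInputDocument>> batches = new LinkedBlockingQueue<>(threads * 2);

            for (int i = 0; i < threads; i++) {
                pool.submit(() -> {
                    // Each worker owns its own client; all write to the same collection.
                    try (SolrClient client = new HttpSolrClient.Builder(
                            "http://localhost:8983/solr/mycollection").build()) {
                        while (true) {
                            List<SolrInputDocument> batch = batches.take();
                            if (batch.isEmpty()) {
                                break; // empty batch is the shutdown signal
                            }
                            client.add(batch); // blocks until Solr has accepted the batch
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                    return null;
                });
            }

            // Producer: replace this loop with the real document source.
            List<SolrInputDocument> batch = new ArrayList<>(batchSize);
            for (long id = 0; id < 1_000_000; id++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", Long.toString(id));
                doc.addField("title_t", "document " + id);
                batch.add(doc);
                if (batch.size() == batchSize) {
                    batches.put(batch);
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                batches.put(batch);
            }
            for (int i = 0; i < threads; i++) {
                batches.put(new ArrayList<>()); // one shutdown signal per worker
            }

            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            // Visibility of the new documents is left to the collection's autoCommit settings.
        }
    }

In a SolrCloud setup with multiple shards, CloudSolrClient (routing updates to shard leaders) would presumably be the better fit, but the threading pattern stays the same.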
I will try to see how this works out for us.

> On Apr 2, 2019, at 9:15 PM, Walter Underwood <wun...@wunderwood.org> wrote:
>
> Yeah, that would overload it. To get good indexing speed, I configure two
> clients per CPU on the indexing machine. With one shard on a 16 processor
> machine, that would be 32 threads. With four shards on four 16 processor
> machines, 128 clients. Basically, one thread is waiting while the CPU
> processes a batch and the other is sending the next batch.
>
> That should get the cluster to about 80% CPU. If the cluster is handling
> queries at the same time, I cut that way back, like one client thread for
> every two CPUs.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>> On Apr 2, 2019, at 8:13 PM, Aroop Ganguly <aroopgang...@icloud.com> wrote:
>>
>> Multiple threads to the same index? And how many concurrent threads?
>>
>> Our case is not merely multiple threads but actually large-scale Spark
>> indexer jobs that index 1B records at a time with a concurrency of 400.
>> In this case, multiple such jobs were indexing into the same index.
>>
>>
>>> On Apr 2, 2019, at 7:25 AM, Walter Underwood <wun...@wunderwood.org> wrote:
>>>
>>> We run multiple threads indexing to Solr all the time and have been doing
>>> so for years.
>>>
>>> How big are your documents and how big are your batches?
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>> On Apr 1, 2019, at 10:51 PM, Aroop Ganguly <aroopgang...@icloud.com> wrote:
>>>>
>>>> Turns out the cause was multiple indexing jobs indexing into the index
>>>> simultaneously, which one can imagine can cause JVM load on certain
>>>> replicas for sure.
>>>> Once this was found and only one job ran at a time, things were back to
>>>> normal.
>>>>
>>>> Your comments seem right on no correlation to the stack trace!
>>>>
>>>>> On Apr 1, 2019, at 5:32 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>>>>>
>>>>> On 4/1/2019 5:40 PM, Aroop Ganguly wrote:
>>>>>> Thanks Shawn, for the initial response.
>>>>>> Digging into it a bit, I was wondering if we should care to read the
>>>>>> innermost stack.
>>>>>> The innermost stack seems to be telling us something about what
>>>>>> triggered it?
>>>>>> Of course, the system could have been overloaded as well, but is the
>>>>>> exception telling us something, or is it of no use to consider this stack?
>>>>>
>>>>> The stack trace on OOME is rarely useful. The memory allocation where the
>>>>> error is thrown probably has absolutely no connection to the part of the
>>>>> program where major amounts of memory are being used. It could be ANY
>>>>> memory allocation that actually causes the error.
>>>>>
>>>>> Thanks,
>>>>> Shawn
>>>>
>>>
>>
>