Re: Document Update performances Improvement

2019-10-23 Thread Jörn Franke
Well, coalesce does require shuffle and network; however, in most cases it is less than repartition, as it moves the data (through the network) to already existing executors. However, as you see and others confirm: for high performance you don’t need high parallelism on the ingestion side, but you ca

Re: Document Update performances Improvement

2019-10-23 Thread Erick Erickson
My first question is always “what’s the bottleneck”? Unless you’re driving your CPUs and/or I/O hard on Solr, the bottleneck is in the acquisition of the docs not on the Solr side. Also, be sure and batch in groups of at least 10x the number of shards, see: https://lucidworks.com/post/really-ba
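The batching advice above ("at least 10x the number of shards") can be sketched as follows. This is a minimal illustration, not code from the thread: the helper names `batch_size_for` and `batched` are hypothetical, and the multiplier of 10 is simply the lower bound quoted in the message.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_size_for(num_shards: int, multiplier: int = 10) -> int:
    """Return a batch size of `multiplier` docs per shard (10x is the quoted minimum)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return num_shards * multiplier


def batched(docs: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Example: 4 shards gives batches of at least 40 documents.
size = batch_size_for(4)
batches = list(batched(range(100), size))
```

Each yielded list would then be sent to Solr in a single update request instead of one request per document; the sending step is left out because it depends on the client in use.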

Re: Document Update performances Improvement

2019-10-23 Thread Nicolas Paris
> With Spark-Solr additional complexity comes. You could have too many
> executors for your Solr instance(s), ie a too high parallelism.
I have reduced the parallelism of the spark-solr part by a factor of 5: I had 40 executors loading 4 shards, and now only 8 executors load 4 shards. As a result, I ca

Re: Document Update performances Improvement

2019-10-23 Thread Nicolas Paris
> Set the first two to the same number, and the third to a minimum of three
> times what you set the other two.
> When I built a Solr setup, I increased maxMergeAtOnce and segmentsPerTier to
> 35, and maxMergeAtOnceExplicit to 105. This made merging happen a lot less
> frequently.
Good to know th
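The rule quoted above (same value for maxMergeAtOnce and segmentsPerTier, and at least three times that for maxMergeAtOnceExplicit) can be worked through with a small sketch. The function names are hypothetical, and the XML layout produced by `render_merge_policy` is an assumption about how these settings are usually written in solrconfig.xml; check it, and whether maxMergeAtOnceExplicit is still supported, against the reference guide for your Solr version.

```python
from typing import Dict


def merge_policy_settings(merge_width: int, explicit_factor: int = 3) -> Dict[str, int]:
    """Apply the rule from the thread: first two equal, third at least 3x."""
    if merge_width < 2:
        raise ValueError("merge_width must be at least 2")
    if explicit_factor < 3:
        raise ValueError("the thread recommends a factor of at least 3")
    return {
        "maxMergeAtOnce": merge_width,
        "segmentsPerTier": merge_width,
        "maxMergeAtOnceExplicit": merge_width * explicit_factor,
    }


def render_merge_policy(settings: Dict[str, int]) -> str:
    """Render the settings as a solrconfig.xml fragment (layout is an assumption)."""
    lines = ['<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">']
    for name, value in settings.items():
        lines.append(f'  <int name="{name}">{value}</int>')
    lines.append("</mergePolicyFactory>")
    return "\n".join(lines)


# The values mentioned in the message: 35, 35 and 105.
settings = merge_policy_settings(35)
fragment = render_merge_policy(settings)
```

Larger values mean fewer, bigger merges during heavy indexing, at the cost of more segments to search; as another reply in the thread notes, the sweet spot has to be found by testing.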

Re: Document Update performances Improvement

2019-10-23 Thread Nicolas Paris
> . > Thanks for those relevant pointers and the explanation. > How often do you commit? Are yo

Re: Document Update performances Improvement

2019-10-23 Thread Shawn Heisey
On 10/22/2019 1:12 PM, Nicolas Paris wrote:
>> We, at Auto-Suggest, also do atomic updates daily and specifically
>> changing merge factor gave us a boost of ~4x
> Interesting. What kind of change exactly on the merge factor side ?
The mergeFactor setting is deprecated. Instead, use maxMergeAtOnce,

Re: Document Update performances Improvement

2019-10-22 Thread Paras Lehana
Hi Nicolas,
> What kind of change exactly on the merge factor side ?
We increased maxMergeAtOnce and segmentsPerTier from 5 to 50. This makes Solr merge segments less frequently after many index updates. Yes, you need to find the sweet spot here but do try increasing these values from the d

Re: Document Update performances Improvement

2019-10-22 Thread Nicolas Paris
> We, at Auto-Suggest, also do atomic updates daily and specifically
> changing merge factor gave us a boost of ~4x
Interesting. What kind of change exactly on the merge factor side?
> At current configuration, our core atomically updates ~423 documents
> per second.
Would you say atomical upd

Re: Document Update performances Improvement

2019-10-22 Thread Paras Lehana
Hi Nicolas, Have you tried playing with values of *IndexConfig* (merge factor, segment size, maxBufferedDocs, Merge Policies)? We, at Auto-Suggest, also do atomic updates daily and specifically changing merge factor gave us

Re: Document Update performances Improvement

2019-10-19 Thread Nicolas Paris
> Maybe you need to give more details. I recommend always to try and
> test yourself as you know your own solution best. What performance does
> your use case need and what is your current performance?
I have 10 collections on 4 shards (no replication). The collections are quite large, ranging from

Re: Document Update performances Improvement

2019-10-19 Thread Jörn Franke
Maybe you need to give more details. I recommend always to try and test yourself as you know your own solution best. Depending on your spark process atomic updates could be faster. With Spark-Solr additional complexity comes. You could have too many executors for your Solr instance(s), ie a to

Re: Document Update performances Improvement

2019-10-19 Thread Nicolas Paris
Hi community, Any advice to speed up updates? Is there any advice on commit, memory, docvalues, stored, or any other tips to make things faster? Thanks
On Wed, Oct 16, 2019 at 12:47:47AM +0200, Nicolas Paris wrote:
> Hi
>
> I am looking for a way to speed up the update of documents.
>
> In my context, th

Document Update performances Improvement

2019-10-15 Thread Nicolas Paris
Hi, I am looking for a way to speed up the update of documents. In my context, the update replaces one of the many existing indexed fields and keeps the others as is. Right now, I am building the whole document and replacing the existing one by id. I am wondering if the **atomic update feature** woul
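For readers unfamiliar with the atomic update feature mentioned above, the difference from a full-document replace can be sketched as a payload. This is an illustrative sketch only: the document id and the field name `category_s` are hypothetical, and the helper names are not from the thread. The `set` modifier is Solr's atomic-update syntax for replacing one field's value; as far as I know, atomic updates also require the other fields to be stored or to have docValues, so check that against your schema.

```python
import json
from typing import Any, Dict, Iterable


def atomic_set(doc_id: str, field: str, value: Any, id_field: str = "id") -> Dict[str, Any]:
    """Build an atomic update that replaces only `field` on document `doc_id`."""
    return {id_field: doc_id, field: {"set": value}}


def update_body(updates: Iterable[Dict[str, Any]]) -> str:
    """Serialize a batch of updates as the JSON array sent to the update handler."""
    return json.dumps(list(updates))


# Hypothetical example: change one field on two documents, leaving the rest as is.
body = update_body(
    [
        atomic_set("doc-1", "category_s", "archived"),
        atomic_set("doc-2", "category_s", "active"),
    ]
)
```

Only the id and the changed field travel over the network, but Solr still rebuilds and reindexes the whole document internally, so the gain is mainly on the client and acquisition side.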