Re: Indexing and Disk Writes

Erick Erickson Fri, 04 Nov 2016 09:02:34 -0700

Every time your ramBufferSizeMB limit is exceeded, a new segment is
flushed to disk, and that segment is eventually merged. In terms of
_throughput_, raising it usually doesn't help much beyond about 100MB
(the default). It'd be interesting to see if it changes your I/O
activity, though.

BTW, I'd hard commit (openSearcher=false) much more frequently. As you've
seen, that doesn't particularly change I/O, but if Solr should terminate
abnormally, the tlog will be replayed on startup and Solr may sit there
for 10 minutes doing it.
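
If you'd rather not change the autoCommit interval in solrconfig.xml
for a one-off bulk load, the hard commit can also be issued explicitly
over HTTP from the indexing client. A minimal Python sketch, assuming a
node at localhost:8983 and a collection called mycollection (both are
placeholders, not taken from this thread):

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hard_commit_url(base_url, collection):
    """Build an update URL that hard commits without opening a new searcher."""
    params = urlencode({"commit": "true", "openSearcher": "false"})
    return f"{base_url.rstrip('/')}/{collection}/update?{params}"


def hard_commit(base_url, collection, timeout=60):
    """Send the commit request and return the HTTP status code."""
    with urlopen(hard_commit_url(base_url, collection), timeout=timeout) as resp:
        return resp.status


# Placeholder host and collection name, for illustration only.
print(hard_commit_url("http://localhost:8983/solr", "mycollection"))
```

Calling hard_commit() every minute or so from the loader would keep the
tlog short; how often is a judgment call for your setup.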

You could also consider disabling tlogs for the duration of your bulk
indexing, then turning them back on for incremental updates.

The background merging can be pretty dramatic, though; that may well be
where much of this is coming from.
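
To put the figures from your message side by side, here is a quick
back-of-the-envelope check in Python. It only uses the numbers you
quoted (20 to 25 thousand writes per second, 4k blocks, 6-8 MB/s on the
NIC) and assumes every write is a full 4096-byte block, which may not
hold exactly:

```python
BLOCK_SIZE_BYTES = 4 * 1024  # "4k block size", assumed to mean 4096 bytes


def write_mb_per_s(writes_per_s):
    """Disk write rate in MB/s if every write is one full block."""
    return writes_per_s * BLOCK_SIZE_BYTES / 1e6


def amplification(disk_mb_per_s, nic_mb_per_s):
    """Bytes written to disk per byte received on the NIC."""
    return disk_mb_per_s / nic_mb_per_s


low_disk = write_mb_per_s(20_000)   # 81.92 MB/s
high_disk = write_mb_per_s(25_000)  # 102.4 MB/s

# Pair the low disk rate with the high NIC rate, and vice versa,
# to get the widest plausible range.
print(round(amplification(low_disk, 8), 1))   # 10.2
print(round(amplification(high_disk, 6), 1))  # 17.1
```

So each node is writing somewhere around 10 to 17 times what it
receives, which is the gap that tlogs, segment flushes and merges
together would have to account for.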

Best,
Erick

On Fri, Nov 4, 2016 at 8:51 AM, Andrew Dinsmore <acdinsm...@gmail.com> wrote:
> We are using Solr 5.4 to index TBs of documents in a bulk fashion to get
> the cluster up and running. Indexing is over HTTP round robin as directed
> by zookeeper.
>
> Each of the 13 nodes is receiving about 6-8 MB/s on the NIC but Solr is
> writing around 20 to 25 thousand times per second (4k block size). My
> question is: what is Solr doing writing all this data to disk (80-100MB/s)?
>
> Over a three-hour run with 4.5 million docs we only committed 20-some times
> but disk activity was pretty constant at the above levels.
>
> Is there more going on than tlogs, commits and merges? When we moved from 1
> minute autoCommit to 10 we committed less per the log messages but I
> expected the bigger initial segments to result in less merging thus lower
> disk activity. But testing showed no significant change in disk writing.
>
> Thanks for any help.
>
> Andrew
