Hi
SolrCloud 4.0: 6 machines, each quad-core with 8GB RAM and a 1TB disk,
one Solr node on each, one collection across the 6 nodes, 4 shards per
node.
Storing/indexing from 100 threads on external machines, each thread
sending one doc at a time, at full speed (they always have a new doc to
store/index).
See attached images
* iowait.png: Measured I/O wait on the Solr machines
* doccount.png: Measured number of docs in the Solr collection
Starting from an empty collection. Things are fine wrt storing/indexing
speed for the first two to three hours (100M docs per hour), then the
speed drops dramatically, to a level that is unacceptable for us (max
10M per hour). At the same time as the speed goes down, we see that I/O
wait increases dramatically. I am not 100% sure, but a quick
investigation has shown that this is due to almost constant merging.
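
For scale, the hourly figures above can be broken down into per-second
rates. This is only a rough sketch: it assumes docs are spread evenly
over the 6 nodes, the 24 shards and the 100 client threads, which has
not been measured, and the function names are made up for illustration.

```python
# Cluster layout as described in this mail.
NODES = 6
SHARDS_PER_NODE = 4
CLIENT_THREADS = 100


def per_second(docs_per_hour):
    """Convert an hourly document rate into docs per second."""
    return docs_per_hour / 3600.0


def breakdown(docs_per_hour):
    """Split a cluster-wide rate evenly (an assumption) over nodes, shards and threads."""
    total = per_second(docs_per_hour)
    return {
        "cluster": total,
        "per_node": total / NODES,
        "per_shard": total / (NODES * SHARDS_PER_NODE),
        "per_thread": total / CLIENT_THREADS,
    }


if __name__ == "__main__":
    # 100M docs/hour is the early rate; 10M docs/hour is the upper bound seen later.
    for label, rate in (("early", 100_000_000), ("degraded", 10_000_000)):
        rates = breakdown(rate)
        print(label, {name: round(value, 1) for name, value in rates.items()})
```

So the early rate is roughly 27,800 docs per second across the cluster,
and the degraded rate is at most about 2,800 docs per second.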
What to do about this problem?
I know that you can play around with mergeFactor and the commit rate,
but earlier tests show that this does not really seem to do the job: it
might postpone the time when the problem occurs, but basically it is
just a matter of time before merging exhausts the system.
Is there a way to avoid merging entirely and keep indexing speed at a
high level, while still making sure that searches will perform fairly
well when the amount of data becomes large? (I guess that without
merging you will end up with lots and lots of "small" files, and that
this is not good for search response time.)
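
The trade-off in that question can be illustrated with a toy model.
This is a simplified level-based sketch, not Lucene's actual merge
policy: it assumes every flush produces one equally sized segment and
that merge_factor segments on one level are always merged into a single
segment on the next level. The function name and the unit "one flushed
segment" are hypothetical.

```python
def simulate_merges(num_flushes, merge_factor):
    """Toy level-based merge model (an assumption, not Lucene's real policy).

    Returns (segments_left, units_rewritten), where one unit is the data
    of a single flushed segment.
    """
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    levels = [0]          # levels[i] = number of segments currently on level i
    units_rewritten = 0   # data copied by merges, in flushed-segment units
    for _ in range(num_flushes):
        levels[0] += 1    # a flush adds one new small segment
        level = 0
        # Cascade: merge whenever a level holds merge_factor segments.
        while levels[level] >= merge_factor:
            levels[level] -= merge_factor
            # A segment on level i holds merge_factor**i units, so merging
            # merge_factor of them copies merge_factor**(i + 1) units.
            units_rewritten += merge_factor ** (level + 1)
            if level + 1 == len(levels):
                levels.append(0)
            levels[level + 1] += 1
            level += 1
    return sum(levels), units_rewritten


if __name__ == "__main__":
    flushes = 1000
    for factor in (2, 10, 100, flushes + 1):  # the last value never merges
        segments, rewritten = simulate_merges(flushes, factor)
        print(f"merge_factor={factor}: {segments} segments, "
              f"{rewritten} units rewritten for {flushes} units flushed")
```

In this model a larger merge factor reduces how often data is rewritten
but leaves more segments for searches to visit, and disabling merging
leaves one segment per flush.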
Regards, Per Steffensen