I'm conducting some indexing experiments in SolrCloud and I want to confirm
my conclusions and ask for suggestions on how to improve performance.

My setup includes a single-sharded collection with 1 additional replica in
SolrCloud 5.3.1. I'm using SolrJ and the indexing speed refers to the actual
SolrJ call that adds the document. I've run some indexing tests and it seems
that Lucene indexing is as fast as or faster than Solr's in all cases. In all
cases the same documents are sent to both Lucene and Solr, and the same analysis
is performed on them.
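
For illustration, a per-call timing measurement of this kind could be sketched
as below. The `IndexTiming` class, the `timeAdds` helper and the
`Consumer<String>` stand-in for the indexing call are hypothetical names, not
the test code behind the numbers in this post. In a real test the consumer
would wrap something like SolrJ's `SolrClient.add(doc)` or Lucene's
`IndexWriter.addDocument(doc)`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical harness: times only the call that hands a document to the indexer.
class IndexTiming {

    // Returns the total nanoseconds spent inside indexer.accept(...) over all docs.
    static long timeAdds(List<String> docs, Consumer<String> indexer) {
        long total = 0L;
        for (String doc : docs) {
            long start = System.nanoTime();
            indexer.accept(doc); // stand-in for the SolrJ or Lucene add call
            total += System.nanoTime() - start;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("doc one", "doc two", "doc three");
        // Placeholder indexer; a real test would add the document to Solr or Lucene here.
        long nanos = timeAdds(docs, doc -> doc.length());
        System.out.println("avg ns per add: " + nanos / docs.size());
    }
}
```

Passing the same document list to two different consumers keeps the comparison
limited to the add call itself, which is what the figures below refer to.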

- 2 replicas, leader is a replica on a machine under heavy load => ~3x
slower than Lucene.
- 2 replicas, leader is a replica on a machine under light load => ~2x
slower than Lucene.
- 1 replica on a machine under light load => indexing speed similar to
Lucene.

Conclusions
(*) It seems that the slowest replica determines the indexing speed. 
(*) It gets even worse if the slowest replica is the leader. This would make
sense if the leader forwards the request to the remaining replicas only after
it has finished indexing the document itself.

Regarding improvements
(*) I'm indexing fairly large documents (0.5 MB < DocSize < 1 MB), so batch
updates do not offer a significant performance gain.
(*) Would I see an improvement if I used a multi-sharded collection?
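
For reference, "batch updates" above means sending several documents per
request instead of one. A minimal sketch of the grouping step, assuming a
hypothetical `Batching.partition` helper and a fixed batch size, could look
like this. Each resulting batch would then be passed to a single call such as
SolrJ's `SolrClient.add(Collection<SolrInputDocument>)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a document list into fixed-size batches.
class Batching {

    // Returns consecutive sublists of at most batchSize items, in original order.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }
}
```

With documents approaching 1 MB each, a batch amortizes the per-request
overhead over only a few documents' worth of payload, which would be
consistent with the small gain reported above.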

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Single-sharded-SolrCloud-vs-Lucene-indexing-speed-tp4242568.html
Sent from the Solr - User mailing list archive at Nabble.com.
