I'm conducting some indexing experiments in SolrCloud and I want to confirm my conclusions and ask for suggestions on how to improve performance.
My setup is a single-shard collection with one additional replica on SolrCloud 5.3.1. I'm using SolrJ, and the indexing speed I report is the time spent in the SolrJ call that adds the document. I've run some indexing tests, and plain Lucene indexing is as fast as or faster than Solr in every case. In all cases the same documents are sent to both Lucene and Solr, and the same analysis is performed on them.

- 2 replicas, leader is a replica on a machine under heavy load => ~3x slower than Lucene.
- 2 replicas, leader is a replica on a machine under light load => ~2x slower than Lucene.
- 1 replica on a machine under light load => indexing speed similar to Lucene.

Conclusions
(*) It seems that the slowest replica determines the indexing speed.
(*) It gets even worse if the slowest replica is the leader. That would make sense if the leader forwards the request to the remaining replicas only after it has finished indexing.

Regarding improvements
(*) I'm indexing fairly big documents (0.5MB < DocSize < 1MB), so batch updates do not offer a significant performance gain.
(*) Would I see an improvement if I used a multi-sharded collection?

Thanks
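
P.S. To make the hypothesis in my conclusions concrete, here is a rough toy model in plain Java. It does not call Solr or SolrJ and says nothing about how Solr actually behaves internally. It only computes what two assumptions would predict for the latency of one add request: (a) the leader indexes first and only then forwards to the replicas, or (b) the leader and the replicas index concurrently. The class name, the method names and the millisecond figures are all made up for illustration.

```java
import java.util.Arrays;

/** Toy latency model for a single update request. All numbers are hypothetical. */
public class IndexingLatencyModel {

    /** Assumption (a): the leader indexes first, then forwards to all replicas in parallel. */
    public static long leaderThenReplicas(long leaderMs, long... replicaMs) {
        long slowestReplica = Arrays.stream(replicaMs).max().orElse(0L);
        return leaderMs + slowestReplica;
    }

    /** Assumption (b): leader and replicas index concurrently; the request waits for the slowest. */
    public static long allInParallel(long leaderMs, long... replicaMs) {
        long slowestReplica = Arrays.stream(replicaMs).max().orElse(0L);
        return Math.max(leaderMs, slowestReplica);
    }

    public static void main(String[] args) {
        long light = 10; // hypothetical per-document cost on a lightly loaded machine (ms)
        long heavy = 20; // hypothetical per-document cost on a heavily loaded machine (ms)

        System.out.println("(a) heavy leader, light replica: " + leaderThenReplicas(heavy, light) + " ms");
        System.out.println("(a) light leader, light replica: " + leaderThenReplicas(light, light) + " ms");
        System.out.println("(a) single light node:           " + leaderThenReplicas(light) + " ms");
        System.out.println("(b) heavy leader, light replica: " + allInParallel(heavy, light) + " ms");
        System.out.println("(b) light leader, light replica: " + allInParallel(light, light) + " ms");
    }
}
```

Under assumption (a) the per-request cost is the leader's time plus the slowest replica's time, so adding a replica always costs something. Under (b) it is just the slowest node. Comparing measured timings against both predictions might help tell which picture is closer to what SolrCloud does.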