Your indexing client, if written in SolrJ, should use CloudSolrServer, which is, in Matt's terms, "leader aware". It divides the documents to be indexed into packets in which every doc belongs on the same shard, and then sends each packet to that shard's leader. This avoids a lot of re-routing and should scale essentially linearly. You may have to add more clients, though, depending on how hard the document generator is working.
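To make the "packets per shard" idea concrete, here is a rough, self-contained sketch of the grouping step. It is not SolrJ code and not how CloudSolrServer is implemented: the class name ShardPacketSketch, the shardFor helper, and the use of String.hashCode() are all made up for illustration, whereas the real router hashes the uniqueKey against the hash ranges recorded in the cluster state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShardPacketSketch {

    // Hypothetical stand-in for the real router. ASSUMPTION: this is NOT the
    // hash SolrCloud uses; it only gives each id a stable shard number.
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    // Group document ids into one packet per shard, so that each packet
    // could be sent straight to that shard's leader with no re-routing.
    static Map<Integer, List<String>> groupByShard(List<String> docIds, int numShards) {
        Map<Integer, List<String>> packets = new TreeMap<>();
        for (String id : docIds) {
            packets.computeIfAbsent(shardFor(id, numShards), k -> new ArrayList<>()).add(id);
        }
        return packets;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("doc-1", "doc-2", "doc-3", "doc-4", "doc-5", "doc-6");
        // Print which ids would travel together in the same packet.
        groupByShard(ids, 3).forEach((shard, packet) ->
                System.out.println("shard " + shard + " <- " + packet));
    }
}
```

With a leader-aware client you never write this yourself; the point is only that each document travels once, directly to the node that owns it.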
Also, make sure that you send batches of documents as Shawn suggests, I use 1,000 as a starting point.

Best,
Erick

On Thu, Oct 30, 2014 at 2:10 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 10/30/2014 2:56 PM, Ian Rose wrote:
>> I think this is true only for actual queries, right? I am not issuing
>> any queries, only writes (document inserts). In the case of writes,
>> increasing the number of shards should increase my throughput (in
>> ops/sec) more or less linearly, right?
>
> No, that won't affect indexing speed all that much. The way to increase
> indexing speed is to increase the number of processes or threads that
> are indexing at the same time. Instead of having one client sending
> update requests, try five of them. Also, index many documents with each
> update request. Sending one document at a time is very inefficient.
>
> You didn't say how you're doing commits, but those need to be as
> infrequent as you can manage. Ideally, you would use autoCommit with
> openSearcher=false on an interval of about five minutes, and send an
> explicit commit (with the default openSearcher=true) after all the
> indexing is done.
>
> You may have requirements regarding document visibility that this won't
> satisfy, but try to avoid doing commits with openSearcher=true (soft
> commits qualify for this) extremely frequently, like once a second.
> Once a minute is much more realistic. Opening a new searcher is an
> expensive operation, especially if you have cache warming configured.
>
> Thanks,
> Shawn
>
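As a rough illustration of the batching and multiple-senders advice in this thread, the sketch below cuts a document list into batches of 1,000 and hands them to a small thread pool. Everything named here (BatchIndexerSketch, partition, indexAll, the sender callback) is hypothetical. In a real client the callback would presumably wrap one SolrJ update request carrying the whole batch, and no commit is issued per batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class BatchIndexerSketch {

    // Split docs into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            int end = Math.min(i + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(i, end)));
        }
        return batches;
    }

    // Send every batch through `sender` using `threads` concurrent workers.
    // ASSUMPTION: `sender` stands in for one update request per batch.
    // Exceptions thrown by `sender` are not surfaced in this sketch.
    static <T> void indexAll(List<T> docs, int batchSize, int threads,
                             Consumer<List<T>> sender) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<T> batch : partition(docs, batchSize)) {
            pool.submit(() -> sender.accept(batch));
        }
        pool.shutdown();                          // accept no further batches
        pool.awaitTermination(1, TimeUnit.HOURS); // wait for in-flight batches
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 4500; i++) {
            docs.add("doc-" + i);
        }
        AtomicInteger requests = new AtomicInteger();
        // Five workers and 1,000 docs per request, as suggested in the thread.
        indexAll(docs, 1000, 5, batch -> requests.incrementAndGet());
        System.out.println("update requests sent: " + requests.get());
    }
}
```

Both the batch size and the thread count are starting points to tune against your own hardware, not recommendations beyond what is said above.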