Yes, I was inadvertently sending them to a replica. When I sent them to the leader, the leader reported "(1000 adds)" and the replica reported only 1 add per document. So it looks like the leader forwards the batched documents to the replicas individually.
On Fri, Oct 31, 2014 at 3:26 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> Internally, the docs are batched up into smaller buckets (10 as I remember) and forwarded to the correct shard leader. I suspect that's what you're seeing.
>
> Erick
>
> On Fri, Oct 31, 2014 at 12:20 PM, Peter Keegan <peterlkee...@gmail.com> wrote:
>
> > Regarding batch indexing: When I send batches of 1000 docs to a standalone Solr server, the log file reports "(1000 adds)" in LogUpdateProcessor. But when I send them to the leader of a replicated index, the leader log file reports much smaller numbers, usually "(12 adds)". Why do the batches appear to be broken up?
> >
> > Peter
> >
> > On Fri, Oct 31, 2014 at 10:40 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> NP, just making sure.
> >>
> >> I suspect you'll get lots more bang for the buck, and results much more closely matching your expectations, if:
> >>
> >> 1> you batch up a bunch of docs at once rather than sending them one at a time. That's probably the easiest thing to try. Sending docs one at a time is something of an anti-pattern. I usually start with batches of 1,000.
> >>
> >> And just to check: you're not issuing any commits from the client, right? Performance will be terrible if you issue commits after every doc; that's totally an anti-pattern. Doubly so for optimizes. Since you showed us your solrconfig autocommit settings I'm assuming not, but want to be sure.
> >>
> >> 2> use a leader-aware client. I'm totally unfamiliar with Go, so I have no suggestions whatsoever to offer there. But you'll want to batch in this case too.
> >>
> >> On Fri, Oct 31, 2014 at 5:51 AM, Ian Rose <ianr...@fullstory.com> wrote:
> >>
> >> > Hi Erick -
> >> >
> >> > Thanks for the detailed response, and apologies for my confusing terminology. I should have said "WPS" (writes per second) instead of QPS, but I didn't want to introduce a weird new acronym since QPS is well known. Clearly a bad decision on my part. To clarify: I am doing *only* writes (document adds). Whenever I wrote "QPS" I was referring to writes.
> >> >
> >> > It seems clear at this point that I should wrap up the code to do "smart" routing rather than choose Solr nodes randomly, and then see if that changes things. I must admit that although I understand that random node selection will impose a performance hit, theoretically it seems to me that the system should still scale up as you add more nodes (albeit at a lower absolute level of performance than if you used a smart router). Nonetheless, I'm just theorycrafting here, so the better thing to do is just try it experimentally. I hope to have that working today - will report back on my findings.
> >> >
> >> > Cheers,
> >> > - Ian
> >> >
> >> > p.s. To clarify why we are rolling our own smart router code: we use Go over here rather than Java. Although if we still get bad performance with our custom Go router, I may try a pure Java load client using CloudSolrServer to eliminate the possibility of bugs in our implementation.
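[Editor's note: for illustration, a minimal SolrJ sketch of the batching pattern Erick recommends above: 1,000 docs per update request and no commits from the client. The Solr URL, field names, and document counts are placeholders, not values from this thread.]

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
    public static void main(String[] args) throws Exception {
        // Placeholder URL and core name; point this at your own Solr instance.
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(1000);
        for (int i = 0; i < 100000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("title_t", "synthetic document " + i);
            batch.add(doc);

            // Send 1,000 docs per update request instead of one at a time.
            if (batch.size() == 1000) {
                solr.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            solr.add(batch);
        }

        // No per-batch commits: rely on autoCommit/autoSoftCommit in solrconfig.xml,
        // then issue a single explicit commit once all indexing is done.
        solr.commit();
        solr.shutdown();
    }
}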
> >> > On Fri, Oct 31, 2014 at 1:37 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >> >
> >> >> I'm really confused:
> >> >>
> >> >> bq: I am not issuing any queries, only writes (document inserts)
> >> >>
> >> >> bq: It's clear that once the load test client has ~40 simulated users
> >> >>
> >> >> bq: A cluster of 3 shards over 3 Solr nodes *should* support a higher QPS than 2 shards over 2 Solr nodes, right
> >> >>
> >> >> QPS is usually used to mean "Queries Per Second", which is different from the statement that "I am not issuing any queries...". And what does the number of users have to do with inserting documents?
> >> >>
> >> >> You also state: "In many cases, CPU on the solr servers is quite low as well"
> >> >>
> >> >> So let's talk about indexing first. Indexing should scale nearly linearly as long as
> >> >> 1> you are routing your docs to the correct leader, which happens automatically with SolrJ and CloudSolrServer. Rather than rolling your own, I strongly suggest you try this out.
> >> >> 2> you have enough clients feeding the cluster to push CPU utilization on them all. Very often "slow indexing", or in your case "lack of scaling", is a result of document acquisition; your doc generator may be spending all its time waiting for the individual documents to get to Solr and come back.
> >> >>
> >> >> bq: "chooses a random solr server for each ADD request (with 1 doc per add request)"
> >> >>
> >> >> Probably your culprit right there. Each and every document requires a trip across the network (and forwarding to the correct leader). So given that you're not seeing high CPU utilization, I suspect that you're not sending enough docs to SolrCloud fast enough to see scaling. You need to batch up multiple docs; I generally send 1,000 docs at a time.
> >> >>
> >> >> But even if you do solve this, the inter-node routing will prevent linear scaling. When a doc (or a batch of docs) goes to a random Solr node, here's what happens:
> >> >> 1> the docs are re-packaged into groups based on which shard they're destined for
> >> >> 2> the sub-packets are forwarded to the leader for each shard
> >> >> 3> the responses are gathered back and returned to the client.
> >> >>
> >> >> This set of operations will eventually degrade the scaling.
> >> >>
> >> >> bq: A cluster of 3 shards over 3 Solr nodes *should* support a higher QPS than 2 shards over 2 Solr nodes, right? That's the whole idea behind sharding.
> >> >>
> >> >> If we're talking search requests, the answer is no. Sharding is what you do when your collection no longer fits on a single node. If it _does_ fit on a single node, then you'll usually get better query performance by adding a bunch of replicas to a single shard. When the number of docs on each shard grows large enough that you no longer get good query performance, _then_ you shard. And take the query hit.
> >> >>
> >> >> If we're talking about inserts, then see above. I suspect your problem is that you're _not_ "saturating the SolrCloud cluster"; you're sending docs to Solr very inefficiently and waiting on I/O.
> >> >> Batching docs and sending them to the right leader should scale pretty linearly until you start saturating your network.
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Thu, Oct 30, 2014 at 6:56 PM, Ian Rose <ianr...@fullstory.com> wrote:
> >> >>
> >> >> > Thanks for the suggestions so far, all.
> >> >> >
> >> >> > 1) We are not using SolrJ on the client (not using Java at all) but I am working on writing a "smart" router so that we can always send to the correct node. I am certainly curious to see how that changes things. Nonetheless, even with the overhead of extra routing hops, the observed behavior (no increase in performance with more nodes) doesn't make any sense to me.
> >> >> >
> >> >> > 2) Commits: we are using autoCommit with openSearcher=false (maxTime=60000) and autoSoftCommit (maxTime=15000).
> >> >> >
> >> >> > 3) Suggestions to batch documents certainly make sense for production code, but in this case I am not really concerned with absolute performance; I just want to see the *relative* performance as we use more Solr nodes. So I don't think batching or not really matters.
> >> >> >
> >> >> > 4) "No, that won't affect indexing speed all that much. The way to increase indexing speed is to increase the number of processes or threads that are indexing at the same time. Instead of having one client sending update requests, try five of them."
> >> >> >
> >> >> > Can you elaborate on this some? I'm worried I might be misunderstanding something fundamental. A cluster of 3 shards over 3 Solr nodes *should* support a higher QPS than 2 shards over 2 Solr nodes, right? That's the whole idea behind sharding. Regarding your comment of "increase the number of processes or threads", note that for each value of K (number of Solr nodes) I measured with several different numbers of simulated users so that I could find a "saturation point". For example, take a look at my data for K=2:
> >> >> >
> >> >> > Num Nodes   Num Users   QPS
> >> >> > 2           1            472
> >> >> > 2           5           1790
> >> >> > 2           10          2290
> >> >> > 2           15          2850
> >> >> > 2           20          2900
> >> >> > 2           40          3210
> >> >> > 2           60          3200
> >> >> > 2           80          3210
> >> >> > 2           100         3180
> >> >> >
> >> >> > It's clear that once the load test client has ~40 simulated users, the Solr cluster is saturated. Creating more users just increases the average request latency, such that the total QPS remained (nearly) constant. So I feel pretty confident that a cluster of size 2 *maxes out* at ~3200 qps. The problem is that I am finding roughly this same "max point", no matter how many simulated users the load test client creates, for any value of K (> 1).
> >> >> >
> >> >> > Cheers,
> >> >> > - Ian
> >> >> >
> >> >> > On Thu, Oct 30, 2014 at 8:01 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >> >> >
> >> >> >> Your indexing client, if written in SolrJ, should use CloudSolrServer, which is, in Matt's terms, "leader aware". It divides up the documents to be indexed into packets where each doc in the packet belongs on the same shard, and then sends the packet to the shard leader. This avoids a lot of re-routing and should scale essentially linearly.
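[Editor's note: a rough SolrJ sketch of that pattern, combined with the "several indexing threads" advice quoted in point 4 above. The ZooKeeper addresses, collection name, thread count, and document counts are placeholders; it assumes a single client instance shared across threads (give each thread its own client if in doubt).]

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudLoadClient {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble and collection name. Connecting via ZK
        // lets the client track cluster state and route batches to shard leaders.
        final CloudSolrServer solr = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        solr.setDefaultCollection("collection1");

        // Several indexing threads, per the "try five of them" advice.
        ExecutorService pool = Executors.newFixedThreadPool(5);
        for (int t = 0; t < 5; t++) {
            final int threadId = t;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(1000);
                        for (int i = 0; i < 20000; i++) {
                            SolrInputDocument doc = new SolrInputDocument();
                            doc.addField("id", "thread" + threadId + "-doc" + i);
                            batch.add(doc);
                            if (batch.size() == 1000) {
                                // The client splits this batch into per-shard packets
                                // and sends each packet to that shard's leader.
                                solr.add(batch);
                                batch.clear();
                            }
                        }
                        if (!batch.isEmpty()) {
                            solr.add(batch);
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);

        solr.commit();   // one explicit commit after all indexing is done
        solr.shutdown();
    }
}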
> >> >> >> You may have to add more clients, though, depending upon how hard the document-generator is working.
> >> >> >>
> >> >> >> Also, make sure that you send batches of documents as Shawn suggests; I use 1,000 as a starting point.
> >> >> >>
> >> >> >> Best,
> >> >> >> Erick
> >> >> >>
> >> >> >> On Thu, Oct 30, 2014 at 2:10 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> >> >> >>
> >> >> >> > On 10/30/2014 2:56 PM, Ian Rose wrote:
> >> >> >> >> I think this is true only for actual queries, right? I am not issuing any queries, only writes (document inserts). In the case of writes, increasing the number of shards should increase my throughput (in ops/sec) more or less linearly, right?
> >> >> >> >
> >> >> >> > No, that won't affect indexing speed all that much. The way to increase indexing speed is to increase the number of processes or threads that are indexing at the same time. Instead of having one client sending update requests, try five of them. Also, index many documents with each update request. Sending one document at a time is very inefficient.
> >> >> >> >
> >> >> >> > You didn't say how you're doing commits, but those need to be as infrequent as you can manage. Ideally, you would use autoCommit with openSearcher=false on an interval of about five minutes, and send an explicit commit (with the default openSearcher=true) after all the indexing is done.
> >> >> >> >
> >> >> >> > You may have requirements regarding document visibility that this won't satisfy, but try to avoid doing commits with openSearcher=true (soft commits qualify for this) extremely frequently, like once a second. Once a minute is much more realistic. Opening a new searcher is an expensive operation, especially if you have cache warming configured.
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> > Shawn
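[Editor's note: for reference, the commit settings discussed in this thread live in the <updateHandler> section of solrconfig.xml. A sketch matching the values Ian mentions (hard commit every 60 seconds without opening a searcher, soft commit every 15 seconds); Shawn's suggestion would raise the hard-commit interval to roughly five minutes.]

<!-- Hard commit: flush to durable storage regularly, but don't open a new searcher. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: controls document visibility; keep it as infrequent as you can tolerate. -->
<autoSoftCommit>
  <maxTime>15000</maxTime>
</autoSoftCommit>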