NP. My usual question though is "how often do you expect to lose a second ZK node before you can replace the first one that died?"

My tongue-in-cheek statement is often "If you're losing two nodes regularly, you have problems with your hardware that you're not really going to address by adding more ZK nodes" ;).

And do note that even if you lose quorum, SolrCloud will continue to serve _queries_, although the "picture" each individual Solr node has of the current state of all the Solr nodes will get stale. You won't be able to index though. That said, the internal Solr load balancers auto-distribute queries to live nodes anyway, so things can limp along.

As always, it's a tradeoff between expense/complexity and robustness, and each and every situation is different in how much risk it can tolerate.

FWIW,
Erick

On Thu, Jan 21, 2016 at 1:49 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:
> It's not a typo; I was wrong. For ZooKeeper, 2 nodes still count as a majority.
> It's not the desirable configuration, but it is tolerable.
>
> Thanks Erick.
>
> --
> /Yago Riveiro
>
> On Jan 21 2016, at 4:15 am, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> bq: 3 are too risky, you lost one you lost quorum
>>
>> Typo? You need to lose two.....
>>
>> On Wed, Jan 20, 2016 at 6:25 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>>
>>> Our ZooKeeper cluster is an ensemble of 5 machines, which is a good starting
>>> point; 3 are too risky, you lost one you lost quorum, and with 7 the sync
>>> cost increases.
>>>
>>> The ZK cluster runs on machines with no other IO load and with rotational
>>> HDDs (don't use SSDs just to gain IO performance; ZooKeeper is optimized for
>>> spinning disks).
>>>
>>> The ZK cluster behaves without problems. The first deploy of ZK was on the
>>> same machines as the Solr cluster (ZK log on its own HDD) and that didn't
>>> work very well; the CPU and network IO from the Solr cluster was too much.
>>>
>>> About schema modifications:
>>>
>>> Modifying the schema to add new fields is relatively simple with the new
>>> API. In the past all the work was manual: uploading the schema to ZK and
>>> reloading all collections (indexing must be disabled or timeouts and funny
>>> errors happen). With the new Schema API this is more user friendly. Anyway,
>>> I still stop indexing and reload the collections (I don't know if that's
>>> necessary nowadays).
>>>
>>> About indexing data:
>>>
>>> We have a self-made data importer; it's not Java and it doesn't do batch
>>> indexing (with 500 collections, buffering data and building batches is
>>> expensive and complicates error handling).
>>>
>>> We use regular HTTP POSTs with JSON. Our throughput is about 1000 docs/s
>>> without any kind of optimization. Sometimes we have issues with replication:
>>> a replica can't keep pace with the leader's insertion rate and a full sync
>>> is requested. This is bad because syncing the replica again means a lot of
>>> IO wait and CPU, and with 100G replicas it takes an hour or more (normally
>>> when this happens we disable indexing to release IO and CPU and not kill the
>>> node with a load of 50 or 60).
>>>
>>> In this department my advice is "keep it simple"; in the end it's an HTTP
>>> POST to a node of the cluster.
>>>
>>> --
>>> /Yago Riveiro
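
For readers who want to see what that workflow looks like in practice, here is a minimal sketch of the Schema API call, collection reload, and plain JSON update POST Yago describes above. It is only an illustration: the collection name, field name, and localhost URL are invented, and it assumes a SolrCloud node with the managed schema enabled.

# Sketch only: the kind of Schema API call, collection RELOAD, and plain
# JSON update POST described above. Collection name, field name, and the
# localhost URL are invented for the example; assumes a managed schema.
import json
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr"
COLLECTION = "mycollection"  # illustrative name, not from the thread

def post_json(url, payload):
    # POST a JSON payload to Solr and return the raw response body.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

# 1. Add a field through the Schema API instead of hand-editing the schema
#    stored in ZooKeeper.
post_json(SOLR + "/" + COLLECTION + "/schema",
          {"add-field": {"name": "new_field", "type": "string",
                         "stored": True, "docValues": True}})

# 2. Reload the collection so every replica picks up the change (Yago still
#    does this; it may not be strictly required on newer versions).
urllib.request.urlopen(
    SOLR + "/admin/collections?"
    + urllib.parse.urlencode({"action": "RELOAD", "name": COLLECTION}))

# 3. Index documents with a plain HTTP POST of JSON, the "keep it simple"
#    approach: in the end it's just an HTTP POST to a node of the cluster.
post_json(SOLR + "/" + COLLECTION + "/update?commit=true",
          [{"id": "doc-1", "new_field": "some value"}])
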
>>> On Jan 20 2016, at 1:39 pm, Troy Edwards <tedwards415...@gmail.com> wrote:
>>>
>>>> Thank you for sharing your experiences/ideas.
>>>>
>>>> Yago, since you have 8 billion documents over 500 collections, can you
>>>> share what/how you do index maintenance (e.g. add a field)? And how are
>>>> you loading data into the index? Any experiences around how the ZooKeeper
>>>> ensemble behaves with so many collections?
>>>>
>>>> Best,
>>>>
>>>> On Tue, Jan 19, 2016 at 6:05 PM, Yago Riveiro <yago.rive...@gmail.com> wrote:
>>>>
>>>>> What I can say is:
>>>>>
>>>>> * SSD (crucial for performance if the index doesn't fit in memory, and
>>>>>   it will not fit)
>>>>> * Divide and conquer; for that volume of docs you will need more than
>>>>>   6 nodes.
>>>>> * DocValues, to not stress the Java heap.
>>>>> * Will you aggregate data? If yes, what is your max cardinality? This
>>>>>   question is the most important one for sizing the memory needs
>>>>>   correctly.
>>>>> * Latency is important too: what threshold is acceptable before you
>>>>>   consider a query slow?
>>>>>
>>>>> At my company we are running a 12 terabyte (2 replicas) Solr cluster
>>>>> with 8 billion documents spread over 500 collections. For this we have
>>>>> about 12 machines with SSDs and 32G of RAM each (~24G for the heap).
>>>>>
>>>>> We don't have a strict need for speed; a 30-second query to aggregate
>>>>> 100 million documents with 1M unique keys is fast enough for us.
>>>>> Normally the aggregation performance decreases as the number of unique
>>>>> keys increases; with a low unique-key count, queries take less than 2
>>>>> seconds if the data is in the OS cache.
>>>>>
>>>>> Personal recommendations:
>>>>>
>>>>> * Sharding is important and smart sharding is crucial; you don't want
>>>>>   to run queries on data that is not interesting (this slows queries
>>>>>   down when the dataset is big).
>>>>> * If you want to measure speed, do it with about 1 billion documents to
>>>>>   simulate something real (real for a 10-billion-document world).
>>>>> * Index with re-indexing in mind. With 10 billion docs, re-indexing the
>>>>>   data takes months... This is especially important if you use
>>>>>   non-standard features of Solr. In my case I configured DocValues with
>>>>>   the disk format (not a standard feature in 4.x) and at some point that
>>>>>   format was deprecated. Upgrading Solr to 5.x was an epic 3-month
>>>>>   battle to do without full downtime.
>>>>> * Solr is like your girlfriend: it will demand love and care, and plenty
>>>>>   of space to fully recover replicas that at some point are out of sync,
>>>>>   which happens a lot when restarting nodes (this is annoying with 100G
>>>>>   replicas). Don't underestimate this point; free space can save your
>>>>>   life.
>>>>>
>>>>> --
>>>>> /Yago Riveiro
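
As an illustration of the "smart sharding" recommended above, the sketch below uses Solr's implicit document router so that documents land in shards named by month and queries can be restricted to just the shards that hold interesting data. The collection name, shard names, document fields, and localhost URL are assumptions for the example, not details from the thread.

# Sketch only: one way to implement the "smart sharding" recommended above,
# using Solr's implicit document router so queries can be limited to the
# shards that actually hold the interesting data. Collection name, shard
# names, and the localhost URL are invented for the example.
import json
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr"

def create_monthly_collection():
    # Create a collection whose shards are named after months; with the
    # implicit router, each document is sent to an explicitly chosen shard.
    params = urllib.parse.urlencode({
        "action": "CREATE",
        "name": "events",
        "router.name": "implicit",
        "shards": "2016_01,2016_02,2016_03",
        "replicationFactor": "2",
        "collection.configName": "events_conf",
    })
    urllib.request.urlopen(SOLR + "/admin/collections?" + params)

def index_doc(doc, shard):
    # Route the document to a specific shard via the _route_ parameter.
    url = SOLR + "/events/update?" + urllib.parse.urlencode({"_route_": shard})
    req = urllib.request.Request(
        url,
        data=json.dumps([doc]).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

def query_one_month(q, shard):
    # Query only the shard that holds the month of interest instead of
    # fanning out across the whole collection.
    params = urllib.parse.urlencode({"q": q, "shards": shard, "wt": "json"})
    with urllib.request.urlopen(SOLR + "/events/select?" + params) as resp:
        return json.load(resp)

# Usage sketch:
# create_monthly_collection()
# index_doc({"id": "1", "event": "signup"}, shard="2016_01")
# query_one_month("event:signup", shard="2016_01")
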
>>>>> On Jan 19 2016, at 11:26 pm, Shawn Heisey <apa...@elyograg.org> wrote:
>>>>>
>>>>>> On 1/19/2016 1:30 PM, Troy Edwards wrote:
>>>>>>> We are currently "beta testing" a SolrCloud with 2 nodes and 2 shards
>>>>>>> with 2 replicas each. The number of documents is about 125000.
>>>>>>>
>>>>>>> We now want to scale this to about 10 billion documents.
>>>>>>>
>>>>>>> What are the steps to prototyping, hardware estimation and stress
>>>>>>> testing?
>>>>>>
>>>>>> There is no general information available for sizing, because there are
>>>>>> too many factors that will affect the answers. Some of the important
>>>>>> information that you need will be impossible to predict until you
>>>>>> actually build it and subject it to a real query load.
>>>>>>
>>>>>> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>>>>>
>>>>>> With an index of 10 billion documents, you may not be able to precisely
>>>>>> predict performance and hardware requirements from a small-scale
>>>>>> prototype. You'll likely need to build a full-scale system on a small
>>>>>> testbed, look for bottlenecks, ask for advice, and plan on a larger
>>>>>> system for production.
>>>>>>
>>>>>> The hard limit for documents on a single shard is slightly less than
>>>>>> Java's Integer.MAX_VALUE -- just over two billion. Because deleted
>>>>>> documents count against this max, about one billion documents per shard
>>>>>> is the absolute max that should be loaded in practice.
>>>>>>
>>>>>> BUT, if you actually try to put one billion documents in a single
>>>>>> server, performance will likely be awful. A more reasonable limit per
>>>>>> machine is 100 million ... but even this is quite large. You might need
>>>>>> smaller shards, or you might be able to get good performance with
>>>>>> larger shards. It all depends on things that you may not even know yet.
>>>>>>
>>>>>> Memory is always a strong driver for Solr performance, and I am
>>>>>> speaking specifically of OS disk cache -- memory that has not been
>>>>>> allocated by any program. With 10 billion documents, your total index
>>>>>> size will likely be hundreds of gigabytes, and might even reach
>>>>>> terabyte scale. Good performance with indexes this large will require a
>>>>>> lot of total memory, which probably means that you will need a lot of
>>>>>> servers and many shards. SSD storage is strongly recommended.
>>>>>>
>>>>>> For extreme scaling on Solr, especially if the query rate will be high,
>>>>>> it is recommended to only have one shard replica per server.
>>>>>>
>>>>>> I have just added an "extreme scaling" section to the following wiki
>>>>>> page, but it's mostly a placeholder right now. I would like to have a
>>>>>> discussion with people who operate very large indexes so I can put real
>>>>>> usable information in this section. I'm on IRC quite frequently in the
>>>>>> #solr channel.
>>>>>>
>>>>>> https://wiki.apache.org/solr/SolrPerformanceProblems
>>>>>>
>>>>>> Thanks,
>>>>>> Shawn
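
To make Shawn's numbers concrete, here is a back-of-the-envelope sketch that turns them into a rough cluster size. The docs-per-shard figure and the one-replica-per-server rule come from Shawn's reply; the bytes-per-document figure is an assumption extrapolated from Yago's 12 TB / 8 billion documents / 2 replicas, so treat the output as an order-of-magnitude estimate only.

# Back-of-the-envelope sizing sketch based on the numbers in this thread:
# Shawn's ~100 million docs per machine as a reasonable starting limit and
# one shard replica per server for extreme scale. The bytes-per-document
# figure is an assumption extrapolated from Yago's cluster, not a measured
# value, so the output is an order-of-magnitude estimate only.
TOTAL_DOCS = 10_000_000_000
DOCS_PER_SHARD = 100_000_000       # Shawn's "more reasonable limit per machine"
REPLICATION_FACTOR = 2             # assumed minimum for redundancy
BYTES_PER_DOC = 750                # assumption: ~12 TB / (8e9 docs * 2 replicas)

shards = TOTAL_DOCS // DOCS_PER_SHARD                 # -> 100 shards
servers = shards * REPLICATION_FACTOR                 # -> 200 servers at 1 replica each
index_tb = TOTAL_DOCS * BYTES_PER_DOC * REPLICATION_FACTOR / 1e12

print(f"{shards} shards, ~{servers} servers, ~{index_tb:.0f} TB of index on disk")
# -> 100 shards, ~200 servers, ~15 TB of index on disk (under these assumptions)

Under these assumptions the cluster lands on the order of a hundred shards and a couple of hundred servers, which underlines Shawn's point that at this scale the "prototype" is effectively a full-scale build.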