bq: 3 are too risky, you lose one you lose quorum

Typo? You need to lose two. A 3-node ensemble has a quorum of 2, so it survives the loss of one node.
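For reference, ZooKeeper keeps quorum while a strict majority of the ensemble is up, so the tolerable failure count follows directly from the ensemble size. A quick sketch:

```python
# Majority quorum: floor(n/2) + 1 nodes must be up; the rest can fail.
for n in (3, 5, 7):
    quorum = n // 2 + 1
    print(f"{n}-node ensemble: quorum {quorum}, tolerates {n - quorum} failure(s)")
# 3-node ensemble: quorum 2, tolerates 1 failure(s)
# 5-node ensemble: quorum 3, tolerates 2 failure(s)
# 7-node ensemble: quorum 4, tolerates 3 failure(s)
```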
On Wed, Jan 20, 2016 at 6:25 AM, Yago Riveiro <yago.rive...@gmail.com> wrote:

> Our Zookeeper cluster is an ensemble of 5 machines, which is a good
> starting point; 3 are too risky, you lose one you lose quorum, and with 7
> the sync cost increases.
>
> The ZK cluster is on machines with no other I/O load and rotational HDDs
> (don't use SSDs to gain I/O performance; ZooKeeper is optimized for
> spinning disks).
>
> The ZK cluster behaves without problems. The first deploy of ZK was on the
> same machines as the Solr cluster (with the ZK log on its own HDD) and
> that didn't work very well; the CPU and network I/O from the Solr cluster
> was too much.
>
> About schema modifications:
>
> Modifying the schema to add new fields is relatively simple with the new
> API. In the past all the work was manual: uploading the schema to ZK and
> reloading all collections (indexing must be disabled, or timeouts and
> funny errors happen). With the new Schema API this is more user friendly.
> Anyway, I still stop indexing and reload the collections (I don't know if
> that's necessary nowadays). (See the sketch below.)
>
> About indexing data:
>
> We have a self-made data importer. It's not Java and it doesn't do batch
> indexing (with 500 collections, buffering data and building batches is
> expensive and complicates error handling).
>
> We use regular HTTP POSTs with JSON. Our throughput is about 1000 docs/s
> without any kind of optimization. Sometimes we have issues with
> replication: the slave can't keep pace with the leader's insertions and a
> full sync is requested. This is bad because syncing the replica again
> means a lot of I/O wait and CPU, and with 100G replicas it takes an hour
> or more (normally when this happens we disable indexing, to release I/O
> and CPU and not kill the node with a load of 50 or 60).
>
> In this department my advice is "keep it simple"; in the end it is an HTTP
> POST to a node of the cluster. (The sketch below covers both the schema
> calls and the indexing POST.)
>
> --
>
> /Yago Riveiro
>
>> On Jan 20 2016, at 1:39 pm, Troy Edwards <tedwards415...@gmail.com>
>> wrote:
>>
>> Thank you for sharing your experiences/ideas.
>>
>> Yago, since you have 8 billion documents over 500 collections, can you
>> share what/how you do index maintenance (e.g. adding a field)? And how
>> are you loading data into the index? Any experience with how the
>> Zookeeper ensemble behaves with so many collections?
>>
>> Best,
>>
>>> On Tue, Jan 19, 2016 at 6:05 PM, Yago Riveiro <yago.rive...@gmail.com>
>>> wrote:
>>>
>>> What I can say is:
>>>
>>> * SSDs (crucial for performance if the index doesn't fit in memory, and
>>>   it will not fit)
>>> * Divide and conquer; for that volume of docs you will need more than 6
>>>   nodes.
>>> * DocValues, to not stress the Java heap.
>>> * Will you aggregate data? If yes, what is your max cardinality? This
>>>   question is the most important one for sizing the memory needs
>>>   correctly.
>>> * Latency is important too; what threshold is acceptable before you
>>>   consider a query slow?
>>>
>>> At my company we are running a 12 terabyte (2 replicas) Solr cluster
>>> with 8 billion documents spread over 500 collections. For this we have
>>> about 12 machines with SSDs and 32G of RAM each (~24G for the heap).
>>>
>>> We don't have a strict need for speed; a 30 second query to aggregate
>>> 100 million documents with 1M unique keys is fast enough for us.
>>> Normally aggregation performance decreases as the number of unique keys
>>> increases; with a low unique-key count, queries take less than 2
>>> seconds if the data is in the OS cache.
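The aggregations Yago describes map naturally onto a terms facet over a docValues field, where each unique key becomes a bucket (which is why cardinality drives the memory needs). A minimal sketch using the JSON Facet API of Solr 5.x; the host, collection, and field names are made up for illustration:

```python
# Count the top keys in a collection with a terms facet (JSON Facet API).
# "events" and "key_s" are hypothetical names; any docValues string field works.
import requests

resp = requests.get(
    "http://solr-1.example.com:8983/solr/events/select",
    params={
        "q": "*:*",
        "rows": 0,  # aggregation only, skip returning documents
        "json.facet": '{"by_key": {"type": "terms", "field": "key_s", "limit": 10}}',
    },
)
resp.raise_for_status()
print(resp.json()["facets"]["by_key"]["buckets"])
```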
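And the schema-change plus indexing workflow from earlier in Yago's message boils down to three plain HTTP calls. A hedged sketch, assuming a managed schema is enabled and using made-up host, collection, and field names:

```python
# Add a field, reload the collection, and index documents over plain HTTP.
# All names here (solr-1.example.com, "events", "category_s") are hypothetical.
import requests

SOLR = "http://solr-1.example.com:8983/solr"

# 1. Add the field through the Schema API (no hand-editing schema files in ZK).
requests.post(f"{SOLR}/events/schema", json={
    "add-field": {"name": "category_s", "type": "string",
                  "stored": True, "docValues": True},
}).raise_for_status()

# 2. Reload the collection so every replica picks up the new field.
requests.get(f"{SOLR}/admin/collections",
             params={"action": "RELOAD", "name": "events"}).raise_for_status()

# 3. Index documents: a plain JSON POST, no client library required.
docs = [{"id": "doc-1", "category_s": "news"},
        {"id": "doc-2", "category_s": "sports"}]
requests.post(f"{SOLR}/events/update", json=docs,
              params={"commitWithin": 60000}).raise_for_status()
```

Whether a reload is still required after a Schema API change depends on the Solr version; pausing indexing around the change, as Yago does, is the cautious route.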
>>> Personal recommendations:
>>>
>>> * Sharding is important and smart sharding is crucial; you don't want to
>>>   run queries on data that is not interesting (this slows down queries
>>>   when the dataset is big).
>>> * If you want to measure speed, do it with about 1 billion documents to
>>>   simulate something real (real for a 10 billion document world).
>>> * Index with re-indexing in mind: with 10 billion docs, re-indexing the
>>>   data takes months. This is important if you use non-standard features
>>>   of Solr. In my case I configured DocValues with the disk format (not a
>>>   standard feature in 4.x) and at some point this format was deprecated.
>>>   Upgrading Solr to 5.x was an epic 3-month battle to do it without full
>>>   downtime.
>>> * Solr is like your girlfriend: it will demand love and care, and plenty
>>>   of space to fully recover replicas that at some point are out of sync,
>>>   which happens a lot when restarting nodes (this is annoying with 100G
>>>   replicas). Don't underestimate this point; free space can save your
>>>   life.
>>>
>>> --
>>>
>>> /Yago Riveiro
>>>
>>>> On Jan 19 2016, at 11:26 pm, Shawn Heisey <apa...@elyograg.org> wrote:
>>>>
>>>> On 1/19/2016 1:30 PM, Troy Edwards wrote:
>>>>> We are currently "beta testing" a SolrCloud with 2 nodes and 2 shards
>>>>> with 2 replicas each. The number of documents is about 125000.
>>>>>
>>>>> We now want to scale this to about 10 billion documents.
>>>>>
>>>>> What are the steps to prototyping, hardware estimation and stress
>>>>> testing?
>>>>
>>>> There is no general information available for sizing, because there are
>>>> too many factors that will affect the answers. Some of the important
>>>> information that you need will be impossible to predict until you
>>>> actually build it and subject it to a real query load.
>>>>
>>>> https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>>>
>>>> With an index of 10 billion documents, you may not be able to precisely
>>>> predict performance and hardware requirements from a small-scale
>>>> prototype. You'll likely need to build a full-scale system on a small
>>>> testbed, look for bottlenecks, ask for advice, and plan on a larger
>>>> system for production.
>>>>
>>>> The hard limit for documents on a single shard is slightly less than
>>>> Java's Integer.MAX_VALUE -- just over two billion. Because deleted
>>>> documents count against this max, about one billion documents per shard
>>>> is the absolute max that should be loaded in practice.
>>>>
>>>> BUT, if you actually try to put one billion documents in a single
>>>> server, performance will likely be awful. A more reasonable limit per
>>>> machine is 100 million ... but even this is quite large. You might need
>>>> smaller shards, or you might be able to get good performance with
>>>> larger shards. It all depends on things that you may not even know yet.
>>>>
>>>> Memory is always a strong driver for Solr performance, and I am
>>>> speaking specifically of OS disk cache -- memory that has not been
>>>> allocated by any program. With 10 billion documents, your total index
>>>> size will likely be hundreds of gigabytes, and might even reach
>>>> terabyte scale. Good performance with indexes this large will require a
>>>> lot of total memory, which probably means that you will need a lot of
>>>> servers and many shards. SSD storage is strongly recommended.
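Shawn's per-shard numbers turn into cluster-scale counts quickly. A back-of-envelope sketch; the per-shard target and replication factor are illustrative assumptions drawn from this thread, not measured limits:

```python
# Rough cluster sizing from the figures discussed in this thread.
total_docs = 10_000_000_000    # the 10 billion documents being planned
docs_per_shard = 100_000_000   # Shawn's "more reasonable limit per machine"
replication_factor = 2         # two copies of every shard

shards = -(-total_docs // docs_per_shard)   # ceiling division: 100 shards
cores = shards * replication_factor         # 200 shard replicas in total
print(f"{shards} shards x {replication_factor} replicas = {cores} cores")
# With one replica per server (the advice that follows), that is on the
# order of 200 servers before RAM and disk sizing even enter the picture.
```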
>>>> For extreme scaling on Solr, especially if the query rate will be high,
>>>> it is recommended to only have one shard replica per server.
>>>>
>>>> I have just added an "extreme scaling" section to the following wiki
>>>> page, but it's mostly a placeholder right now. I would like to have a
>>>> discussion with people who operate very large indexes so I can put real
>>>> usable information in this section. I'm on IRC quite frequently in the
>>>> #solr channel.
>>>>
>>>> https://wiki.apache.org/solr/SolrPerformanceProblems
>>>>
>>>> Thanks,
>>>> Shawn