Per - Wow, 1 trillion documents stored is pretty impressive. One clarification: when you say that you have 2 replica per collection on each machine, what exactly does that mean? Do you mean that each collection is sharded into 50 shards, divided evenly over all 25 machines (thus 2 shards per machine)? Or are some of these slave replicas (e.g. 25x sharding with 1 replica per shard)?
Thanks!

On Wed, Mar 25, 2015 at 5:13 AM, Per Steffensen <st...@designware.dk> wrote:

> In one of our production environments we use 32GB, 4-core, 3TB RAID0
> spinning-disk Dell servers (I do not remember the exact model). We have
> about 25 collections with 2 replicas (shard-instances) per collection on
> each machine, across 25 machines. Total: 25 coll * 2 replicas/coll/machine
> * 25 machines = 1250 replicas. Each replica contains about 800 million
> pretty small documents - that's about 1000 billion (1 trillion) documents
> all in all. We index about 1.5 billion new documents every day (mainly
> into one of the collections = 50 replicas across 25 machines) and keep a
> history of 2 years on the data, shifting the "index into" collection
> every month. We can fairly easily keep up with the indexing load.
>
> We keep almost none of the data on the heap, but of course a small
> fraction of the data in the files will at any time be in the OS
> file-cache. Compared to our indexing frequency we do not do a lot of
> searches. We have about 10 users searching the system from time to time -
> anything from major extracts to small quick searches. Depending on the
> nature of the search we see response times between 1 sec and 5 min. But
> of course that is very dependent on "clever" choices for each field wrt
> index, store, doc-values etc.
>
> BUT we are not using out-of-the-box Apache Solr. We have made quite a lot
> of performance tweaks ourselves.
>
> Please note that, even if you disable all Solr caches, each replica will
> use heap memory linearly dependent on the number of documents (and their
> size) in that replica. But not much, so you can get pretty far with
> relatively little RAM.
>
> Our version of Solr is based on Apache Solr 4.4.0, but I expect/hope it
> did not get worse in newer releases.
> Just to give you some idea of what can at least be achieved - in the
> high end of #replicas and #docs, I guess.
>
> Regards, Per Steffensen
>
> On 24/03/15 14:02, Ian Rose wrote:
>
>> Hi all -
>>
>> I'm sure this topic has been covered before, but I was unable to find
>> any clear references online or in the mailing list.
>>
>> Are there any rules of thumb for how many cores (aka shards, since I am
>> using SolrCloud) is "too many" for one machine? I realize there is no
>> one answer (it depends on the size of the machine, etc.), so I'm just
>> looking for a rough idea. Something like the following would be very
>> useful:
>>
>> * People commonly run up to X cores/shards on a mid-sized (4- or
>> 8-core) server without any problems.
>> * I have never heard of anyone successfully running X cores/shards on a
>> single machine, even if you throw a lot of hardware at it.
>>
>> Thanks!
>> - Ian
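For reference, the cluster-sizing arithmetic from the quoted mail works out as follows. This is just a sanity-check sketch; all figures (25 collections, 2 replicas per collection per machine, 25 machines, ~800 million docs per replica, 1.5 billion docs/day into 50 replicas) are taken directly from the thread above, none are measured independently:

```python
# Sanity-check of the cluster sizing described in the thread.
collections = 25                 # collections in the cluster
replicas_per_coll_per_machine = 2  # shard-instances per collection per machine
machines = 25                    # servers in the cluster

total_replicas = collections * replicas_per_coll_per_machine * machines
print(total_replicas)            # 1250 replicas, as stated in the mail

docs_per_replica = 800_000_000   # ~800 million small documents per replica
total_docs = total_replicas * docs_per_replica
print(total_docs)                # 1_000_000_000_000 -> about 1 trillion docs

# Daily indexing load: 1.5 billion new docs/day, mainly into one collection
# spread over 50 replicas (2 per machine * 25 machines).
daily_docs = 1_500_000_000
active_replicas = replicas_per_coll_per_machine * machines
print(daily_docs // active_replicas)  # ~30 million docs per replica per day
```

So each of the 50 "active" replicas absorbs roughly 30 million documents per day, which is consistent with Per's remark that they can fairly easily keep up with the indexing load.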