Per - Wow, 1 trillion documents stored is pretty impressive.  One
clarification: when you say that you have 2 replicas per collection on each
machine, what exactly does that mean?  Do you mean that each collection is
sharded into 50 shards, divided evenly over all 25 machines (thus 2 shards
per machine)?  Or are some of these slave replicas (e.g. 25x sharding with
1 extra replica per shard)?
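To make the two layouts in question concrete (the numbers come from the thread, but this sketch and the "layout A/B" labels are mine, not from the original messages):

```python
# Two ways 25 machines can each end up hosting 2 Solr cores.
# Illustrative arithmetic only.
machines = 25

# Layout A: 50 shards, replicationFactor=1 -> every core is a distinct shard.
shards_a, rf_a = 50, 1
cores_per_machine_a = shards_a * rf_a / machines
print(cores_per_machine_a)  # 2.0

# Layout B: 25 shards, replicationFactor=2 -> each shard has one extra copy.
shards_b, rf_b = 25, 2
cores_per_machine_b = shards_b * rf_b / machines
print(cores_per_machine_b)  # 2.0
```

Both layouts place 2 cores per machine, which is why the phrasing is ambiguous; they differ in how many distinct shards exist and whether any redundancy is present.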

Thanks!
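As a sanity check on the figures in Per's message below (my own arithmetic, not part of the original thread):

```python
# Replica count: 25 collections x 2 replicas per collection per machine
# x 25 machines.
replicas = 25 * 2 * 25
print(replicas)  # 1250

# Total documents: ~800 million per replica.
total_docs = replicas * 800_000_000
print(total_docs)  # 1000000000000, i.e. ~1 trillion

# ~1.5 billion docs/day into one collection's 50 replicas, with the
# collection shifted monthly, is consistent with ~800M docs per replica:
docs_per_replica_per_month = 1_500_000_000 * 30 / 50
print(docs_per_replica_per_month)  # 900000000.0, same order as 800M
```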

On Wed, Mar 25, 2015 at 5:13 AM, Per Steffensen <st...@designware.dk> wrote:

> In one of our production environments we use 32GB, 4-core, 3TB RAID0
> spinning-disk Dell servers (I do not remember the exact model). We have
> about 25 collections with 2 replicas (shard instances) per collection on
> each machine, across 25 machines. Total: 25 collections * 2
> replicas/collection/machine * 25 machines = 1250 replicas. Each replica
> contains about 800 million pretty small documents - that's about 1
> trillion documents all in all. We index about 1.5 billion new documents
> every day (mainly into one of the collections, i.e. 50 replicas across 25
> machines) and keep a 2-year history of the data, shifting the "index
> into" collection every month. We can fairly easily keep up with the
> indexing load. We keep almost none of the data on the heap, but of course
> a small fraction of the data in the files will at any time be in the OS
> file cache.
> Compared to our indexing frequency we do not do a lot of searches. We
> have about 10 users searching the system from time to time - anything
> from major extracts to small quick searches. Depending on the nature of
> the search we see response times between 1 sec and 5 min, but of course
> that is very dependent on "clever" choices for each field w.r.t. index,
> store, docValues, etc. BUT we are not using out-of-the-box Apache Solr.
> We have made quite a lot of performance tweaks ourselves.
> Please note that, even though you disable all Solr caches, each replica
> will use heap memory that grows linearly with the number of documents
> (and their size) in that replica. But not much, so you can get pretty far
> with relatively little RAM.
> Our version of Solr is based on Apache Solr 4.4.0, but I expect/hope it
> did not get worse in newer releases.
>
> Just to give you some idea of what can be achieved - at the high end of
> #replicas and #docs, I guess.
>
> Regards, Per Steffensen
>
>
> On 24/03/15 14:02, Ian Rose wrote:
>
>> Hi all -
>>
>> I'm sure this topic has been covered before but I was unable to find any
>> clear references online or in the mailing list.
>>
>> Are there any rules of thumb for how many cores (aka shards, since I am
>> using SolrCloud) are "too many" for one machine?  I realize there is no
>> one answer (it depends on the size of the machine, etc.) so I'm just
>> looking for a rough idea.  Something like the following would be very
>> useful:
>>
>> * People commonly run up to X cores/shards on a mid-sized (4 or 8 core)
>> server without any problems.
>> * I have never heard of anyone successfully running X cores/shards on a
>> single machine, even if you throw a lot of hardware at it.
>>
>> Thanks!
>> - Ian
>>
>>
>