Hi

I have some experience with practical limits. We have tried to run several setups under high load for a long time:
1)
* 20 shards in one collection spread over 5 nodes (4 shards of the collection per node), no redundancy (only one replica per shard)
* Indexing 35-50 million documents per day and searching a little along the way
* We do not have detailed measurements on searching, but my impression is that search response times are fairly OK (below 5 secs for non-complicated searches), at least for the first 15 days, up to about 500 million documents
* We have very detailed measurements on indexing times, though. They are good for the first 15-17 days, up to 500-600 million documents. Then we see a temporary slow-down in indexing times. This is because major merges happen at the same time across all shards. Indexing speeds up again when this is over, though. After about 20 days everything stops running: things just get too slow and eventually nothing happens.
2)
* Same as 1), except 40 shards in one collection spread over 10 nodes, no redundancy
* The slow-down points seem to scale linearly: slow-down around 1 billion docs and complete stop at 1.3-1.4 billion docs

Therefore it seems a little strange to me that you have problems with 25 million docs in two shards. One major difference is the redundancy, though. We have only one replica per shard. We started out trying to run with redundancy (2 replicas per shard), but that involved a lot of problems. Things never recovered successfully when recovery situations occurred, and we saw indexing times about 4 times those without redundancy (even though at most 2 times should be expected).
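
As a rough sketch of the arithmetic behind the shard comparison, the figures quoted in this mail can be turned into documents per shard. The helper name `docs_per_shard` is made up for illustration, and it assumes documents are spread evenly across shards. It ignores hardware, document size and query load, so it only gives an order-of-magnitude comparison, not a sizing rule.

```python
# Back-of-the-envelope: millions of documents per shard for the figures
# mentioned in this thread. Assumes an even spread across shards.

def docs_per_shard(total_docs_million: float, shards: int) -> float:
    """Return millions of documents per shard (hypothetical helper)."""
    return total_docs_million / shards

# (label, total documents in millions, number of shards)
observations = [
    ("setup 1, slow-down starts (low end)", 500, 20),
    ("setup 1, slow-down starts (high end)", 600, 20),
    ("setup 2, slow-down", 1000, 40),
    ("setup 2, complete stop (low end)", 1300, 40),
    ("setup 2, complete stop (high end)", 1400, 40),
    ("the 25 million doc collection, 2 shards", 25, 2),
]

for label, total, shards in observations:
    per_shard = docs_per_shard(total, shards)
    print(f"{label}: {per_shard:.1f} million docs per shard")
```

Under that assumption, both setups start to slow down at roughly 25-30 million docs per shard, while 25 million docs in two shards is 12.5 million per shard.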

Regards, Per Steffensen


On 1/7/13 6:14 PM, f.fourna...@gibmedia.fr wrote:
Hello,
I'm new to Solr and I have a collection with 25 million records.
I want to run this collection on SolrCloud (Solr 4.0) on Amazon EC2
instances.
Currently I've configured 2 shards and 2 replicas per shard with Medium
instances (4 GB, 1 CPU core) and response times are very long.
How should I size the cloud (sharding, replicas, memory, CPU, ...) to get
acceptable response times in my situation? More memory? More CPU? More
shards? Do rules for sizing a Solr cloud exist?
Is it possible to have more than 2 replicas on one shard? Is it relevant?
Best regards
FF

