Hi
I have some experience with practical limits. We have tried running
several setups under high load for long periods:
1)
* 20 shards in one collection spread over 5 nodes (4 shards of the
collection per node), no redundancy (only one replica per shard)
* Indexing 35-50 million documents per day and searching a little along the way
* We do not have detailed measurements on searching, but my impression
is that search response times are fairly OK (below 5 seconds for
uncomplicated searches), at least for the first 15 days, up to about 500
million documents
* We do have very detailed measurements of indexing times, though. They are
good for the first 15-17 days, up to 500-600 million documents. Then we see a
temporary slow-down in indexing, because major merges
happen at the same time across all shards. Indexing speeds up again
once the merges are over. After about 20 days, however, everything stops
running: things just get too slow and eventually nothing happens.
2)
* Same as 1), except 40 shards in one collection spread over 10 nodes,
no redundancy
* The slow-down points seem to scale linearly: slow-down at around 1 billion
docs and complete stop at 1.3-1.4 billion docs
Therefore it seems a little strange to me that you have problems with 25
million docs in two shards.
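To put rough numbers on that comparison, here is a small Python sketch of the
average documents per shard implied by the figures in this thread. The helper
name docs_per_shard is only for illustration, and the calculation assumes
documents are spread evenly over the shards; it ignores differences in
hardware, document size and replication.

```python
MILLION = 1_000_000


def docs_per_shard(total_docs: int, shards: int) -> float:
    """Average number of documents per shard, assuming an even spread."""
    return total_docs / shards


# (label, total documents, number of shards), taken from the figures in this thread
cases = [
    ("setup 1, slow-down (lower bound)", 500 * MILLION, 20),
    ("setup 2, slow-down", 1000 * MILLION, 40),
    ("setup 2, complete stop (lower bound)", 1300 * MILLION, 40),
    ("question: 25 million docs in 2 shards", 25 * MILLION, 2),
]

for label, total, shards in cases:
    per_shard = docs_per_shard(total, shards) / MILLION
    print(f"{label}: {per_shard:.1f} million docs per shard")
```

Under these assumptions, both of our setups start to slow down at roughly 25
million docs per shard, while the setup in the question holds about 12.5
million docs per shard.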
One major difference is the redundancy, though. We run with only one
replica per shard. We started out trying to run with redundancy (2
replicas per shard), but that caused a lot of problems. Shards never
recovered successfully when recovery situations occurred, and we saw roughly
4 times the indexing time of the non-redundant setup (even though at most
2 times should be expected).
Regards, Per Steffensen
On 1/7/13 6:14 PM, f.fourna...@gibmedia.fr wrote:
Hello,
I'm new to Solr and I have a collection with 25 million records.
I want to run this collection on SolrCloud (Solr 4.0) on Amazon EC2
instances.
Currently I've configured 2 shards and 2 replicas per shard on Medium
instances (4 GB, 1 CPU core), and response times are very long.
How should I size the cloud (sharding, replicas, memory, CPU, ...) to get
acceptable response times in my situation? More memory? More CPU? More
shards? Do rules for sizing a SolrCloud cluster exist?
Is it possible to have more than 2 replicas for one shard? Is it relevant?
Best regards
FF