On 7/29/2013 2:18 PM, Torsten Albrecht wrote:
We have
- 70 million to 100 million documents
and we want
- 800 requests per second
How many servers (Amazon EC2 or real hardware) do we need for this?
Solr 4.x with SolrCloud, or would shards with a load balancer be better?
Is there anyone here who can give me some information, or who operates a
similar system?
Your question is impossible to answer, aside from generalities that
won't really help all that much.
I have a similarly sized system (82 million docs), but I don't have
query volume anywhere near what yours is. I've got less than 10 queries
per second. I have two copies of my index. I use a load balancer with
traditional sharding.
I don't do replication; my two index copies are completely independent.
I set it up this way long before SolrCloud was released. Having two
completely independent indexes lets me do a lot of experimentation that
a typical SolrCloud setup won't let me do.
One copy of the index is running 3.5.0 and is about 142GB if you add up
all the shards. The other copy of the index is running 4.2.1 and is
about 87GB on disk. Each copy of the index runs on two servers and is
split into six large cold shards and one small hot shard. Each of those servers has
two quad-core processors (Xeon E5400 series, so fairly old now) and 64GB
of RAM. I can get away with multiple shards per host because my query
volume is so low.
Here's a screenshot of a status servlet that I wrote for my index.
There's tons of info here about my index stats:
https://dl.dropboxusercontent.com/u/97770508/statuspagescreenshot.png
If I needed to start over from scratch with your higher query volume, I
would probably set up two independent SolrCloud installs, each with a
replicationFactor of at least two, and I'd use 4-8 shards. I would put
a load balancer in front of it so that I could bring one cloud down and
have everything still work, though with lower performance. Because of
the query volume, I'd only have one shard per host. Depending on how
big the index ended up being, I'd want 16-32GB (or possibly more) RAM
per host.
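
As a rough sketch of the arithmetic behind that layout (not a sizing
guarantee): the function names are made up for illustration, and the
per-host load figure assumes the load balancer splits traffic evenly
between the clouds and that every query fans out to one replica of each
shard, ignoring the extra work of merging shard responses.

```python
def hosts_needed(clouds, shards, replication_factor, shards_per_host=1):
    """Hosts required when every shard replica gets its own slot on a host."""
    cores = clouds * shards * replication_factor
    # Ceiling division, in case more than one shard per host is allowed.
    return -(-cores // shards_per_host)


def per_host_qps(total_qps, clouds, replication_factor):
    """Approximate shard requests per second seen by each host.

    Assumption: a distributed query touches every shard once, so the
    number of shards does not reduce the request rate per host; only
    the number of copies (clouds * replication_factor) does.
    """
    return total_qps / (clouds * replication_factor)


if __name__ == "__main__":
    for shards in (4, 8):
        n = hosts_needed(clouds=2, shards=shards, replication_factor=2)
        print(f"{shards} shards: {n} hosts, "
              f"{n * 16}-{n * 32} GB RAM in total at 16-32 GB per host")
    print("shard requests per host per second:",
          per_host_qps(800, clouds=2, replication_factor=2))
```

With two clouds and a replicationFactor of 2, this gives 16 hosts for 4
shards and 32 hosts for 8 shards, each seeing roughly 200 shard requests
per second at 800 queries per second overall.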
You might not need the flexibility of two independent clouds, and it
would require additional complexity in your indexing software. If you
only went with one cloud, you'd just need a higher replicationFactor.
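
For the single-cloud variant, a collection with a higher replicationFactor
could be created through the Collections API. The sketch below only builds
the request URL. The host, the collection name "docs", and the choice of 8
shards with a replicationFactor of 4 are assumptions for illustration, and
maxShardsPerNode should be checked against the documentation for your 4.x
release.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Collections API CREATE URL with one shard replica per node."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # One replica per node, matching "one shard per host".
        "maxShardsPerNode": 1,
    }
    return base_url.rstrip("/") + "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host and collection name; 8 shards x 4 replicas means
    # the cloud needs 32 live nodes before this request can succeed.
    print(create_collection_url("http://localhost:8983/solr", "docs", 8, 4))
```

A replicationFactor of 4 in one cloud gives the same number of index
copies as two independent clouds with a replicationFactor of 2 each.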
I'd also want to have another set of servers (not as beefy) to have
another independent SolrCloud with a replicationFactor of 1 or 2 for dev
purposes.
That's a LOT of hardware, and it would NOT be cheap. Can I be sure that
you'd really need that much hardware? Not really. To be quite
honest, you'll just have to set up a proof-of-concept system and be
prepared to make it bigger.
Thanks,
Shawn