On 7/29/2013 2:18 PM, Torsten Albrecht wrote:
we have

- 70 million to 100 million documents

and we want

- 800 requests per second


How many servers (Amazon EC2 or real hardware) do we need for this?

Should we use Solr 4.x with SolrCloud, or is sharding with a load balancer better?

Is there anyone here who can give me some information, or who operates a similar system?

Your question is impossible to answer, aside from generalities that won't really help all that much.

I have a similarly sized system (82 million docs), but I don't have query volume anywhere near what yours is. I've got less than 10 queries per second. I have two copies of my index. I use a load balancer with traditional sharding.

I don't do replication; my two index copies are completely independent. I set it up this way long before SolrCloud was released. Having two completely independent indexes lets me do a lot of experimentation that a typical SolrCloud setup won't let me do.

One copy of the index is running 3.5.0 and is about 142GB if you add up all the shards. The other copy of the index is running 4.2.1 and is about 87GB on disk. Each copy of the index runs on two servers and is split into six large cold shards and one small hot shard. Each of those servers has two quad-core processors (Xeon E5400 series, so fairly old now) and 64GB of RAM. I can get away with multiple shards per host because my query volume is so low.

Here's a screenshot of a status servlet that I wrote for my index. There's tons of info here about my index stats:

https://dl.dropboxusercontent.com/u/97770508/statuspagescreenshot.png

If I needed to start over from scratch with your higher query volume, I would probably set up two independent SolrCloud installs, each with a replicationFactor of at least two, and I'd use 4-8 shards. I would put a load balancer in front of it so that I could bring one cloud down and have everything still work, though with lower performance. Because of the query volume, I'd only have one shard per host. Depending on how big the index ended up being, I'd want 16-32GB (or possibly more) RAM per host.
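
To put rough numbers on that layout, here is a back-of-the-envelope sketch in Python. The function name is made up for illustration, and it assumes every replica of every shard gets its own host (one shard per host, as described above). It only counts machines; it says nothing about whether they would actually sustain 800 requests per second.

```python
def hosts_needed(clouds: int, shards: int, replication_factor: int,
                 shards_per_host: int = 1) -> int:
    """Count hosts when each shard replica needs a slot on some host.

    Assumption: replicas are spread evenly and no host holds more than
    `shards_per_host` replicas.
    """
    replicas = clouds * shards * replication_factor
    # Ceiling division without importing math.
    return -(-replicas // shards_per_host)


# Two independent clouds, replicationFactor of 2, 4 to 8 shards.
low = hosts_needed(clouds=2, shards=4, replication_factor=2)
high = hosts_needed(clouds=2, shards=8, replication_factor=2)
print(f"hosts: {low} to {high}")

# Total RAM across the fleet at 16GB and 32GB per host.
print(f"RAM: {low * 16}GB to {high * 32}GB")
```

With those inputs the estimate comes out to somewhere between 16 and 32 hosts, before counting any dev servers.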

You might not need the flexibility of two independent clouds, and it would require additional complexity in your indexing software. If you only went with one cloud, you'd just need a higher replicationFactor.
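
For the single-cloud variant, the shard count and replicationFactor are chosen when the collection is created through the Solr 4.x Collections API. The sketch below only builds the request URL and does not send it. The host name, collection name, and the replicationFactor of 4 are placeholders picked for illustration, not recommendations.

```python
from urllib.parse import urlencode


def create_collection_url(base_url: str, name: str,
                          num_shards: int, replication_factor: int) -> str:
    """Build a Collections API CREATE URL for a SolrCloud collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # One shard replica per node, matching the one-shard-per-host idea.
        "maxShardsPerNode": 1,
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Hypothetical host and collection name.
url = create_collection_url("http://solr1.example.com:8983/solr", "docs", 4, 4)
print(url)
```

A collection created this way needs numShards times replicationFactor live nodes, so 16 in this example, because maxShardsPerNode is 1.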

I'd also want to have another set of servers (not as beefy) to have another independent SolrCloud with a replicationFactor of 1 or 2 for dev purposes.

That's a LOT of hardware, and it would NOT be cheap. Can I be sure that you'd really need that much hardware? Not really. To be quite honest, you'll just have to set up a proof-of-concept system and be prepared to make it bigger.

Thanks,
Shawn

