In any case, this is really "the sizing question", and generic answers are not reliable. Here's a long blog post about why, but the bottom line is "prototype and measure". Fortunately, you can prototype with just a few nodes (I usually want at least 2 shards) and extrapolate reasonably well.
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

On Fri, Jan 13, 2017 at 10:29 AM, Susheel Kumar <susheel2...@gmail.com> wrote:
> As per Scott@FullStory you shall see benefits with many smaller shards then
> few bigger. Also upgrading to Solr 6.2 would be better as there are many
> improvements done handling multiple shards. See below presentation
>
> http://www.slideshare.net/lucidworks/large-scale-solr-at-fullstory-presented-by-scott-blum-fullstory
>
> Thnx
> Susheel
>
> On Fri, Jan 13, 2017 at 12:56 PM, Joe Obernberger <
> joseph.obernber...@gmail.com> wrote:
>
>> Hi All - we've been experimenting with Solr Cloud 5.5.0 with a 27 shard
>> (no replication - each shard runs on a physical host) cluster on top of
>> HDFS. It currently just crossed 3 billion documents indexed with an index
>> size of 16.1TBytes. In HDFS with 3x replication this takes up 48.2TBytes.
>>
>> Each shard is then hosting about 610GBytes of index. The HDFS cache size
>> is very low at about 8GBytes. Suffice it to say, performance isn't very
>> good, but again, this is for experimentation.
>>
>> If we were to redo this, would it be better to create many shards - maybe
>> 200 with 3 replicas each (600 in all) with the goal being to withstand a
>> server going out, and future expansion as more hardware is added? I know
>> this is very general question. Thanks very much in advance!
>>
>> -Joe
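
For reference, the arithmetic behind the figures quoted in this thread can be sketched roughly as follows. This is only back-of-the-envelope math on the numbers Joe gave, not a sizing recommendation; the function and key names are made up for illustration, and the use of 1024 GB per TB is an assumption chosen because it reproduces the "about 610GBytes" per shard he mentions.

```python
# Figures quoted in the thread.
TOTAL_DOCS = 3_000_000_000   # "just crossed 3 billion documents"
INDEX_TB = 16.1              # total index size
CURRENT_SHARDS = 27          # one shard per physical host, no Solr replicas
HDFS_CACHE_GB = 8            # "very low at about 8GBytes"
PROPOSED_SHARDS = 200        # layout Joe is asking about
PROPOSED_REPLICAS = 3

GB_PER_TB = 1024  # assumption: binary units, which matches the ~610 GB figure


def per_shard(total_docs, index_tb, shards):
    """Return (documents per shard, index GB per shard) for an even split."""
    return total_docs / shards, index_tb * GB_PER_TB / shards


def summarize():
    """Compare the current 27-shard layout with the proposed 200 x 3 layout."""
    cur_docs, cur_gb = per_shard(TOTAL_DOCS, INDEX_TB, CURRENT_SHARDS)
    new_docs, new_gb = per_shard(TOTAL_DOCS, INDEX_TB, PROPOSED_SHARDS)
    return {
        "current_docs_per_shard": cur_docs,
        "current_gb_per_shard": cur_gb,
        # Fraction of one shard's index that fits in the HDFS cache today.
        "current_cache_fraction": HDFS_CACHE_GB / cur_gb,
        "proposed_docs_per_shard": new_docs,
        "proposed_gb_per_shard": new_gb,
        # Total Solr cores the cluster would host (shards x replicas).
        "proposed_total_cores": PROPOSED_SHARDS * PROPOSED_REPLICAS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:,.3f}")
```

The point of running numbers like these is only to frame the prototype: how much index each core would hold and how little of it the cache can cover. Whether a given per-shard size performs acceptably still has to be measured.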