Gaurav Patel <gaura...@gmail.com> wrote:
> 3 Physical Machines with 60 cpu cores and 512 GB RAM each.
> EMC Isilon Appliance with PB storage. It can be accessed via HDFS or NFS.

We have experimented a little with smaller machines backed by EMC Isilon over NFS. That worked surprisingly well, but ultimately did not scale for us, as we could not justify paying for enterprise SSDs for the Isilon. There is a write-up at
https://sbdevel.wordpress.com/2013/12/06/danish-webscale/

> Can we use solr cloud for this setup?

Yes. That is independent of the backing storage.

> How many instances of SOLR are recommended per physical machines
> and how much ram should be allocated to it.

"That depends".
http://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

The JVM heap should be only as large as Solr actually needs. Solr has some explicitly configured internal caches, but beyond what those require, setting Xmx to a very high number will not help performance. On the contrary, it will lead to long garbage collection pauses and eat into the precious disk cache.

There are some rules of thumb for running Solr, but my own meta rule of thumb is that their applicability goes down as scale goes up. One of the rules of thumb is to have 1 Solr instance per machine. But running JVMs with very large heaps (100GB+) has the potential for extremely long garbage collection pauses and also implies a larger memory overhead due to internal pointer size.

> Should zookeeper be installed along with solr on each box or should be
> installed in separate 2 Virtual machines by itself?

I have no opinion on that.

> Can we run Kafka and Cassandra along with solr on each physical machine?

Sure, but they will of course compete with Solr for resources.

> Anybody running Solr with HDFS in production?

It is a recurring theme on this mailing list at least. It can be searched at
https://www.mail-archive.com/solr-user@lucene.apache.org/

- Toke Eskildsen
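
To make the heap versus disk cache trade-off concrete, here is a rough back-of-the-envelope sketch in Python. The function name memory_budget, the 8 GB operating system reserve and the example heap sizes are illustrative assumptions, not recommendations. The 32 GB figure is the approximate heap size below which the HotSpot JVM can use compressed object pointers, which is one reason very large heaps carry extra memory overhead.

```python
def memory_budget(total_ram_gb, instances, heap_gb_per_instance, os_reserve_gb=8):
    """Split one machine's RAM between Solr JVM heaps and the OS disk cache.

    All names and the default os_reserve_gb are hypothetical; real sizing
    has to come from measurements on your own index and query load.
    """
    if instances < 1:
        raise ValueError("need at least one Solr instance")
    heap_total = instances * heap_gb_per_instance
    disk_cache = total_ram_gb - heap_total - os_reserve_gb
    if disk_cache < 0:
        raise ValueError("heaps plus OS reserve exceed physical RAM")
    return {
        "heap_total_gb": heap_total,
        # Whatever the JVMs do not claim stays available as disk cache.
        "disk_cache_gb": disk_cache,
        # HotSpot uses compressed object pointers only below roughly 32 GB.
        "compressed_pointers": heap_gb_per_instance < 32,
    }


# One instance with a very large heap versus several smaller ones
# on a 512 GB machine (heap sizes picked purely for illustration).
one_big = memory_budget(512, instances=1, heap_gb_per_instance=100)
four_small = memory_budget(512, instances=4, heap_gb_per_instance=24)
print(one_big)
print(four_small)
```

Both layouts leave about 400 GB for disk cache in this made-up example, but only the second keeps each heap small enough for compressed pointers and shorter garbage collection cycles. Whether those heap sizes are actually sufficient is exactly the "that depends" part.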