Gaurav Patel <gaura...@gmail.com> wrote:
> 3 Physical Machines with 60 cpu cores and 512 GB RAM each.
> EMC Isilon Appliance with PB storage. It can be accessed via HDFS or NFS.

We have experimented a little bit with smaller machines, backed by EMC Isilon 
over NFS. That worked surprisingly well, but ultimately did not scale for us as 
we could not justify paying for enterprise SSDs for the Isilon. There is a 
write-up at https://sbdevel.wordpress.com/2013/12/06/danish-webscale/

> Can we use solr cloud for this setup?

Yes. That is independent of the backing storage.

> How many instances of SOLR are recommended per physical machines
> and how much ram should be allocated to it.

"That depends".
http://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

The amount of RAM for the JVMs should be whatever is needed, and no more.
There are some explicitly configured internal caches in Solr, but just
setting Xmx to a very high number will not help performance. On the contrary,
it will lead to long garbage collection pauses and eat into the precious disk
cache.
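
As a rough illustration of that trade-off, here is a minimal Python sketch of
how the heap setting eats into the disk cache. The 512 GB figure is from the
original question; the 8 GB OS reserve, the heap sizes and the function name
are assumptions made up for illustration, not recommendations.

```python
def disk_cache_gb(total_ram_gb, jvm_count, heap_gb_per_jvm, os_reserve_gb=8):
    """Return the RAM (in GB) left for the OS disk cache after JVM heaps.

    os_reserve_gb is a hypothetical allowance for the OS and other
    processes; JVM off-heap overhead is ignored to keep the sketch simple.
    """
    used = jvm_count * heap_gb_per_jvm + os_reserve_gb
    if used > total_ram_gb:
        raise ValueError("heaps plus OS reserve exceed physical RAM")
    return total_ram_gb - used


# One JVM with a huge heap versus several modest ones on a 512 GB box.
print(disk_cache_gb(512, 1, 400))  # 104
print(disk_cache_gb(512, 4, 16))   # 440
```

The real numbers have to come from measuring what the heaps actually need
under load; the sketch only shows that every GB given to Xmx is a GB taken
from the disk cache.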

There are some rules of thumb for running Solr, but my own meta rule of thumb
is that their applicability goes down when scale goes up. One of the rules of
thumb is to have 1 Solr instance per machine. But running JVMs with very large
heaps (100GB+) has the potential for extremely long garbage collection pauses.
It also implies a larger memory overhead due to internal pointer size, as the
JVM can no longer use compressed object pointers once the heap goes above
roughly 32GB.
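
To make the large-heap concern concrete, here is a small Python sketch that
splits a given total heap requirement over several JVMs so that no single heap
gets too large. It assumes a per-JVM ceiling of 31 GB, chosen to stay under
the roughly 32 GB point where HotSpot stops using compressed object pointers.
The ceiling, the 100 GB input and the function name are illustrative
assumptions, and the total heap need is something you would have to measure.

```python
import math

# Assumption: keep each heap safely below the ~32 GB compressed-pointer limit.
MAX_HEAP_GB = 31


def instances_needed(total_heap_gb, max_heap_gb=MAX_HEAP_GB):
    """Return (jvm_count, heap_gb_per_jvm) keeping each heap <= max_heap_gb."""
    if total_heap_gb <= 0:
        raise ValueError("total_heap_gb must be positive")
    count = math.ceil(total_heap_gb / max_heap_gb)
    return count, total_heap_gb / count


print(instances_needed(100))  # (4, 25.0)
```

This ignores the fixed overhead each extra JVM brings, so it is only a
starting point for experiments, not a sizing formula.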

> Should zookeeper be installed along with solr on each box or should be
> installed in separate 2 Virtual machines by itself?

I have no opinion on that.

> Can we run kakfa and cassandra along with solr on each physical machine?

Sure, but they will of course compete with Solr for resources.

> Anybody running Solr with HDFS in production?

It is a recurring theme on this mailing list at least. The archive can be
searched at https://www.mail-archive.com/solr-user@lucene.apache.org/

- Toke Eskildsen
