Imad Qureshi <imadgr...@yahoo.com.INVALID> wrote:
> I understand that but unfortunately that's not an option right now.
> We already have 16 TB of index in HDFS.
>
> So let me rephrase this question. How important is data locality for
> SOLR. Is performance impacted if SOLR data is on a remote node?

The short answer is yes; the long answer is
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Anecdotally, we did some experiments prior to building our multi-TB search setup, where we compared local SSDs with remote (Isilon) SSDs. The test used simple searches and some faceting. I was a bit surprised that the slowdown was only 3x. I would expect the speed difference to be even smaller if the underlying storage is slow (spinning disks). See the old blog post at
https://sbdevel.wordpress.com/2013/12/06/danish-webscale/

I don't understand the expected gain of adding replicas if the data are remote. Why can't the replica Solrs run on the nodes with the data? Do you have very CPU-intensive searches?

- Toke Eskildsen