Imad Qureshi <imadgr...@yahoo.com.INVALID> wrote:
> I understand that but unfortunately that's not an option right now.
> We already have 16 TB of index in HDFS.
> 
> So let me rephrase this question. How important is data locality for
> SOLR. Is performance impacted if SOLR data is on a remote node?

The short answer is yes; the long answer is 
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Anecdotally, we ran some experiments before building our multi-TB search 
setup, comparing local SSDs with remote (Isilon) SSDs. The test used simple 
searches and some faceting. I was a bit surprised that the remote setup was 
only 3x slower. I would expect the relative difference to be even smaller if 
the underlying storage is slow (spinning disks). There is an old blog post at 
https://sbdevel.wordpress.com/2013/12/06/danish-webscale/
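
If you want a rough number for your own storage before committing to a layout, 
something like the following sketch can give a first impression. It is not the 
test we ran (ours used real Solr searches with faceting); it only times random 
block reads from one file on each kind of storage. The function names, the 
4096-byte block size and the read count are assumptions chosen for 
illustration, and the OS page cache will flatter the result unless the files 
are much larger than RAM or the cache is cold.

```python
import os
import random
import time


def random_read_latency(path, reads=1000, block_size=4096, seed=0):
    """Return the mean seconds per random block read from the file at `path`."""
    size = os.path.getsize(path)
    if size < block_size:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)
    # buffering=0 disables Python-level buffering; the OS page cache still applies.
    with open(path, "rb", buffering=0) as f:
        start = time.perf_counter()
        for _ in range(reads):
            # Jump to a random offset that leaves room for a full block.
            f.seek(rng.randrange(0, size - block_size + 1))
            f.read(block_size)
        elapsed = time.perf_counter() - start
    return elapsed / reads


def slowdown(local_path, remote_path, **kwargs):
    """Ratio of remote to local mean read latency (3.0 means remote is 3x slower)."""
    remote = random_read_latency(remote_path, **kwargs)
    local = random_read_latency(local_path, **kwargs)
    return remote / local
```

Point local_path at a large file on local disk and remote_path at a copy of 
the same file on the remote mount, then call slowdown(local_path, remote_path). 
Treat the ratio as a hint only; real search latency also depends on caching, 
query mix and CPU.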


I don't understand the expected gain from adding replicas if the data are 
remote. Why can't the replica Solrs run on the nodes that hold the data? Do you 
have very CPU-intensive searches?

- Toke Eskildsen
