Re: SolrCloud loadbalancing, replication, and failover

Shawn Heisey Thu, 18 Apr 2013 21:51:56 -0700

On 4/18/2013 8:12 PM, David Parks wrote:
> I think I still don't understand something here. 
> 
> My concern right now is that query times are very slow for 120GB index (14s
> on avg), I've seen a lot of disk activity when running queries.
> 
> I'm hoping that distributing that query across 2 servers is going to improve
> the query time, specifically I'm hoping that we can distribute that disk
> activity because we don't have great disks on there (yet).
> 
> So, with disk IO being a factor in mind, running the query on one box, vs.
> across 2 *should* be a concern right?
> 
> Admittedly, this is the first step in what will probably be many to try to
> work our query times down from 14s to what I want to be around 1s.


I went through my mailing list archive to see what all you've said about
your setup.  One thing that I can't seem to find is a mention of how
much total RAM is in each of your servers.  I apologize if it was
actually there and I overlooked it.

In one email thread, you wanted to know whether Solr is CPU-bound or
IO-bound.  Solr is heavily reliant on the index on disk, and disk I/O is
the slowest piece of the puzzle. The way to get good performance out of
Solr is to have enough memory that you can take the disk mostly out of
the equation by having the operating system cache the index in RAM.  If
you don't have enough RAM for that, then Solr becomes IO-bound, and your
CPUs will be busy in iowait, unable to do much real work.  If you DO
have enough RAM to cache all (or most) of your index, then Solr will be
CPU-bound.
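
One rough way to see which of those two cases a Linux box is in is to
measure how much CPU time goes to iowait.  The sketch below is only an
illustration, not anything that ships with Solr: it takes the aggregate
"cpu" line from /proc/stat, captured at two points in time, and returns
the share of the interval spent in iowait.  The function name and the
sample counter values are made up for the example.

```python
def iowait_fraction(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two /proc/stat 'cpu' lines."""
    def counters(line: str) -> list[int]:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
        # Field order: user nice system idle iowait irq softirq steal ...
        # Only the first eight are used; guest time is already counted in user.
        return [int(x) for x in parts[1:9]]

    deltas = [b - a for a, b in zip(counters(before), counters(after))]
    total = sum(deltas)
    # iowait is the fifth counter (index 4)
    return deltas[4] / total if total else 0.0


# Hypothetical samples taken a few seconds apart while queries are running
sample_before = "cpu 1000 0 500 8000 500 0 0 0 0 0"
sample_after = "cpu 1100 0 550 8200 1150 0 0 0 0 0"
print(iowait_fraction(sample_before, sample_after))
```

A consistently high value while queries are running suggests the index
is not fitting in the OS disk cache.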

With 120GB of total index data on each server, you would want at least
128GB of RAM per server, assuming you are only giving 8-16GB of RAM to
Solr, and that Solr is the only thing running on the machine.  If you
have more servers and shards, you can reduce the per-server memory
requirement because the amount of index data on each server would go
down.  I am aware of the cost associated with this kind of requirement -
each of my Solr servers has 64GB.

If you are sharing the server with another program, then you want to
have enough RAM available for Solr's heap, Solr's data, the other
program's heap, and the other program's data.  Some programs (like
MySQL) completely skip the OS disk cache and instead do that caching
themselves with heap memory that's actually allocated to the program.
If you're using a program like that, then you wouldn't need to count its
data.
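
The sizing arithmetic from the last two paragraphs can be sketched as
follows.  This is a back-of-the-envelope estimate under stated
assumptions, not a rule: it ignores what the OS itself needs, it assumes
the index splits evenly across shards with one shard per server, and the
figures for the "other program" are invented for illustration.  All of
the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OtherProgram:
    heap_gb: float
    data_gb: float
    # True for programs that cache their data inside their own heap
    # instead of relying on the OS disk cache
    bypasses_os_cache: bool = False


def index_per_server_gb(total_index_gb: float, num_shards: int) -> float:
    """Index data per server, assuming even shards and one shard per server."""
    return total_index_gb / num_shards


def ram_needed_gb(index_gb: float, solr_heap_gb: float, others=()) -> float:
    """Solr heap + index to cache + whatever the other programs need."""
    total = index_gb + solr_heap_gb
    for prog in others:
        total += prog.heap_gb
        if not prog.bypasses_os_cache:
            # Its data competes with the Solr index for the OS disk cache
            total += prog.data_gb
    return total


# 120GB of index on the server with an 8GB Solr heap
print(ram_needed_gb(120, 8))  # 128

# The same index split across two servers
print(ram_needed_gb(index_per_server_gb(120, 2), 8))  # 68.0

# Sharing the box with a program that does its own caching (made-up sizes)
db = OtherProgram(heap_gb=16, data_gb=40, bypasses_os_cache=True)
print(ram_needed_gb(120, 8, [db]))  # 144
```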

Using SSDs for storage can speed things up dramatically and may reduce
the total memory requirement to some degree, but even an SSD is slower
than RAM.  RAM has a higher transfer speed, and from what I understand,
its latency is at least an order of magnitude lower - nanoseconds vs
microseconds.

In another thread, you asked about how Google gets such good response
times.  Although Google's software probably works differently than
Solr/Lucene, when it comes right down to it, all search engines do
similar jobs and have similar requirements.  I would imagine that Google
gets incredible response time because they have incredible amounts of
RAM at their disposal that keep the important bits of their index
instantly available.  They have thousands of servers in each data
center.  I once got a look at the extent of Google's hardware in one
data center - it was HUGE.  I couldn't get in to examine things closely;
they keep that stuff very locked down.

Thanks,
Shawn
