On 1/30/2014 3:20 PM, Joseph Hagerty wrote:
> I'm using Solr 3.5 over Tomcat 6. My index has reached 30G.
> <snip>
> - The box is an m1.large on AWS EC2. 2 virtual CPUs, 4 ECU, 7.5 GiB RAM

One detail that you did not provide was how much of your 7.5GB of RAM
you are allocating to the Java heap for Solr, but I don't think I need
that information: for a 30GB index, you simply don't have enough
memory. If you're sticking with Amazon, you'll want one of the
instances with at least 30GB of RAM, and you might want to consider more
memory than that.
An ideal RAM size for Solr is equal to the size of on-disk data plus the
heap space used by Solr and other programs. This means that if your
Java heap for Solr is 4GB and there are no other significant programs
running on the same server, you'd want a minimum of 34GB of RAM for an
ideal setup with your index. 4GB of that would be for Solr itself; the
remaining 30GB would be for the operating system to fully cache your
index in the OS disk cache.
Depending on your query patterns and how your schema is arranged, you
*might* be able to get away with as little as half of your index size
just for the OS disk cache, but it's better to make it big enough for
the whole index, plus room for growth.
http://wiki.apache.org/solr/SolrPerformanceProblems
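
The sizing arithmetic above can be written out as a rough sketch. The
function name and parameters here are made up for illustration, and the
4GB heap is only the example figure used above, not a measured value
from your setup:

```python
def recommended_ram_gb(index_gb, heap_gb, other_gb=0.0, cache_fraction=1.0):
    """Rough RAM estimate: Java heap + other programs + the share of the
    on-disk index that should fit in the OS disk cache.

    cache_fraction=1.0 is the ideal case (whole index cached);
    0.5 is the optimistic "might get away with it" case.
    """
    if not 0.0 < cache_fraction <= 1.0:
        raise ValueError("cache_fraction must be in (0, 1]")
    return heap_gb + other_gb + index_gb * cache_fraction


# Assumed inputs: 30GB index, 4GB Java heap, nothing else significant running.
ideal = recommended_ram_gb(index_gb=30, heap_gb=4)
lower = recommended_ram_gb(index_gb=30, heap_gb=4, cache_fraction=0.5)
print(f"ideal: {ideal:.0f}GB, optimistic lower bound: {lower:.0f}GB")
```

Neither figure includes room for index growth, so treat both as
starting points rather than targets.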
Many people are *shocked* when they are told this information, but if
you think about the relative speeds of getting a chunk of data from a
hard disk vs. getting the same information from memory, it's not all
that shocking.
Thanks,
Shawn