Again, thank you for this incredible information; I feel I'm on much firmer footing now. I'm going to test distributing this across 10 servers, borrowing a Hadoop cluster temporarily, and see how it does with enough memory to have the whole index cached. But I think we'll try the SSD route, as our index will probably grow into the half-terabyte range; there's still a lot of active development.
I guess the RAM disk would work in our case too, since we only index in batches. Eventually I'd like to do the indexing outside of Solr and just update the index (I'm presuming this is doable in SolrCloud, but I haven't put it to the test yet). If I could use Hadoop to index the shards, that would be ideal, though I haven't quite figured out how to go about it yet.

David

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Friday, April 19, 2013 9:42 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud loadbalancing, replication, and failover

On 4/19/2013 3:48 AM, David Parks wrote:
> The Physical Memory is 90% utilized (21.18GB of 23.54GB). Solr has
> dark grey allocation of 602MB, and light grey of an additional 108MB,
> for a JVM total of 710MB allocated. If I understand correctly, Solr
> memory utilization is *not* for caching (unless I configured document
> caches or some of the other cache options in Solr, which don't seem to
> apply in this case, and I haven't altered from their defaults).

Right. Solr does have caches, but they serve specific purposes. The OS is much better at general large-scale caching than Solr is. Solr caches get cleared (and possibly re-warmed) whenever you issue a commit on your index that makes new documents visible.

> So assuming this box was dedicated to 1 solr instance/shard. What JVM
> heap should I set? Does that matter? 24GB JVM heap? Or keep it lower
> and ensure the OS cache has plenty of room to operate? (this is an
> Ubuntu 12.10 server instance).

The right JVM heap size is highly dependent on the nature of your queries, the number of documents, the number of unique terms, etc. The best thing to do is try it out with a relatively large heap and see how much memory actually gets used inside the JVM. The jvisualvm and jconsole tools will give you nice graphs of JVM memory usage. The jstat program will give you raw numbers on the command line that you'll need to add together to get the full picture.
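Because `jstat -gc` reports each heap generation in its own column, the "adding up" mentioned above amounts to summing the survivor, eden, and old-generation columns. What follows is a minimal Python sketch, assuming JDK-style `jstat -gc <pid>` output (a header row of column names such as `S0U`, `EU`, `OU`, with sizes in KB). The `heap_totals` helper is hypothetical, and since column sets differ somewhat between JDK versions, it looks columns up by header name rather than by position.

```python
"""Turn `jstat -gc` samples into heap used/capacity totals (sketch).

Assumes JDK-style output: a header row of column names followed by sample
rows, with all size columns reported in KB. Repeated header rows (as
printed with `jstat -h<n>`) and blank lines are skipped.
"""

# Heap = survivor space 0 + survivor space 1 + eden + old generation.
# Metaspace/PermGen columns (MC/MU or PC/PU) are not part of the heap.
HEAP_USED_COLS = ("S0U", "S1U", "EU", "OU")
HEAP_CAPACITY_COLS = ("S0C", "S1C", "EC", "OC")


def heap_totals(jstat_output):
    """Return a list of (used_mb, capacity_mb) tuples, one per sample row."""
    rows = [line.split() for line in jstat_output.strip().splitlines()]
    header = rows[0]
    totals = []
    for row in rows[1:]:
        if not row or row == header:  # skip blanks and repeated headers
            continue
        sample = dict(zip(header, row))
        # Convert only the columns we need; other columns may contain "-".
        used_kb = sum(float(sample[col]) for col in HEAP_USED_COLS)
        cap_kb = sum(float(sample[col]) for col in HEAP_CAPACITY_COLS)
        totals.append((used_kb / 1024, cap_kb / 1024))
    return totals
```

In practice you would capture something like `jstat -gc <pid> 5s` from the Solr JVM while the index is under load and feed the text to `heap_totals`; the used figure over time is what traces out the sawtooth.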
Due to the garbage collection model that Java uses, what you'll see is a sawtooth pattern: memory usage climbs to the maximum heap, then garbage collection reduces it to the memory actually in use. Generally speaking, you want more heap available than the "low" point of that sawtooth. If that low point is around 3GB when you are hitting your index hard with queries and updates, then you would want to give Solr a heap of 4 to 6 GB.

> Would I be wise to just put the index on a RAM disk and guarantee
> performance? Assuming I installed sufficient RAM?

A RAM disk is a very good way to guarantee performance, but RAM disks are ephemeral. If you reboot or the OS crashes, the index is gone and you'll have to reindex. Also remember that you actually need at *least* twice the size of your index so that Solr (Lucene) has enough room to do merges, and the worst-case scenario is *three* times the index size. Merging happens during normal indexing, not just when you optimize.

If you have enough RAM for three times your index size and it takes less than an hour or two to rebuild the index, then a RAM disk might be a viable way to go. I suspect that this won't work for you.

Thanks,
Shawn
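The two sizing rules of thumb in Shawn's reply (heap above the sawtooth's low point, and two to three times the index size for merges) reduce to simple arithmetic. This is a rough Python sketch with hypothetical helper names. The heap multipliers are an assumption: they just scale the "3GB low point, 4 to 6 GB heap" example linearly, which Shawn did not state as a formula. Picking the highest post-GC trough as "the low point" is also an assumption, chosen to be conservative. For David's expected half-terabyte index, the merge rule works out to roughly 1 TB of space at minimum and 1.5 TB in the worst case.

```python
"""Back-of-the-envelope sizing from the rules of thumb in this thread (sketch).

Hypothetical helper names. The heap multipliers (x4/3 to x2) are an
assumption: they simply scale the "3GB low point -> 4 to 6 GB heap" example
linearly. The 2x/3x figures are the merge-space rule from the reply.
"""


def gc_low_point(used_gb_samples):
    """Estimate the sawtooth's low point as the highest post-GC trough.

    A trough is a sample lower than its predecessor and no higher than its
    successor. Taking the highest trough is a conservative choice (assumption).
    Falls back to the overall minimum when no trough is found.
    """
    s = used_gb_samples
    troughs = [b for a, b, c in zip(s, s[1:], s[2:]) if b < a and b <= c]
    return max(troughs) if troughs else min(s)


def suggested_heap_gb(low_point_gb):
    """(lower, upper) heap suggestion, scaled from the 3GB -> 4-6GB example."""
    return (low_point_gb * 4 / 3, low_point_gb * 2)


def index_space_gb(index_gb):
    """(minimum, worst case) space for the index plus merge headroom."""
    return (2 * index_gb, 3 * index_gb)


if __name__ == "__main__":
    # Illustrative heap-usage samples in GB, not real measurements.
    samples = [2.0, 5.0, 8.0, 3.0, 6.0, 9.0, 2.8, 7.0]
    low = gc_low_point(samples)
    print(low, suggested_heap_gb(low))  # 3.0 (4.0, 6.0)
    print(index_space_gb(500))          # (1000, 1500)
```

The same space figure applies whether the index lives on a RAM disk, SSD, or spinning disk, since merges need the headroom regardless of the storage medium.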