Hi,

Below are some notes regarding Solr cache tuning that should prove useful for anyone who uses Solr with frequent commits (e.g. <5min).

Environment: Solr 1.4.1 or branch_3x trunk. Note that the 4.x trunk has lots of neat new features, so the notes here are likely less relevant to the 4.x environment.

Overview: Our Solr environment makes extensive use of faceting. We perform commits every 30secs, and the indexes tend to be on the large-ish side (>20million docs). Note: For our data, when we commit, we are always adding new data, never changing existing data. This type of environment can be tricky to tune, as Solr is geared more toward fast reads than frequent writes.

Symptoms: If you've used faceting in searches while also performing frequent commits, you've likely encountered the dreaded OutOfMemory or 'GC overhead limit exceeded' errors. In high commit rate environments, this is almost always due to multiple 'onDeck' searchers and autowarming - i.e. new searchers don't finish autowarming their caches before the next commit() comes along and invalidates them. Once this starts happening on a regular basis, your Solr's JVM will likely run out of memory eventually, as the number of searchers (and their cache arrays) will keep growing until the JVM dies of thirst. To check whether your Solr environment is suffering from this, turn on INFO level logging, and look for: 'PERFORMANCE WARNING: Overlapping onDeckSearchers=x'. In tests, we've only ever seen this problem when using faceting with facet.method=fc.

Some solutions to this are:

- Reduce the commit rate to allow searchers to fully warm before the next commit
- Reduce or eliminate the autowarming in caches
- Both of the above

The trouble is, if you're doing NRT commits, you likely have a good reason for it, and reducing/eliminating autowarming will significantly hurt search performance in high commit rate environments.

Solution: Here are some setup steps we've used that allow lots of faceting (we typically search with at least 20-35 different facet fields, plus date faceting/sorting) on large indexes, and still keep decent search performance:

1.
Firstly, you should consider using the enum method for facet searches (facet.method=enum) unless you've got A LOT of memory on your machine. In our tests, this method uses a lot less memory and autowarms more quickly than fc. (Note: I've not tried the new segment-based 'fcs' option, as I can't find support for it in branch_3x - it looks nice for 4.x though.) Admittedly, for our data, enum is not quite as fast for searching as fc, but short of purchasing a Taiwanese RAM factory, it's a worthwhile tradeoff. If you do have access to LOTS of memory, AND you can guarantee that the index won't grow beyond the memory capacity (i.e. you have some sort of deletion policy in place), fc can be a lot faster than enum when searching with lots of facets across many terms.

2. Secondly, we've found that LRUCache is faster at autowarming than FastLRUCache - in our tests, about 20% faster. Maybe this is just our environment - your mileage may vary. So, our filterCache section in solrconfig.xml looks like this:

    <filterCache class="solr.LRUCache" size="3600" initialSize="1400" autowarmCount="3600"/>

For a 28GB index, running in a quad-core x64 VMware instance with 30 warmed facet fields, Solr runs at ~4GB. The filterCache size shown in the stats is usually in the region of ~2400.

3. It's also a good idea to have some firstSearcher/newSearcher event listener queries to allow new data to populate the caches. Of course, what you put in these depends on the facets you need/use. We've found a good combination is a firstSearcher with as many facets in the search as your environment can handle, then a subset of the most common facets for the newSearcher.

4. We also set:

    <useColdSearcher>true</useColdSearcher>

just in case.

5. Another key area for search performance with high commits is to use 2 Solr instances - one for the high commit rate indexing, and one for searching.
The read-only searching instance can be a remote replica, or a local read-only instance that reads the same core as the indexing instance (for the latter, you'll need something that periodically refreshes - i.e. runs commit()). This way, you can tune the indexing instance for write performance and the searching instance as above for max read performance.

Using the setup above, we get fantastic searching speed for small facet sets (well under 1sec), and really good searching for large facet sets (a couple of secs, depending on index size, number of facets, unique terms etc.), even when searching against large-ish indexes (>20million docs). We have yet to see any OOM or GC errors using the techniques above, even in low memory conditions.

I hope some people find this useful. I know I've spent a lot of time looking for stuff like this, so hopefully this will save someone some time.

Peter
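
For reference, here is a rough sketch of what the firstSearcher/newSearcher listeners from step 3 might look like in solrconfig.xml. This is an illustration only, not the exact configuration described above: the field names (category, author, type) are hypothetical placeholders, and the right set of facets depends entirely on your own schema and query patterns. The firstSearcher query warms a wider set of facets, while the newSearcher query, which runs after every commit, warms only a smaller subset.

```xml
<!-- Sketch only: these listeners belong inside the <query> section of solrconfig.xml.
     Field names (category, author, type) are hypothetical; substitute your own. -->

<!-- firstSearcher fires when there is no previous searcher (e.g. at startup),
     so it can afford to warm as many facets as the environment can handle. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.method">enum</str>
      <str name="facet.field">category</str>
      <str name="facet.field">author</str>
      <str name="facet.field">type</str>
    </lst>
  </arr>
</listener>

<!-- newSearcher fires after each commit, so keep it to the most common facets
     to give warming a chance to finish before the next commit arrives. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.method">enum</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>
```

If 'Overlapping onDeckSearchers' warnings still show up in the log with something like this in place, trimming the newSearcher query further is probably the first thing to try.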