fc='field collapsing'?

Dennis Gearon
Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better
idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
otherwise we all die.

----- Original Message ----
From: Peter Karich <peat...@yahoo.de>
To: solr-user@lucene.apache.org
Sent: Mon, November 15, 2010 1:37:00 PM
Subject: Re: Tuning Solr caches with high commit rates (NRT)

Hi Jonathan,

I am using fc too, because it was simply faster. Not sure if this can be
applied in general. I will add this info to the wiki.

Regards,
Peter.

> Awesome. I'm not sure his point 1 about facet.method=enum is still valid in
> Solr 1.4+. The "fc" facet.method was changed significantly in 1.4, and
> generally no longer takes a lot of memory -- for facets with "many" unique
> values, method fc in fact should take less than enum, I think?
>
> Peter Karich wrote:
>> Just in case someone is interested:
>>
>> I put the emails of Peter Sturge with some minor edits in the wiki:
>>
>> http://wiki.apache.org/solr/NearRealtimeSearchTuning
>>
>> I found myself searching the thread again and again ;-)
>>
>> Feel free to add and edit content!
>>
>> Regards,
>> Peter.
>>
>>> Hi Erick,
>>>
>>> I thought this would be good for the wiki, but I've not submitted to
>>> the wiki before, so I thought I'd put this info out there first, then
>>> add it if it was deemed useful.
>>> If you could let me know the procedure for submitting, it probably
>>> would be worth getting it into the wiki (couldn't do it straightaway,
>>> as I have a lot of projects on at the moment). If you're able/willing
>>> to put it on there for me, that would be very kind of you!
>>>
>>> Thanks!
>>> Peter
>>>
>>>
>>> On Sun, Sep 12, 2010 at 5:43 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>> Peter:
>>>>
>>>> This kind of information is extremely useful to document, thanks!
>>>> Do you have the time/energy to put it up on the Wiki? Anyone can edit it by
>>>> creating a logon. If you don't, would it be OK if someone else did it (with
>>>> attribution, of course)? I guess that by bringing it up I'm volunteering :)...
>>>>
>>>> Best
>>>> Erick
>>>>
>>>> On Sun, Sep 12, 2010 at 12:26 PM, Peter Sturge <peter.stu...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Below are some notes regarding Solr cache tuning that should prove
>>>>> useful for anyone who uses Solr with frequent commits (e.g. <5min).
>>>>>
>>>>> Environment:
>>>>> Solr 1.4.1 or branch_3x trunk.
>>>>> Note the 4.x trunk has lots of neat new features, so the notes here
>>>>> are likely less relevant to the 4.x environment.
>>>>>
>>>>> Overview:
>>>>> Our Solr environment makes extensive use of faceting, we perform
>>>>> commits every 30secs, and the indexes tend to be on the large-ish side
>>>>> (>20million docs).
>>>>> Note: For our data, when we commit, we are always adding new data,
>>>>> never changing existing data.
>>>>> This type of environment can be tricky to tune, as Solr is more geared
>>>>> toward fast reads than frequent writes.
>>>>>
>>>>> Symptoms:
>>>>> If anyone has used faceting in searches where you are also performing
>>>>> frequent commits, you've likely encountered the dreaded OutOfMemory or
>>>>> GC Overhead Exceeded errors.
>>>>> In high commit rate environments, this is almost always due to
>>>>> multiple 'onDeck' searchers and autowarming - i.e. new searchers don't
>>>>> finish autowarming their caches before the next commit()
>>>>> comes along and invalidates them.
>>>>> Once this starts happening on a regular basis, it is likely your
>>>>> Solr's JVM will run out of memory eventually, as the number of
>>>>> searchers (and their cache arrays) will keep growing until the JVM
>>>>> dies of thirst.
>>>>> To check if your Solr environment is suffering from this, turn on INFO
>>>>> level logging, and look for: 'PERFORMANCE WARNING: Overlapping
>>>>> onDeckSearchers=x'.
>>>>>
>>>>> In tests, we've only ever seen this problem when using faceting, and
>>>>> facet.method=fc.
>>>>>
>>>>> Some solutions to this are:
>>>>>    Reduce the commit rate to allow searchers to fully warm before the
>>>>>    next commit
>>>>>    Reduce or eliminate the autowarming in caches
>>>>>    Both of the above
>>>>>
>>>>> The trouble is, if you're doing NRT commits, you likely have a good
>>>>> reason for it, and reducing/eliminating autowarming will very
>>>>> significantly impact search performance in high commit rate
>>>>> environments.
>>>>>
>>>>> Solution:
>>>>> Here are some setup steps we've used that allow lots of faceting (we
>>>>> typically search with at least 20-35 different facet fields, and date
>>>>> faceting/sorting) on large indexes, and still keep decent search
>>>>> performance:
>>>>>
>>>>> 1. Firstly, you should consider using the enum method for facet
>>>>> searches (facet.method=enum) unless you've got A LOT of memory on your
>>>>> machine. In our tests, this method uses a lot less memory and
>>>>> autowarms more quickly than fc. (Note, I've not tried the new
>>>>> segment-based 'fcs' option, as I can't find support for it in
>>>>> branch_3x - looks nice for 4.x though)
>>>>> Admittedly, for our data, enum is not quite as fast for searching as
>>>>> fc, but short of purchasing a Taiwanese RAM factory, it's a worthwhile
>>>>> tradeoff.
>>>>> If you do have access to LOTS of memory, AND you can guarantee that
>>>>> the index won't grow beyond the memory capacity (i.e. you have some
>>>>> sort of deletion policy in place), fc can be a lot faster than enum
>>>>> when searching with lots of facets across many terms.
>>>>>
>>>>> 2. Secondly, we've found that LRUCache is faster at autowarming than
>>>>> FastLRUCache - in our tests, about 20% faster.
>>>>> Maybe this is just our environment - your mileage may vary.
>>>>>
>>>>> So, our filterCache section in solrconfig.xml looks like this:
>>>>> <filterCache
>>>>>   class="solr.LRUCache"
>>>>>   size="3600"
>>>>>   initialSize="1400"
>>>>>   autowarmCount="3600"/>
>>>>>
>>>>> For a 28GB index, running in a quad-core x64 VMWare instance, 30
>>>>> warmed facet fields, Solr is running at ~4GB. Stats filterCache size
>>>>> shows usually in the region of ~2400.
>>>>>
>>>>> 3. It's also a good idea to have some sort of
>>>>> firstSearcher/newSearcher event listener queries to allow new data to
>>>>> populate the caches.
>>>>> Of course, what you put in these is dependent on the facets you need/use.
>>>>> We've found a good combination is a firstSearcher with as many facets
>>>>> in the search as your environment can handle, then a subset of the
>>>>> most common facets for the newSearcher.
>>>>>
>>>>> 4. We also set:
>>>>> <useColdSearcher>true</useColdSearcher>
>>>>> just in case.
>>>>>
>>>>> 5. Another key area for search performance with high commits is to use
>>>>> 2 Solr instances - one for the high commit rate indexing, and one for
>>>>> searching.
>>>>> The read-only searching instance can be a remote replica, or a local
>>>>> read-only instance that reads the same core as the indexing instance
>>>>> (for the latter, you'll need something that periodically refreshes -
>>>>> i.e. runs commit()).
>>>>> This way, you can tune the indexing instance for writing performance
>>>>> and the searching instance as above for max read performance.
>>>>>
>>>>> Using the setup above, we get fantastic searching speed for small
>>>>> facet sets (well under 1sec), and really good searching for large
>>>>> facet sets (a couple of secs depending on index size, number of
>>>>> facets, unique terms etc. etc.),
>>>>> even when searching against largeish indexes (>20million docs).
>>>>> We have yet to see any OOM or GC errors using the techniques above,
>>>>> even in low memory conditions.
>>>>>
>>>>> I hope there are people that find this useful. I know I've spent a lot
>>>>> of time looking for stuff like this, so hopefully, this will save
>>>>> someone some time.
>>>>>
>>>>>
>>>>> Peter
>>>>>
>>
>>
>

--
http://jetwick.com twitter search prototype
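
For anyone reading this thread later: points 1, 3 and 4 of Peter Sturge's notes (facet.method=enum, firstSearcher/newSearcher warming queries, useColdSearcher) might look roughly like the solrconfig.xml fragment below. This is only a sketch and not the configuration used in the thread. The field names (category, author, source) are hypothetical placeholders, and which facets belong in each listener depends on your own queries.

```xml
<!-- Sketch only: these elements belong inside the <query> section of
     solrconfig.xml. Field names (category, author, source) are hypothetical. -->

<!-- firstSearcher fires at startup, when there is no previous searcher to
     autowarm from, so it can carry the widest facet set you can afford. -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.method">enum</str>
      <str name="facet.field">category</str>
      <str name="facet.field">author</str>
      <str name="facet.field">source</str>
    </lst>
  </arr>
</listener>

<!-- newSearcher fires after every commit, so it is limited here to the
     most commonly used facet to keep warming short. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">0</str>
      <str name="facet">true</str>
      <str name="facet.method">enum</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>

<!-- Serve requests from a searcher that has not finished warming,
     rather than blocking until warming completes. -->
<useColdSearcher>true</useColdSearcher>
```

With 30-second commits, the newSearcher queries need to finish well inside the commit interval. Otherwise they contribute to the overlapping onDeckSearchers warning described in the notes above.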