Hi Erick,

I thought this would be good for the wiki, but I've not submitted to
the wiki before, so I thought I'd put this info out there first, then
add it if it was deemed useful.
If you could let me know the procedure for submitting, I think it
would be worth getting it into the wiki (though I couldn't do it straight
away, as I have a lot of projects on at the moment). If you're able and
willing to put it on there for me, that would be very kind of you!

Thanks!
Peter


On Sun, Sep 12, 2010 at 5:43 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> Peter:
>
> This kind of information is extremely useful to document, thanks! Do you
> have the time/energy to put it up on the Wiki? Anyone can edit it by
> creating an account. If you don't, would it be OK if someone else did it
> (with attribution, of course)? I guess that by bringing it up I'm
> volunteering :)...
>
> Best
> Erick
>
> On Sun, Sep 12, 2010 at 12:26 PM, Peter Sturge <peter.stu...@gmail.com> wrote:
>
>> Hi,
>>
>> Below are some notes on Solr cache tuning that should prove useful
>> for anyone who uses Solr with frequent commits (e.g. commit intervals
>> under 5 minutes).
>>
>> Environment:
>> Solr 1.4.1 or branch_3x.
>> Note that the 4.x trunk has lots of neat new features, so these notes
>> are likely less relevant to a 4.x environment.
>>
>> Overview:
>> Our Solr environment makes extensive use of faceting, we perform
>> commits every 30 seconds, and the indexes tend to be on the large-ish
>> side (>20 million docs).
>> Note: For our data, when we commit, we are always adding new data,
>> never changing existing data.
>> This type of environment can be tricky to tune, as Solr is more geared
>> toward fast reads than frequent writes.
>>
>> Symptoms:
>> If you have used faceting in searches while also performing frequent
>> commits, you've likely encountered the dreaded OutOfMemoryError or
>> "GC overhead limit exceeded" errors.
>> In high commit rate environments, this is almost always due to
>> multiple 'onDeck' searchers and autowarming, i.e. new searchers don't
>> finish autowarming their caches before the next commit() comes along
>> and invalidates them.
>> Once this starts happening on a regular basis, your Solr JVM is likely
>> to run out of memory eventually, as the number of searchers (and their
>> cache arrays) keeps growing until the JVM dies of thirst.
>> To check whether your Solr environment is suffering from this, turn on
>> INFO level logging and look for: 'PERFORMANCE WARNING: Overlapping
>> onDeckSearchers=x'.
>>
>> In our tests, we've only ever seen this problem when using faceting
>> with facet.method=fc.
>>
>> Some solutions to this are:
>>    Reduce the commit rate to allow searchers to fully warm before the
>> next commit
>>    Reduce or eliminate the autowarming in caches
>>    Both of the above
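>>
>> As a rough illustration of the first two options, here are two
>> solrconfig.xml fragments. This is a sketch only: the autoCommit
>> interval is a hypothetical value and only helps if commits are driven
>> by Solr's autoCommit rather than issued by your client, and the cache
>> sizes simply reuse the figures from our own filterCache config further
>> down.
>>
>> ```xml
>> <!-- Reduce the commit rate: pending documents are committed
>>      automatically within 5 minutes of being added.
>>      Hypothetical value; maxTime is in milliseconds. -->
>> <updateHandler class="solr.DirectUpdateHandler2">
>>   <autoCommit>
>>     <maxTime>300000</maxTime>
>>   </autoCommit>
>> </updateHandler>
>>
>> <!-- Eliminate autowarming: no entries are regenerated from the old cache. -->
>> <filterCache
>>   class="solr.LRUCache"
>>   size="3600"
>>   initialSize="1400"
>>   autowarmCount="0"/>
>> ```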
>>
>> The trouble is, if you're doing NRT (near-real-time) commits, you
>> likely have a good reason for it, and reducing or eliminating
>> autowarming will significantly hurt search performance in high commit
>> rate environments.
>>
>> Solution:
>> Here are some setup steps we've used that allow heavy faceting (we
>> typically search with 20-35 different facet fields, plus date
>> faceting/sorting) on large indexes while still keeping decent search
>> performance:
>>
>> 1. Firstly, you should consider using the enum method for facet
>> searches (facet.method=enum) unless you've got A LOT of memory on your
>> machine. In our tests, this method uses a lot less memory and
>> autowarms more quickly than fc. (Note: I've not tried the new
>> segment-based 'fcs' option, as I can't find support for it in
>> branch_3x; it looks nice for 4.x though.)
>> Admittedly, for our data, enum is not quite as fast for searching as
>> fc, but short of purchasing a Taiwanese RAM factory, it's a worthwhile
>> tradeoff.
>> If you do have access to LOTS of memory, AND you can guarantee that
>> the index won't grow beyond the memory capacity (i.e. you have some
>> sort of deletion policy in place), fc can be a lot faster than enum
>> when searching with lots of facets across many terms.
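>>
>> A minimal sketch of making enum the default in solrconfig.xml. The
>> handler name below is an assumption, so adjust it to match your own
>> config; the same setting can also be passed per request as
>> facet.method=enum.
>>
>> ```xml
>> <!-- Hypothetical handler: facet requests use the enum method unless
>>      a request explicitly overrides facet.method. -->
>> <requestHandler name="standard" class="solr.SearchHandler" default="true">
>>   <lst name="defaults">
>>     <str name="facet.method">enum</str>
>>   </lst>
>> </requestHandler>
>> ```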
>>
>> 2. Secondly, we've found that LRUCache is faster at autowarming than
>> FastLRUCache - in our tests, about 20% faster. Maybe this is just our
>> environment - your mileage may vary.
>>
>> So, our filterCache section in solrconfig.xml looks like this:
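>>    <!-- LRUCache rather than FastLRUCache: it autowarmed faster in our
>>         tests. autowarmCount equals size, so all cached entries are
>>         regenerated when a new searcher warms. -->
>>    <filterCache
>>      class="solr.LRUCache"
>>      size="3600"
>>      initialSize="1400"
>>      autowarmCount="3600"/>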
>>
>> For a 28GB index, running in a quad-core x64 VMware instance with 30
>> warmed facet fields, Solr uses ~4GB of memory. The filterCache size
>> shown in the stats is usually around 2400.
>>
>> 3. It's also a good idea to set up firstSearcher/newSearcher event
>> listener queries so that new searchers populate their caches with the
>> new data.
>> Of course, what you put in these depends on the facets you need/use.
>> We've found a good combination is a firstSearcher with as many facets
>> in the search as your environment can handle, then a subset of the
>> most common facets for the newSearcher.
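>>
>> A sketch of what such listeners might look like in solrconfig.xml,
>> using Solr's QuerySenderListener. The facet field names (category,
>> author, source) are hypothetical placeholders for your own fields.
>>
>> ```xml
>> <!-- firstSearcher: warm as many facets as the environment can handle. -->
>> <listener event="firstSearcher" class="solr.QuerySenderListener">
>>   <arr name="queries">
>>     <lst>
>>       <str name="q">*:*</str>
>>       <str name="rows">0</str>
>>       <str name="facet">true</str>
>>       <str name="facet.field">category</str>
>>       <str name="facet.field">author</str>
>>       <str name="facet.field">source</str>
>>     </lst>
>>   </arr>
>> </listener>
>>
>> <!-- newSearcher: warm only a subset of the most common facets. -->
>> <listener event="newSearcher" class="solr.QuerySenderListener">
>>   <arr name="queries">
>>     <lst>
>>       <str name="q">*:*</str>
>>       <str name="rows">0</str>
>>       <str name="facet">true</str>
>>       <str name="facet.field">category</str>
>>     </lst>
>>   </arr>
>> </listener>
>> ```
>>
>> Keeping the newSearcher list short matters most here, since it runs
>> on every commit, while firstSearcher runs only once at startup.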
>>
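>> 4. We also set:
>>   <useColdSearcher>true</useColdSearcher>
>> just in case, so that requests arriving before the first searcher has
>> finished warming use that still-warming searcher instead of blocking.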
>>
>> 5. Another key technique for search performance with a high commit
>> rate is to use two Solr instances: one for high-commit-rate indexing
>> and one for searching.
>> The read-only searching instance can be a remote replica, or a local
>> read-only instance that reads the same core as the indexing instance
>> (for the latter, you'll need something that periodically refreshes
>> it, i.e. runs commit()).
>> This way, you can tune the indexing instance for write performance
>> and the searching instance as above for maximum read performance.
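>>
>> For the remote-replica option, one possibility on 1.4.x is Solr's
>> Java-based ReplicationHandler. The sketch below assumes a hypothetical
>> indexing host called "indexer" on port 8983 and a 60-second poll
>> interval; both are placeholders to adapt.
>>
>> ```xml
>> <!-- On the indexing (master) instance: publish the index after each commit. -->
>> <requestHandler name="/replication" class="solr.ReplicationHandler">
>>   <lst name="master">
>>     <str name="replicateAfter">commit</str>
>>   </lst>
>> </requestHandler>
>>
>> <!-- On the searching (slave) instance: poll the master for new index versions. -->
>> <requestHandler name="/replication" class="solr.ReplicationHandler">
>>   <lst name="slave">
>>     <str name="masterUrl">http://indexer:8983/solr/replication</str>
>>     <str name="pollInterval">00:00:60</str>
>>   </lst>
>> </requestHandler>
>> ```
>>
>> With this arrangement, the poll interval rather than the indexer's
>> commit rate controls how often the searching instance opens and warms
>> a new searcher.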
>>
>> Using the setup above, we get fantastic search speed for small facet
>> sets (well under 1 second), and really good search speed for large
>> facet sets (a couple of seconds, depending on index size, number of
>> facets, unique terms, etc.), even when searching against large-ish
>> indexes (>20 million docs).
>> We have yet to see any OOM or GC errors using the techniques above,
>> even in low memory conditions.
>>
>> I hope people find this useful. I know I've spent a lot of time
>> looking for information like this, so hopefully this will save
>> someone some time.
>>
>>
>> Peter
>>
>
