Hi,

why do you need to change the lockType? Does a readonly instance need
locks at all?


thanks,
Anders.



On Tue, 14 Sep 2010 15:00:54 +0200, Peter Karich <peat...@yahoo.de> wrote:
> Peter Sturge,
>
> this was a nice hint, thanks again! If you are here in Germany anytime I
> can invite you to a beer or an apfelschorle ! :-)
> I only needed to change the lockType to none in the solrconfig.xml,
> disable the replication and set the data dir to the master data dir!
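> For reference, the relevant bits of the read-only instance's
> solrconfig.xml would look roughly like this (Solr 1.4-style; the dataDir
> path is just a placeholder):

```xml
<!-- Sketch: solrconfig.xml of the read-only instance.
     The dataDir path below is a placeholder for the master's index dir. -->
<config>
  <!-- point at the same index directory the write instance uses -->
  <dataDir>/path/to/master/solr/data</dataDir>
  <mainIndex>
    <!-- no lock file: the write instance alone owns the index lock -->
    <lockType>none</lockType>
  </mainIndex>
  <!-- the replication handler is simply left out / disabled here -->
</config>
```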
>
> Regards,
> Peter Karich.
>
>> Hi Peter,
>>
>> this scenario would be really great for us - I didn't know that this is
>> possible and works, so: thanks!
>> At the moment we are doing something similar by replicating to the
>> readonly instance, but the replication is somewhat lengthy and
>> resource-intensive at this data volume ;-)
>>
>> Regards,
>> Peter.
>>
>>
>>> 1. You can run multiple Solr instances in separate JVMs, with both
>>> having their solr.xml configured to use the same index folder.
>>> You need to be careful that one and only one of these instances will
>>> ever update the index at a time. The best way to ensure this is to use
>>> one for writing only,
>>> and the other is read-only and never writes to the index. This
>>> read-only instance is the one to use for tuning for high search
>>> performance. Even though the RO instance doesn't write to the index,
>>> it still needs periodic (albeit empty) commits to kick off
>>> autowarming/cache refresh.
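>>> For the periodic empty commit, a cron entry along these lines works
>>> (host/port/interval are assumptions; adjust to your setup):

```
# crontab entry (sketch): empty commit every 5 minutes so the read-only
# instance reopens its searcher and kicks off autowarming
*/5 * * * * curl -s 'http://localhost:8983/solr/update?commit=true' > /dev/null
```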
>>>
>>> Depending on your needs, you might not need to have 2 separate
>>> instances. We need it because the 'write' instance is also doing a lot
>>> of metadata pre-write operations in the same jvm as Solr, and so has
>>> its own memory requirements.
>>>
>>> 2. We use sharding all the time, and it works just fine with this
>>> scenario, as the RO instance is simply another shard in the pack.
>>>
>>>
>>> On Sun, Sep 12, 2010 at 8:46 PM, Peter Karich <peat...@yahoo.de> wrote:
>>>
>>>
>>>> Peter,
>>>>
>>>> thanks a lot for your in-depth explanations!
>>>> Your findings will be definitely helpful for my next performance
>>>> improvement tests :-)
>>>>
>>>> Two questions:
>>>>
>>>> 1. How would I do that:
>>>>
>>>>
>>>>
>>>>> or a local read-only instance that reads the same core as the indexing
>>>>> instance (for the latter, you'll need something that periodically
>>>>> refreshes - i.e. runs commit()).
>>>>>
>>>>>
>>>> 2. Did you try sharding with your current setup (e.g. one big,
>>>> nearly-static index and a tiny write+read index)?
>>>>
>>>> Regards,
>>>> Peter.
>>>>
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> Below are some notes regarding Solr cache tuning that should prove
>>>>> useful for anyone who uses Solr with frequent commits (e.g. <5min).
>>>>>
>>>>> Environment:
>>>>> Solr 1.4.1 or branch_3x trunk.
>>>>> Note the 4.x trunk has lots of neat new features, so the notes here
>>>>> are likely less relevant to the 4.x environment.
>>>>>
>>>>> Overview:
>>>>> Our Solr environment makes extensive use of faceting, we perform
>>>>> commits every 30secs, and the indexes tend to be on the large-ish side
>>>>> (>20million docs).
>>>>> Note: For our data, when we commit, we are always adding new data,
>>>>> never changing existing data.
>>>>> This type of environment can be tricky to tune, as Solr is more geared
>>>>> toward fast reads than frequent writes.
>>>>>
>>>>> Symptoms:
>>>>> If you have used faceting in searches where you are also performing
>>>>> frequent commits, you've likely encountered the dreaded OutOfMemory or
>>>>> GC Overhead Exceeded errors.
>>>>> In high commit rate environments, this is almost always due to
>>>>> multiple 'onDeck' searchers and autowarming - i.e. new searchers don't
>>>>> finish autowarming their caches before the next commit()
>>>>> comes along and invalidates them.
>>>>> Once this starts happening on a regular basis, it is likely your
>>>>> Solr's JVM will run out of memory eventually, as the number of
>>>>> searchers (and their cache arrays) will keep growing until the JVM
>>>>> dies of thirst.
>>>>> To check if your Solr environment is suffering from this, turn on INFO
>>>>> level logging, and look for: 'PERFORMANCE WARNING: Overlapping
>>>>> onDeckSearchers=x'.
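>>>>> As a quick sanity check, the grep below finds those warnings (the log
>>>>> path/format is an assumption; this demo writes a sample line to a temp
>>>>> file so it is self-contained - point it at your real Solr log instead):

```shell
# Count overlapping-searcher warnings in a Solr log.
# Demo setup: write one sample INFO line to a temp file.
log=$(mktemp)
echo 'INFO: [] PERFORMANCE WARNING: Overlapping onDeckSearchers=2' > "$log"
grep -c 'Overlapping onDeckSearchers' "$log"   # prints 1: the problem is present
rm -f "$log"
```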
>>>>>
>>>>> In tests, we've only ever seen this problem when using faceting, and
>>>>> facet.method=fc.
>>>>>
>>>>> Some solutions to this are:
>>>>>     Reduce the commit rate to allow searchers to fully warm before the
>>>>>     next commit
>>>>>     Reduce or eliminate the autowarming in caches
>>>>>     Both of the above
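>>>>> In solrconfig.xml terms, the second option is just autowarmCount="0"
>>>>> on the caches (sizes here are illustrative, not recommendations):

```xml
<!-- sketch: no autowarming, so new searchers register immediately -->
<filterCache
  class="solr.LRUCache"
  size="3600"
  initialSize="1400"
  autowarmCount="0"/>
```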
>>>>>
>>>>> The trouble is, if you're doing NRT commits, you likely have a good
>>>>> reason for it, and reducing/eliminating autowarming will very
>>>>> significantly impact search performance in high commit rate
>>>>> environments.
>>>>>
>>>>> Solution:
>>>>> Here are some setup steps we've used that allow lots of faceting (we
>>>>> typically search with at least 20-35 different facet fields, and date
>>>>> faceting/sorting) on large indexes, and still keep decent search
>>>>> performance:
>>>>>
>>>>> 1. Firstly, you should consider using the enum method for facet
>>>>> searches (facet.method=enum) unless you've got A LOT of memory on your
>>>>> machine. In our tests, this method uses a lot less memory and
>>>>> autowarms more quickly than fc. (Note, I've not tried the new
>>>>> segment-based 'fcs' option, as I can't find support for it in
>>>>> branch_3x - looks nice for 4.x though)
>>>>> Admittedly, for our data, enum is not quite as fast for searching as
>>>>> fc, but short of purchasing a Taiwanese RAM factory, it's a worthwhile
>>>>> tradeoff.
>>>>> If you do have access to LOTS of memory, AND you can guarantee that
>>>>> the index won't grow beyond the memory capacity (i.e. you have some
>>>>> sort of deletion policy in place), fc can be a lot faster than enum
>>>>> when searching with lots of facets across many terms.
>>>>>
>>>>> 2. Secondly, we've found that LRUCache is faster at autowarming than
>>>>> FastLRUCache - in our tests, about 20% faster. Maybe this is just our
>>>>> environment - your mileage may vary.
>>>>>
>>>>> So, our filterCache section in solrconfig.xml looks like this:
>>>>>     <filterCache
>>>>>       class="solr.LRUCache"
>>>>>       size="3600"
>>>>>       initialSize="1400"
>>>>>       autowarmCount="3600"/>
>>>>>
>>>>> For a 28GB index, running in a quad-core x64 VMWare instance, 30
>>>>> warmed facet fields, Solr is running at ~4GB. The filterCache size stat
>>>>> usually shows in the region of ~2400.
>>>>>
>>>>> 3. It's also a good idea to have some sort of
>>>>> firstSearcher/newSearcher event listener queries to allow new data to
>>>>> populate the caches.
>>>>> Of course, what you put in these is dependent on the facets you
>>>>> need/use.
>>>>> We've found a good combination is a firstSearcher with as many facets
>>>>> in the search as your environment can handle, then a subset of the
>>>>> most common facets for the newSearcher.
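>>>>> Such listeners look roughly like this in solrconfig.xml (the facet
>>>>> field names are placeholders - substitute your own):

```xml
<!-- sketch: warming queries fired when searchers open; field names
     are placeholders for your real facet fields -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
      <str name="facet.field">author</str>
      <str name="facet.field">status</str>
    </lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>
```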
>>>>>
>>>>> 4. We also set:
>>>>>    <useColdSearcher>true</useColdSearcher>
>>>>> just in case.
>>>>>
>>>>> 5. Another key area for search performance with high commits is to use
>>>>> 2 Solr instances - one for the high commit rate indexing, and one for
>>>>> searching.
>>>>> The read-only searching instance can be a remote replica, or a local
>>>>> read-only instance that reads the same core as the indexing instance
>>>>> (for the latter, you'll need something that periodically refreshes -
>>>>> i.e. runs commit()).
>>>>> This way, you can tune the indexing instance for writing performance
>>>>> and the searching instance as above for max read performance.
>>>>>
>>>>> Using the setup above, we get fantastic searching speed for small
>>>>> facet sets (well under 1sec), and really good searching for large
>>>>> facet sets (a couple of secs depending on index size, number of
>>>>> facets, unique terms etc. etc.),
>>>>> even when searching against largeish indexes (>20million docs).
>>>>> We have yet to see any OOM or GC errors using the techniques above,
>>>>> even in low memory conditions.
>>>>>
>>>>> I hope people find this useful. I know I've spent a lot
>>>>> of time looking for stuff like this, so hopefully, this will save
>>>>> someone some time.
>>>>>
>>>>>
>>>>> Peter
>>>>>
>>>>>
