Re: solr blocking and client timeout issue

Erick Erickson Thu, 23 Jul 2015 09:13:16 -0700

Thanks for letting us know how it turned out. Too often I never find out
what actually _worked_.


Erick

On Thu, Jul 23, 2015 at 8:56 AM, Jeremy Ashcraft <jashcr...@edgate.com> wrote:
> A quick follow-up: after finding and eliminating some code that was
> generating multiple update requests per second, applying the CMS GC tuning
> options, and upgrading to Java 8, we've not experienced a single long
> GC pause.  The Java 8 upgrade got rid of the final couple of pauses during
> the day that we were seeing.
>
> Thanks for all the help and suggestions.
>
> On 7/21/2015 1:10 AM, Daniel Collins wrote:
>>
>> We have a similar situation: production runs Java 7u10 (yes, we know it's
>> old!), and has custom GC options (G1 works well for us), and a 40GB heap.
>> We are a heavy user of NRT (sub-second soft-commits!), so that may be the
>> common factor here.
>>
>> Every time we have tried a later Java 7 or Java 8, the heap blows up in no
>> time at all.  We are still investigating the root cause (we do need to
>> migrate to Java 8), but I'm thinking that very high commit rates seem to be
>> the common link here (and it's not a common Solr use case, I admit).
>>
>> I don't have any silver bullet answers to offer yet, but my
>> suspicion/conjecture (no real evidence yet, I admit) is that the frequent
>> commits are leaving temporary objects around (which they are entitled to
>> do), and something has changed in the GC in later Java 7/8 which means it
>> is slower to get rid of those, hence the overall heap usage is higher
>> under this use case.
>>
>> @Jeremy, you don't have a lot of headroom, but try a higher heap size?
>> Could you go to 6GB and see if that at least delays the issue?
>>
>> Erick is correct, though: if you can reduce the commit rate, I'm sure that
>> would alleviate the issue.
>>
>> On 21 July 2015 at 05:31, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>>> bq: the config is set up per the NRT suggestions in the docs.
>>> autoSoftCommit every 2 seconds and autoCommit every 10 minutes.
>>>
>>> A 2-second soft commit is very aggressive, no matter what the NRT
>>> suggestions are. My first question is whether that's really needed.
>>> The soft commit interval should be as long as you can stand. And don't
>>> listen to your product manager who says "2 seconds is required"; push back
>>> and ask whether that's really necessary. Most people won't notice
>>> the difference.
>>>
>>> bq: ...we are noticing a lot higher number of hard commits than usual.
>>>
>>> Is a client somewhere issuing a hard commit? This is rarely
>>> recommended... And is openSearcher true or false? A hard commit with
>>> openSearcher=false is relatively cheap; with openSearcher=true it is
>>> quite expensive.
>>>
>>> More than you want to know about hard and soft commits:
>>>
>>>
>>>
>>> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>>
>>> Best,
>>> Erick
>>>
>>> On Mon, Jul 20, 2015 at 12:48 PM, Jeremy Ashcraft <jashcr...@edgate.com>
>>> wrote:
>>>>
>>>> heap is already at 5GB
>>>>
>>>> On 07/20/2015 12:29 PM, Jeremy Ashcraft wrote:
>>>>>
>>>>> no swapping that I'm seeing, although we are noticing a lot higher number
>>>>> of hard commits than usual.
>>>>>
>>>>> the config is set up per the NRT suggestions in the docs. autoSoftCommit
>>>>> every 2 seconds and autoCommit every 10 minutes.
>>>>>
>>>>> there have been 463 updates in the past 2 hours, all followed by hard
>>>>> commits
>>>>>
>>>>> INFO  - 2015-07-20 12:26:20.979; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>>>>> INFO  - 2015-07-20 12:26:21.021; org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onCommit: commits: num=2
>>>>> commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@524b89bd; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e9nk,generation=665696}
>>>>> commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@524b89bd; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e9nl,generation=665697}
>>>>> INFO  - 2015-07-20 12:26:21.022; org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 665697
>>>>> INFO  - 2015-07-20 12:26:21.026; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
>>>>> INFO  - 2015-07-20 12:26:21.026; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update params={omitHeader=false&wt=json} {add=[8653ea29-a327-4a54-9b00-8468241f2d7c (1507244513403338752), 5cf034a9-d93a-4307-a367-02cb21fa8e35 (1507244513404387328), 816e3a04-9d0e-4587-a3ee-9f9e7b0c7d74 (1507244513405435904)],commit=} 0 50
>>>>>
>>>>> could that be causing some of the problems?
>>>>>
>>>>> ________________________________________
>>>>> From: Shawn Heisey <apa...@elyograg.org>
>>>>> Sent: Monday, July 20, 2015 11:44 AM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: solr blocking and client timeout issue
>>>>>
>>>>> On 7/20/2015 11:54 AM, Jeremy Ashcraft wrote:
>>>>>>
>>>>>> I'm upgrading to the 1.8 JDK on our dev VM now and testing. Hopefully I
>>>>>> can get production upgraded tonight.
>>>>>>
>>>>>> still getting the big GC pauses this morning, even after applying the
>>>>>> GC tuning options.  Everything was fine throughout the weekend.
>>>>>>
>>>>>> My biggest concern is that this instance had been running with no
>>>>>> issues for almost 2 years, but these GC issues started just last week.
>>>>>
>>>>> It's very possible that you're simply going to need a larger heap than
>>>>> you have needed in the past, either because your index has grown, or
>>>>> because your query patterns have changed and now your queries need more
>>>>> memory.  It could even be both of these.
>>>>>
>>>>> At your current index size, assuming that there's nothing else on this
>>>>> machine, you should have enough memory to raise your heap to 5GB.
>>>>>
>>>>> If there ARE other software pieces on this machine, then the long GC
>>>>> pauses (along with other performance issues) could be explained by too
>>>>> much memory allocation out of the 8GB total memory, resulting in
>>>>> swapping at the OS level.
>>>>>
>>>>> Thanks,
>>>>> Shawn
>>>>>
>>>> --
>>>> *jeremy ashcraft*
>>>> development manager
>>>> EdGate Correlation Services <http://correlation.edgate.com>
>>>> /253.853.7133 x228/
>
>
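
The thread hinges on how many hard commits the server was seeing and
whether they opened a new searcher. As a rough way to check that from a
Solr log, here is a minimal Python sketch that tallies commits by type.
It assumes each log entry sits on one line and contains a
"start commit{...}" flag list like the DirectUpdateHandler2 line quoted
above; the function names are made up for illustration, and the
soft-commit sample line is hypothetical, not taken from the thread.

```python
import re
from collections import Counter

# Matches the flag list inside "start commit{...}", as in the
# DirectUpdateHandler2 line quoted in the thread (assumed to be on one line).
COMMIT_RE = re.compile(r"start commit\{(?P<flags>[^}]*)\}")


def parse_commit_flags(line):
    """Return the commit flags as a dict, or None if this is not a commit start."""
    match = COMMIT_RE.search(line)
    if match is None:
        return None
    flags = {}
    for pair in match.group("flags").split(","):
        if "=" in pair:  # skips the empty field left by the leading comma
            key, value = pair.split("=", 1)
            flags[key.strip()] = value.strip()
    return flags


def count_commits(lines):
    """Tally soft commits, hard commits, and hard commits that open a searcher."""
    counts = Counter()
    for line in lines:
        flags = parse_commit_flags(line)
        if flags is None:
            continue
        if flags.get("softCommit") == "true":
            counts["soft"] += 1
        else:
            counts["hard"] += 1
            if flags.get("openSearcher") == "true":
                counts["hard_open_searcher"] += 1
    return counts


if __name__ == "__main__":
    hard = ("INFO  - 2015-07-20 12:26:20.979; "
            "org.apache.solr.update.DirectUpdateHandler2; start "
            "commit{,optimize=false,openSearcher=true,waitSearcher=true,"
            "expungeDeletes=false,softCommit=false,prepareCommit=false}")
    # Hypothetical soft-commit line, derived from the one above for illustration.
    soft = hard.replace("softCommit=false", "softCommit=true")
    print(dict(count_commits([hard, soft, "INFO  - unrelated line"])))
```

A high "hard_open_searcher" count relative to the number of updates would
point to a client sending an explicit commit with each update, which is
what the LogUpdateProcessor line in the thread (ending in "commit=")
suggests was happening here.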
