Thanks for letting us know how it turned out. Too often I'm never sure what actually _worked_....
Erick

On Thu, Jul 23, 2015 at 8:56 AM, Jeremy Ashcraft <jashcr...@edgate.com> wrote:
> A quick follow-up: after finding and eliminating some code that was
> generating multiple update requests per second, applying the CMS GC tuning
> options, and upgrading to Java 8, we've not experienced a single long-term
> GC pause. The Java 8 upgrade got rid of the final couple of pauses during
> the day that we were seeing.
>
> Thanks for all the help and suggestions.
>
> On 7/21/2015 1:10 AM, Daniel Collins wrote:
>>
>> We have a similar situation: production runs Java 7u10 (yes, we know it's
>> old!), and has custom GC options (G1 works well for us), and a 40GB heap.
>> We are a heavy user of NRT (sub-second soft-commits!), so that may be the
>> common factor here.
>>
>> Every time we have tried a later Java 7 or Java 8, the heap blows up in no
>> time at all. We are still investigating the root cause (we do need to
>> migrate to Java 8), but I'm thinking that very high commit rates seem to be
>> the common link here (and it's not a common Solr use case, I admit).
>>
>> I don't have any silver bullet answers to offer yet, but my
>> suspicion/conjecture (no real evidence yet, I admit) is that the frequent
>> commits are leaving temporary objects around (which they are entitled to
>> do), and something has changed in the GC in later Java 7/8 which means they
>> are slower to get rid of those, hence the overall heap usage is higher
>> under this use case.
>>
>> @Jeremy, you don't have a lot of head room, but try a higher heap size?
>> Could you go to 6GB and see if that at least delays the issue?
>>
>> Erick is correct though, if you can reduce the commit rate, I'm sure that
>> would alleviate the issue.
>>
>> On 21 July 2015 at 05:31, Erick Erickson <erickerick...@gmail.com> wrote:
>>
>>> bq: the config is set up per the NRT suggestions in the docs.
>>> autoSoftCommit every 2 seconds and autoCommit every 10 minutes.
>>>
>>> A 2 second soft commit is very aggressive, no matter what the NRT
>>> suggestions are. My first question is whether that's really needed.
>>> The soft commit interval should be as long as you can stand. And don't
>>> listen to your product manager who says "2 seconds is required"; push
>>> back and ask whether that's really necessary. Most people won't notice
>>> the difference.
>>>
>>> bq: ...we are noticing a lot higher number of hard commits than usual.
>>>
>>> Is a client somewhere issuing a hard commit? This is rarely
>>> recommended... And is openSearcher true or false? False is a
>>> relatively cheap operation, true is quite expensive.
>>>
>>> More than you want to know about hard and soft commits:
>>>
>>> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>>
>>> Best,
>>> Erick
>>>
>>> On Mon, Jul 20, 2015 at 12:48 PM, Jeremy Ashcraft <jashcr...@edgate.com>
>>> wrote:
>>>>
>>>> heap is already at 5GB
>>>>
>>>> On 07/20/2015 12:29 PM, Jeremy Ashcraft wrote:
>>>>>
>>>>> no swapping that I'm seeing, although we are noticing a lot higher
>>>>> number of hard commits than usual.
>>>>>
>>>>> the config is set up per the NRT suggestions in the docs. autoSoftCommit
>>>>> every 2 seconds and autoCommit every 10 minutes.
>>>>>
>>>>> there have been 463 updates in the past 2 hours, all followed by hard
>>>>> commits
>>>>>
>>>>> INFO - 2015-07-20 12:26:20.979;
>>>>> org.apache.solr.update.DirectUpdateHandler2; start
>>>>> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
>>>>>
>>>>> INFO - 2015-07-20 12:26:21.021; org.apache.solr.core.SolrDeletionPolicy;
>>>>> SolrDeletionPolicy.onCommit: commits: num=2
>>>>>
>>>>> commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/solr/collection1/data/index
>>>>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@524b89bd;
>>>>> maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e9nk,generation=665696}
>>>>>
>>>>> commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/solr/collection1/data/index
>>>>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@524b89bd;
>>>>> maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e9nl,generation=665697}
>>>>>
>>>>> INFO - 2015-07-20 12:26:21.022; org.apache.solr.core.SolrDeletionPolicy;
>>>>> newest commit generation = 665697
>>>>> INFO - 2015-07-20 12:26:21.026;
>>>>> org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
>>>>> INFO - 2015-07-20 12:26:21.026;
>>>>> org.apache.solr.update.processor.LogUpdateProcessor; [collection1]
>>>>> webapp=/solr path=/update params={omitHeader=false&wt=json}
>>>>> {add=[8653ea29-a327-4a54-9b00-8468241f2d7c (1507244513403338752),
>>>>> 5cf034a9-d93a-4307-a367-02cb21fa8e35 (1507244513404387328),
>>>>> 816e3a04-9d0e-4587-a3ee-9f9e7b0c7d74 (1507244513405435904)],commit=} 0 50
>>>>>
>>>>> could that be causing some of the problems?
>>>>>
>>>>> ________________________________________
>>>>> From: Shawn Heisey <apa...@elyograg.org>
>>>>> Sent: Monday, July 20, 2015 11:44 AM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: solr blocking and client timeout issue
>>>>>
>>>>> On 7/20/2015 11:54 AM, Jeremy Ashcraft wrote:
>>>>>>
>>>>>> I'm upgrading to the 1.8 JDK on our dev VM now and testing. Hopefully I
>>>>>> can get production upgraded tonight.
>>>>>>
>>>>>> still getting the big GC pauses this morning, even after applying the
>>>>>> GC tuning options. Everything was fine throughout the weekend.
>>>>>>
>>>>>> My biggest concern is that this instance had been running with no
>>>>>> issues for almost 2 years, but these GC issues started just last week.
>>>>>
>>>>> It's very possible that you're simply going to need a larger heap than
>>>>> you have needed in the past, either because your index has grown, or
>>>>> because your query patterns have changed and now your queries need more
>>>>> memory. It could even be both of these.
>>>>>
>>>>> At your current index size, assuming that there's nothing else on this
>>>>> machine, you should have enough memory to raise your heap to 5GB.
>>>>>
>>>>> If there ARE other software pieces on this machine, then the long GC
>>>>> pauses (along with other performance issues) could be explained by too
>>>>> much memory allocation out of the 8GB total memory, resulting in
>>>>> swapping at the OS level.
>>>>>
>>>>> Thanks,
>>>>> Shawn
>>>>>
>>>> --
>>>> *jeremy ashcraft*
>>>> development manager
>>>> EdGate Correlation Services <http://correlation.edgate.com>
>>>> /253.853.7133 x228/
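
For anyone hitting the same symptom: the log excerpt quoted in this thread shows each update followed by a hard commit with openSearcher=true, which is the expensive case Erick describes. One way to confirm how often that happens is to tally the commit entries in the Solr log. What follows is only a minimal Python sketch, assuming the raw log text (without mail quoting) contains `start commit{...}` entries in the format quoted above; the names `parse_commit_flags`, `summarize`, and `SAMPLE` are illustrative and not part of Solr.

```python
import re

# Matches the flag list inside a "start commit{...}" entry, in the format
# logged by DirectUpdateHandler2 in the excerpt quoted in this thread.
# Assumption: the input is raw log text, not mail-quoted text.
COMMIT_RE = re.compile(r"start\s+commit\{([^}]*)\}")


def parse_commit_flags(text):
    """Return one dict of flag name -> bool per 'start commit{...}' entry."""
    commits = []
    for match in COMMIT_RE.finditer(text):
        flags = {}
        for part in match.group(1).split(","):
            if "=" in part:  # skips the empty field after the opening brace
                key, value = part.split("=", 1)
                flags[key.strip()] = value.strip() == "true"
        commits.append(flags)
    return commits


def summarize(commits):
    """Count all commits, hard commits, and hard commits that open a searcher."""
    hard = [c for c in commits if not c.get("softCommit", False)]
    return {
        "total": len(commits),
        "hard": len(hard),
        "hard_open_searcher": sum(
            1 for c in hard if c.get("openSearcher", False)
        ),
    }


# The single commit entry quoted in the thread, joined onto one line.
SAMPLE = (
    "INFO - 2015-07-20 12:26:20.979; "
    "org.apache.solr.update.DirectUpdateHandler2; start "
    "commit{,optimize=false,openSearcher=true,waitSearcher=true,"
    "expungeDeletes=false,softCommit=false,prepareCommit=false}"
)

if __name__ == "__main__":
    print(summarize(parse_commit_flags(SAMPLE)))
```

If the hard commit count tracks the update count closely, as in the 463 updates Jeremy reports, a client is probably sending an explicit commit with each update rather than leaving commits to autoCommit and autoSoftCommit.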