A quick follow-up: after finding and eliminating some code that was generating multiple update requests per second, applying the CMS GC tuning options, and upgrading to Java 8, we haven't experienced a single long GC pause. The Java 8 upgrade got rid of the final couple of pauses during the day that we were still seeing.
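
In case it helps anyone who finds this thread later, the CMS options we applied were along the lines of the commonly suggested set below (a sketch only, not necessarily our exact production flags; the occupancy fraction in particular needs tuning for your own heap size):

    -XX:+UseConcMarkSweepGC
    -XX:+UseParNewGC
    -XX:+CMSParallelRemarkEnabled
    -XX:CMSInitiatingOccupancyFraction=70
    -XX:+UseCMSInitiatingOccupancyOnly
    -XX:+ParallelRefProcEnabled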

Thanks for all the help and suggestions.

On 7/21/2015 1:10 AM, Daniel Collins wrote:
We have a similar situation: production runs Java 7u10 (yes, we know it's
old!) with custom GC options (G1 works well for us) and a 40GB heap.
We are heavy users of NRT (sub-second soft commits!), so that may be the
common factor here.

Every time we have tried a later Java 7 or Java 8, the heap blows up in no
time at all. We are still investigating the root cause (we do need to
migrate to Java 8), but my thinking is that very high commit rates are the
common link here (and it's not a common Solr use case, I admit).

I don't have any silver-bullet answers to offer yet, but my
suspicion/conjecture (no real evidence yet, I admit) is that the frequent
commits are leaving temporary objects around (which they are entitled to
do), and something has changed in the GC in later Java 7/8 that makes it
slower to reclaim those, hence the overall heap usage is higher under this
use case.

@Jeremy, you don't have a lot of headroom, but could you try a higher heap
size? Could you go to 6GB and see if that at least delays the issue?
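
(To be clear, I just mean the usual -Xms/-Xmx pair, e.g. something like

    -Xms6g -Xmx6g

example values only; leave enough of the 8GB box free for the OS disk cache.)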

Erick is correct, though: if you can reduce the commit rate, I'm sure that
would alleviate the issue.

On 21 July 2015 at 05:31, Erick Erickson <erickerick...@gmail.com> wrote:

bq: the config is set up per the NRT suggestions in the docs.
autoSoftCommit every 2 seconds and autoCommit every 10 minutes.

A 2-second soft commit is very aggressive, no matter what the NRT
suggestions are. My first question is whether that's really needed.
The soft commit interval should be as long as you can stand. And don't
listen to your product manager who says "2 seconds is required"; push back
and ask whether that's really necessary. Most people won't notice the
difference.
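
If you do relax it, it's just the maxTime value in solrconfig.xml, roughly
(the interval here is illustrative, not a recommendation for your app):

    <autoSoftCommit>
      <maxTime>60000</maxTime>  <!-- e.g. once a minute; pick whatever latency you can actually live with -->
    </autoSoftCommit>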

bq: ...we are noticing a lot higher number of hard commits than usual.

Is a client somewhere issuing a hard commit? This is rarely
recommended... And is openSearcher true or false? With openSearcher=false a
hard commit is a relatively cheap operation; with openSearcher=true it is
quite expensive.
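
For reference, the cheap flavor is the autoCommit in solrconfig.xml with
openSearcher=false, roughly:

    <autoCommit>
      <maxTime>600000</maxTime>            <!-- hard commit every 10 minutes -->
      <openSearcher>false</openSearcher>   <!-- flush segments and roll the tlog, but don't open a new searcher -->
    </autoCommit>

Values above are illustrative; the point is that with openSearcher=false the
hard commit just makes things durable, and visibility comes from your soft
commits.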

More than you want to know about hard and soft commits:


https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Mon, Jul 20, 2015 at 12:48 PM, Jeremy Ashcraft <jashcr...@edgate.com>
wrote:
heap is already at 5GB

On 07/20/2015 12:29 PM, Jeremy Ashcraft wrote:
no swapping that I'm seeing, although we are noticing a lot higher number
of hard commits than usual.

the config is set up per the NRT suggestions in the docs: autoSoftCommit
every 2 seconds and autoCommit every 10 minutes.

there have been 463 updates in the past 2 hours, all followed by hard
commits:

INFO  - 2015-07-20 12:26:20.979; org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO  - 2015-07-20 12:26:21.021; org.apache.solr.core.SolrDeletionPolicy; SolrDeletionPolicy.onCommit: commits: num=2
  commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@524b89bd; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e9nk,generation=665696}
  commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/opt/solr/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@524b89bd; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e9nl,generation=665697}
INFO  - 2015-07-20 12:26:21.022; org.apache.solr.core.SolrDeletionPolicy; newest commit generation = 665697
INFO  - 2015-07-20 12:26:21.026; org.apache.solr.update.DirectUpdateHandler2; end_commit_flush
INFO  - 2015-07-20 12:26:21.026; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update params={omitHeader=false&wt=json} {add=[8653ea29-a327-4a54-9b00-8468241f2d7c (1507244513403338752), 5cf034a9-d93a-4307-a367-02cb21fa8e35 (1507244513404387328), 816e3a04-9d0e-4587-a3ee-9f9e7b0c7d74 (1507244513405435904)],commit=} 0 50

could that be causing some of the problems?
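
(For what it's worth, the logged params are just omitHeader=false&wt=json;
if a client were forcing commits via the URL I'd have expected to see
something like

    curl 'http://localhost:8983/solr/collection1/update?commit=true' ...

though I gather a commit can also be sent inside the update body itself, so
that doesn't rule it out. The URL above is just an illustration, not one of
our actual requests.)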

________________________________________
From: Shawn Heisey <apa...@elyograg.org>
Sent: Monday, July 20, 2015 11:44 AM
To: solr-user@lucene.apache.org
Subject: Re: solr blocking and client timeout issue

On 7/20/2015 11:54 AM, Jeremy Ashcraft wrote:
I'm upgrading to the 1.8 JDK on our dev VM now and testing. Hopefully I
can get production upgraded tonight.

still getting the big GC pauses this morning, even after applying the
GC tuning options.  Everything was fine throughout the weekend.

My biggest concern is that this instance had been running with no
issues for almost 2 years, but these GC issues started just last week.
It's very possible that you're simply going to need a larger heap than
you have needed in the past, either because your index has grown, or
because your query patterns have changed and now your queries need more
memory.  It could even be both of these.

At your current index size, assuming that there's nothing else on this
machine, you should have enough memory to raise your heap to 5GB.

If there ARE other software pieces on this machine, then the long GC
pauses (along with other performance issues) could be explained by too
much memory allocation out of the 8GB total memory, resulting in
swapping at the OS level.
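
If you want to rule swapping out, something like this on the Solr machine
will show whether swap is actually being touched (commands assume Linux):

    free -m
    vmstat 5 5   # non-zero si/so columns mean pages are being swapped in/out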

Thanks,
Shawn

--
jeremy ashcraft
development manager
EdGate Correlation Services <http://correlation.edgate.com>
253.853.7133 x228
