If you want a start time within the next 5 minutes, I think your filter
is not the right one. * will be replaced by the first date in your field.

Try: fq=start_time:[NOW TO NOW+5MINUTE]

Franck Brisbart
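A related caching detail: NOW resolves with millisecond precision, so an
endpoint like [NOW TO NOW+5MINUTE] resolves to different timestamps - and
a fresh filterCache entry - on every request. Rounding both endpoints
keeps the resolved values identical until the boundary rolls over, so the
cached bitset can be reused. A minimal sketch, reusing the thread's
start_time field with standard Solr date math:

  # Resolves to new timestamps on every request; the filterCache never hits:
  fq=start_time:[NOW TO NOW+5MINUTES]

  # Both endpoints rounded to the minute; the identical filter repeats
  # for a full minute, so the cached bitset is reused:
  fq=start_time:[NOW/MINUTE TO NOW/MINUTE+5MINUTES]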
On Monday, December 9, 2013 at 09:07 -0600, Patrick O'Lone wrote:
> I have a new question about this issue - I create filter queries of
> the form:
>
> fq=start_time:[* TO NOW/5MINUTE]
>
> This is used to restrict the set of documents to only items that have
> a start time within the next 5 minutes. Most of my indexes have
> millions of documents, with few documents that start sometime in the
> future. Nearly all of my queries include this. Would this cause every
> other search thread to block until the filter query is re-cached every
> 5 minutes, and if so, is there a better way to do it? Thanks for any
> continued help with this issue!
>
> > We have a webapp running with a very high heap size (24GB) and we
> > have had no problems with it AFTER we enabled the new GC that is
> > meant to eventually replace the CMS GC - but you have to have Java 6
> > update "some number I couldn't find, but the latest should cover it"
> > to be able to use it:
> >
> > 1. Remove all GC options you have and...
> > 2. Replace them with "-XX:+UseG1GC -XX:MaxGCPauseMillis=50"
> >
> > As a test, of course. You can read more in the following (and
> > interesting) article; we also have Solr running with these options -
> > no more pauses or heap size hitting the sky.
> >
> > Don't get bored reading the first (and small) introduction page of
> > the article; pages 2 and 3 will make a lot of sense:
> > http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061
> >
> > HTH,
> >
> > Guido.
> >
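For concreteness, a sketch of Guido's suggestion applied to the java
command line quoted further down in this thread: the CMS/ParNew flags and
NewRatio are dropped, the two G1 flags added, everything else kept as-is.
On Java 6 releases where G1 was still experimental, it may additionally
need -XX:+UnlockExperimentalVMOptions.

  /usr/java/jre/bin/java \
  -verbose:gc \
  -XX:+PrintGCDetails \
  -server \
  -Dcom.sun.management.jmxremote \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=50 \
  -Xms30720M \
  -Xmx30720M \
  -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \
  -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \
  -Dcatalina.base=/usr/local/share/apache-tomcat \
  -Dcatalina.home=/usr/local/share/apache-tomcat \
  -Djava.io.tmpdir=/tmp \
  org.apache.catalina.startup.Bootstrap start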
> > On 26/11/13 21:59, Patrick O'Lone wrote:
> >> We do perform a lot of sorting - on multiple fields, in fact. We
> >> have different kinds of Solr configurations - our news searches do
> >> little with regard to faceting, but sort heavily. We provide
> >> classified ad searches, and those make heavy use of faceting. I
> >> might try reducing the JVM memory some, and the amount of perm
> >> generation, as suggested earlier. It feels like a GC issue, and
> >> loading the cache just happens to be the victim of a stop-the-world
> >> event at the worst possible time.
> >>
> >>> My gut instinct is that your heap size is way too high. Try
> >>> decreasing it to like 5-10G. I know you say it uses more than
> >>> that, but that just seems bizarre unless you're doing something
> >>> like faceting and/or sorting on every field.
> >>>
> >>> -Michael
> >>>
> >>> -----Original Message-----
> >>> From: Patrick O'Lone [mailto:pol...@townnews.com]
> >>> Sent: Tuesday, November 26, 2013 11:59 AM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache
> >>>
> >>> I've been tracking a problem in our Solr environment for a while,
> >>> with periodic stalls of Solr 3.6.1. I'm running up against a wall
> >>> on ideas to try, and thought I might get some insight from others
> >>> on this list.
> >>>
> >>> The load on the server is normally anywhere between 1-3. It's an
> >>> 8-core machine with 40GB of RAM. I have about 25GB of index data
> >>> that is replicated to this server every 5 minutes. It's taking
> >>> about 200 connections per second, and roughly every 5-10 minutes
> >>> it will stall for about 30 seconds to a minute. The stall causes
> >>> the load to go as high as 90. It is all CPU bound in user space -
> >>> all cores go to 99% utilization (spinlock?). When doing a thread
> >>> dump, the following line is blocked in all running Tomcat threads:
> >>>
> >>> org.apache.lucene.search.FieldCacheImpl$Cache.get (
> >>> FieldCacheImpl.java:230 )
> >>>
> >>> Looking at the source code in 3.6.1, that is a call into a
> >>> synchronized() block, which blocks all threads and causes the
> >>> backlog. I've tried to correlate these events with the replication
> >>> events - but even with replication disabled, this still happens.
> >>> We run multiple data centers using Solr, and comparing garbage
> >>> collection between them, I noted that the old generation is
> >>> collected very differently on this data center versus the others.
> >>> Here, the old generation is collected in one massive collection
> >>> event (several gigabytes' worth); the other data center is more
> >>> saw-toothed and collects only 500MB-1GB at a time. Here are my
> >>> parameters to java (the same in all environments):
> >>>
> >>> /usr/java/jre/bin/java \
> >>> -verbose:gc \
> >>> -XX:+PrintGCDetails \
> >>> -server \
> >>> -Dcom.sun.management.jmxremote \
> >>> -XX:+UseConcMarkSweepGC \
> >>> -XX:+UseParNewGC \
> >>> -XX:+CMSIncrementalMode \
> >>> -XX:+CMSParallelRemarkEnabled \
> >>> -XX:+CMSIncrementalPacing \
> >>> -XX:NewRatio=3 \
> >>> -Xms30720M \
> >>> -Xmx30720M \
> >>> -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \
> >>> -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \
> >>> -Dcatalina.base=/usr/local/share/apache-tomcat \
> >>> -Dcatalina.home=/usr/local/share/apache-tomcat \
> >>> -Djava.io.tmpdir=/tmp \
> >>> org.apache.catalina.startup.Bootstrap start
> >>>
> >>> I've tried a few GC option changes from this (we've been running
> >>> this way for a couple of years now) - primarily removing CMS
> >>> incremental mode, since we have 8 cores and remarks on the
> >>> internet suggest it is only for smaller SMP setups. Removing it
> >>> did not fix anything.
> >>>
> >>> I've considered that the heap is way too large (30GB of the 40GB)
> >>> and may not leave enough memory for mmap operations (MMap appears
> >>> to be used in the field cache). Based on active memory utilization
> >>> in Java, it seems like I could safely reduce it to 22GB - but I'm
> >>> not sure if that will help with the CPU issues.
> >>>
> >>> I think the field cache is used for sorting and faceting. I've
> >>> started to investigate facet.method, but from what I can tell,
> >>> this doesn't influence sorting at all - only facet queries. I've
> >>> tried setting useFilterForSortedQuery, and it seems to require
> >>> less field cache, but it doesn't address the stalling issues.
> >>>
> >>> Is there something I am overlooking? Perhaps the system is
> >>> becoming oversubscribed in terms of resources? Thanks for any
> >>> help that is offered.
> >>>
> >>> --
> >>> Patrick O'Lone
> >>> Director of Software Development
> >>> TownNews.com
> >>>
> >>> E-mail ... pol...@townnews.com
> >>> Phone .... 309-743-0809
> >>> Fax ...... 309-743-0830
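To confirm the diagnosis above during a stall, the blocked threads can be
captured with jstack. A small sketch - note that jstack ships with the
JDK rather than the JRE the command line above points at, and the pgrep
pattern is only illustrative:

  # Find the Tomcat JVM (adjust the pattern to your setup):
  PID=$(pgrep -f org.apache.catalina.startup.Bootstrap)

  # Dump all thread stacks; during a stall, most request threads should
  # show org.apache.lucene.search.FieldCacheImpl$Cache.get at the top:
  jstack -l "$PID" > /tmp/solr-stall.txt
  grep -B2 -A4 'FieldCacheImpl$Cache.get' /tmp/solr-stall.txt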