I’m pretty sure these OOMs are caused by uncontrolled thread creation; we’ve seen 
up to 4,000 threads. At roughly 1 MB of stack per thread, that is an additional 
~4 GB of memory. It is as if Solr doesn’t use thread pools at all.
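
For what it’s worth, a rough way to confirm the thread count and stack size on a 
running node (a sketch; assumes Linux and that start.jar only matches the Solr 
process):

  # Live thread count for the Solr JVM
  ps -o nlwp= -p "$(pgrep -f start.jar)"

  # JVM default thread stack size (an explicit -Xss overrides it)
  java -XX:+PrintFlagsFinal -version 2>/dev/null | grep ThreadStackSize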

I set this in jetty.xml, but it still created 4000 threads.

  <Get name="ThreadPool">
    <Set name="minThreads" type="int"><Property name="solr.jetty.threads.min" default="200"/></Set>
    <Set name="maxThreads" type="int"><Property name="solr.jetty.threads.max" default="200"/></Set>
  </Get>
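
Since jetty.xml reads those values from system properties, the same limits can 
also be set from solr.in.sh without editing jetty.xml (a sketch; assumes the 
stock bin/solr start script, which passes SOLR_OPTS through to the JVM):

  # In solr.in.sh
  SOLR_OPTS="$SOLR_OPTS -Dsolr.jetty.threads.min=200 -Dsolr.jetty.threads.max=200"

Note this only caps Jetty’s request-handling pool; threads created by Solr’s own 
internal executors are not governed by it, which may be why the thread count 
keeps climbing anyway.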

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Nov 23, 2017, at 7:02 PM, Damien Kamerman <dami...@gmail.com> wrote:
> 
> I found the suggesters very memory hungry. I had one particularly large
> index where the suggester should have been filtering a small number of
> docs, but was mmap'ing the entire index. I only ever saw this behavior with
> the suggesters.
> 
> On 22 November 2017 at 03:17, Walter Underwood <wun...@wunderwood.org>
> wrote:
> 
>> All our customizations are in solr.in.sh. We’re using the one we
>> configured for 6.3.0. I’ll check for any differences between that and the
>> 6.5.1 script.
>> 
>> I don’t see any arguments at all in the dashboard. I do see them in a ps
>> listing, right at the end.
>> 
>> java -server -Xms8g -Xmx8g -XX:+UseG1GC -XX:+ParallelRefProcEnabled
>> -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages
>> -XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -verbose:gc
>> -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
>> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
>> -XX:+PrintGCApplicationStoppedTime -Xloggc:/solr/logs/solr_gc.log
>> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
>> -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false
>> -Dcom.sun.management.jmxremote.ssl=false
>> -Dcom.sun.management.jmxremote.authenticate=false
>> -Dcom.sun.management.jmxremote.port=18983
>> -Dcom.sun.management.jmxremote.rmi.port=18983
>> -Djava.rmi.server.hostname=new-solr-c01.test3.cloud.cheggnet.com
>> -DzkClientTimeout=15000
>> -DzkHost=zookeeper1.test3.cloud.cheggnet.com:2181,zookeeper2.test3.cloud.cheggnet.com:2181,zookeeper3.test3.cloud.cheggnet.com:2181/solr-cloud
>> -Dsolr.log.level=WARN -Dsolr.log.dir=/solr/logs -Djetty.port=8983
>> -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks
>> -Dhost=new-solr-c01.test3.cloud.cheggnet.com -Duser.timezone=UTC
>> -Djetty.home=/apps/solr6/server -Dsolr.solr.home=/apps/solr6/server/solr
>> -Dsolr.install.dir=/apps/solr6 -Dgraphite.prefix=solr-cloud.new-solr-c01
>> -Dgraphite.host=influx.test.cheggnet.com
>> -javaagent:/apps/solr6/newrelic/newrelic.jar -Dnewrelic.environment=test3
>> -Dsolr.log.muteconsole -Xss256k -Dsolr.log.muteconsole
>> -XX:OnOutOfMemoryError=/apps/solr6/bin/oom_solr.sh 8983 /solr/logs
>> -jar start.jar --module=http
>> 
>> I’m still confused about why we are hitting OOM in 6.5.1 but weren’t in 6.3.0.
>> Our load benchmarks replay prod logs. We did add suggesters, but those use the
>> analyzing infix lookup, so they are disk-based search indexes, not in-memory
>> structures.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Nov 21, 2017, at 5:46 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>>> 
>>> On 11/20/2017 6:17 PM, Walter Underwood wrote:
>>>> When I ran load benchmarks with 6.3.0, an overloaded cluster would get
>> super slow but keep functioning. With 6.5.1, we hit 100% CPU, then start
>> getting OOMs. That is really bad, because it means we need to reboot every
>> node in the cluster.
>>>> Also, the JVM OOM hook isn’t running the process killer (JVM
>> 1.8.0_121-b13). Using the G1 collector with the Shawn Heisey settings in an
>> 8G heap.
>>> <snip>
>>>> This is not good behavior in prod. The process goes to the bad place,
>> then we need to wait until someone is paged and kills it manually. Luckily,
>> it usually drops out of the live nodes for each collection and doesn’t take
>> user traffic.
>>> 
>>> There was a bug, fixed long before 6.3.0, where the OOM killer script
>> wasn't working because the arguments enabling it were in the wrong place.
>> It was fixed in 5.5.1 and 6.0.
>>> 
>>> https://issues.apache.org/jira/browse/SOLR-8145
>>> 
>>> If the scripts that you are using to get Solr started originated with a
>> much older version of Solr than you are currently running, maybe you've got
>> the arguments in the wrong order.
>>> 
>>> Do you see the commandline arguments for the OOM killer (only available
>> on *NIX systems, not Windows) on the admin UI dashboard?  If they are
>> properly placed, you will see them on the dashboard, but if they aren't
>> properly placed, then you won't see them.  This is what the argument looks
>> like for one of my Solr installs:
>>> 
>>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs
>>> 
>>> Something which you probably already know:  If you're hitting OOM, you
>> need a larger heap, or you need to adjust the config so it uses less
>> memory.  There are no other ways to "fix" OOM problems.
>>> 
>>> Thanks,
>>> Shawn
>> 
>> 
