It happened again today.  Again, no other apparent problems on the server.
Nothing else is stopping.  Nothing in the logs that strikes me as useful.
I'm using Red Hat Enterprise Linux 7.8 and Solr 7.7.2.

Solr is stopping a couple of times per week and I don't know how to determine
why.

On Sun, Jun 14, 2020 at 9:41 AM Ryan W <rya...@gmail.com> wrote:

> Thank you.  I pasted those settings at the end of my /etc/default/solr.in.sh
> just now and restarted solr.  I will see if that fixes it.
> Previously, I had no settings at all in solr.in.sh except for SOLR_PORT.
>
> On Thu, Jun 11, 2020 at 1:59 PM Walter Underwood <wun...@wunderwood.org> wrote:
>
>> 1. You have a tiny heap. 536 Megabytes is not enough.
>> 2. I stopped using the CMS GC years ago.
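For reference, here is the arithmetic behind point 1. The GC log quoted further down shows -XX:MaxHeapSize=536870912, which is 512 MiB (about 537 MB in decimal units). SOLR_HEAP=8g is 8 GiB, sixteen times as much. The following is a minimal Python sketch of the conversion. It assumes that SOLR_HEAP accepts the same k/m/g suffixes (binary multiples) as the JVM's -Xmx option and that a bare number means bytes. parse_jvm_size and describe are hypothetical helper names, not part of Solr.

```python
import re

# Binary multipliers, as used by JVM size options such as -Xmx (assumed for SOLR_HEAP too).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_jvm_size(value: str) -> int:
    """Convert a JVM-style size such as "8g", "512m" or "536870912" to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgG]?)", value.strip())
    if match is None:
        raise ValueError(f"not a JVM size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def describe(value: str) -> str:
    """Show a size in bytes, binary megabytes (MiB) and decimal megabytes (MB)."""
    size = parse_jvm_size(value)
    return f"{value} = {size} bytes = {size / 1024 ** 2:.0f} MiB = {size / 10 ** 6:.0f} MB"


if __name__ == "__main__":
    print(describe("536870912"))  # 536870912 = 536870912 bytes = 512 MiB = 537 MB
    print(describe("8g"))         # 8g = 8589934592 bytes = 8192 MiB = 8590 MB
```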
>>
>> Here is the GC config we use on every one of our 150+ Solr hosts. We’re
>> still on Java 8, but will be upgrading soon.
>>
>> SOLR_HEAP=8g
>> # Use G1 GC  -- wunder 2017-01-23
>> # Settings from https://wiki.apache.org/solr/ShawnHeisey
>> GC_TUNE=" \
>> -XX:+UseG1GC \
>> -XX:+ParallelRefProcEnabled \
>> -XX:G1HeapRegionSize=8m \
>> -XX:MaxGCPauseMillis=200 \
>> -XX:+UseLargePages \
>> -XX:+AggressiveOpts \
>> "
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>> > On Jun 11, 2020, at 10:52 AM, Ryan W <rya...@gmail.com> wrote:
>> >
>> > On Wed, Jun 10, 2020 at 8:35 PM Hup Chen <chai...@hotmail.com> wrote:
>> >
>> >> I would check "dmesg" first, to find any hardware error messages.
>> >>
>> >
>> > Here is what I see toward the end of the output from dmesg:
>> >
>> > [1521232.781785] [118857]    48 118857   108785      677     201      901             0 httpd
>> > [1521232.781787] [118860]    48 118860   108785      710     201      881             0 httpd
>> > [1521232.781788] [118862]    48 118862   113063     5256     210      725             0 httpd
>> > [1521232.781790] [118864]    48 118864   114085     6634     212      703             0 httpd
>> > [1521232.781791] [118871]    48 118871   139687    32323     262      620             0 httpd
>> > [1521232.781793] [118873]    48 118873   108785      821     201      792             0 httpd
>> > [1521232.781795] [118879]    48 118879   140263    32719     263      621             0 httpd
>> > [1521232.781796] [118903]    48 118903   108785      812     201      771             0 httpd
>> > [1521232.781798] [118905]    48 118905   113575     5606     211      660             0 httpd
>> > [1521232.781800] [118906]    48 118906   113563     5694     211      626             0 httpd
>> > [1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child
>> > [1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB
>> >
>> > Is this a relevant "Out of memory" message?  Does this suggest an OOM
>> > situation is the culprit?
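One way to answer that is to extract only the oom-killer's verdicts and check which process it chose. In the excerpt above, the victim is httpd (PID 117529, UID 48), not a java process. The following is a minimal Python sketch. It assumes the message format shown in this excerpt (other kernel versions word these lines differently) and that the dmesg text is passed in as a string. oom_victims, solr_was_killed, and OomKill are hypothetical names.

```python
import re
from typing import List, NamedTuple

# Victim line as printed in the excerpt above (format assumed; it varies by kernel):
# [1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, ...
_KILLED = re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)")
_ANON_RSS = re.compile(r"anon-rss:(\d+)kB")


class OomKill(NamedTuple):
    pid: int
    name: str
    anon_rss_kb: int  # resident anonymous memory when killed; -1 if not printed


def oom_victims(dmesg_text: str) -> List[OomKill]:
    """Return every process the kernel oom-killer reports as killed."""
    victims = []
    for line in dmesg_text.splitlines():
        killed = _KILLED.search(line)
        if killed is None:
            continue
        rss = _ANON_RSS.search(line)
        victims.append(OomKill(int(killed["pid"]), killed["name"],
                               int(rss[1]) if rss else -1))
    return victims


def solr_was_killed(victims: List[OomKill]) -> bool:
    # Solr runs in a JVM, so the kernel normally lists it under the name "java".
    return any(v.name == "java" for v in victims)


SAMPLE = (
    "[1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child\n"
    "[1521232.782908] Killed process 117529 (httpd), UID 48, total-vm:675824kB, "
    "anon-rss:181844kB, file-rss:0kB, shmem-rss:0kB\n"
)

if __name__ == "__main__":
    victims = oom_victims(SAMPLE)
    print(victims)                   # [OomKill(pid=117529, name='httpd', anon_rss_kb=181844)]
    print(solr_was_killed(victims))  # False
```

To run it against the live kernel log, the text could come from subprocess.run(["dmesg"], capture_output=True, text=True).stdout. Reading the kernel log may require root on some systems.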
>> >
>> > When I grep in the solr logs for oom, I see some entries like this...
>> >
>> > ./solr_gc.log.4.current:CommandLine flags: -XX:CICompilerCount=4
>> > -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
>> > -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark
>> > -XX:ConcGCThreads=4 -XX:GCLogFileSize=20971520
>> > -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
>> > -XX:MaxNewSize=134217728 -XX:MaxTenuringThreshold=8
>> > -XX:MinHeapDeltaBytes=196608 -XX:NewRatio=3 -XX:NewSize=134217728
>> > -XX:NumberOfGCLogFiles=9 -XX:OldPLABSize=16 -XX:OldSize=402653184
>> > -XX:-OmitStackTraceInFastThrow
>> > -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
>> > -XX:ParallelGCThreads=4 -XX:+ParallelRefProcEnabled
>> > -XX:PretenureSizeThreshold=67108864 -XX:+PrintGC
>> > -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
>> > -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
>> > -XX:+PrintTenuringDistribution -XX:SurvivorRatio=4
>> > -XX:TargetSurvivorRatio=90 -XX:ThreadStackSize=256
>> > -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedClassPointers
>> > -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseGCLogFileRotation
>> > -XX:+UseParNewGC
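This line records the options the JVM actually started with. That also makes it a convenient way to confirm the heap size and collector in use, and to check again after editing solr.in.sh and restarting (assuming the GC log keeps being written as it is here). The following is a minimal Python sketch. It assumes the HotSpot -XX syntax seen above: -XX:+Name enables an option, -XX:-Name disables it, and -XX:Name=value sets one. Because the OnOutOfMemoryError value contains spaces, tokens that do not start with "-" are appended to the previous valued flag. parse_xx_flags and collector are hypothetical helper names.

```python
from typing import Dict, Optional, Union

FlagValue = Union[bool, str]


def parse_xx_flags(line: str) -> Dict[str, FlagValue]:
    """Parse the -XX options from a GC log 'CommandLine flags:' line."""
    _, _, flags_part = line.partition("CommandLine flags:")
    flags: Dict[str, FlagValue] = {}
    last: Optional[str] = None
    for token in flags_part.split():
        if token.startswith("-XX:"):
            body = token[len("-XX:"):]
            if body and body[0] in "+-":
                # -XX:+Name enables a boolean option, -XX:-Name disables it.
                last = body[1:]
                flags[last] = body[0] == "+"
            else:
                # -XX:Name=value sets a valued option.
                last, _, value = body.partition("=")
                flags[last] = value
        elif last is not None and not token.startswith("-") and isinstance(flags[last], str):
            # Continuation of a value containing spaces, e.g. OnOutOfMemoryError's arguments.
            flags[last] = f"{flags[last]} {token}"
    return flags


def collector(flags: Dict[str, FlagValue]) -> str:
    """Name the garbage collector explicitly selected by the flags, if any."""
    for name, label in (("UseG1GC", "G1"), ("UseConcMarkSweepGC", "CMS"),
                        ("UseParallelGC", "Parallel")):
        if flags.get(name) is True:
            return label
    return "not set explicitly (JVM default)"


if __name__ == "__main__":
    line = ("./solr_gc.log.4.current:CommandLine flags: -XX:MaxHeapSize=536870912 "
            "-XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs "
            "-XX:-OmitStackTraceInFastThrow -XX:+UseConcMarkSweepGC -XX:+UseParNewGC")
    flags = parse_xx_flags(line)
    print(int(flags["MaxHeapSize"]) // 1024 ** 2, "MiB,", collector(flags))  # 512 MiB, CMS
```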
>> >
>> > Buried in there I see "OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh".  But I
>> > think this is just a setting that indicates what to do in case of an OOM.
>> > And if I look in that oom_solr.sh file, I see it would write an entry to a
>> > solr_oom_kill log. And there is no such log in the logs directory.
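Checking for that log can be scripted so that it runs right after each outage. The following is a minimal Python sketch. It assumes the logs directory passed to oom_solr.sh in the flags above (/opt/solr/server/logs) and a file name beginning with solr_oom_kill, as described above. The exact name written by a given Solr version's oom_solr.sh may differ, so the pattern is deliberately loose. find_oom_kill_logs is a hypothetical helper.

```python
from pathlib import Path
from typing import List

# Directory taken from the -XX:OnOutOfMemoryError flag in the GC log above.
DEFAULT_LOG_DIR = "/opt/solr/server/logs"


def find_oom_kill_logs(log_dir: str = DEFAULT_LOG_DIR) -> List[Path]:
    """Return log files written by oom_solr.sh, newest first."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return []
    # Loose pattern: assumes only that the file name starts with "solr_oom_kill".
    logs = [p for p in directory.glob("solr_oom_kill*") if p.is_file()]
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    found = find_oom_kill_logs()
    if not found:
        print("No solr_oom_kill* logs: no evidence the OnOutOfMemoryError hook ran.")
    for path in found:
        print(f"== {path}")
        print(path.read_text(errors="replace"))
```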
>> >
>> > Many thanks.
>> >
>> >
>> >
>> >
>> >> Then use some system administration tools to monitor that server, for
>> >> instance top, vmstat, lsof, or iostat, or simply install a free monitoring
>> >> tool on the system, such as Monit, Monitorix, or Nagios.
>> >> Good luck!
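For the monitoring side, even a tiny probe that records when Solr stopped answering makes it easier to match an outage against dmesg and the Solr logs. The following is a minimal Python sketch that only tests whether anything accepts TCP connections on Solr's port. It assumes port 8983 (the port passed to oom_solr.sh in the GC log flags) and deliberately does not restart anything. solr_port_open and watch are hypothetical names. An open port does not prove that Solr is healthy, only that something is listening.

```python
import socket
import time
from datetime import datetime


def solr_port_open(host: str = "127.0.0.1", port: int = 8983, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, ...
        return False


def watch(host: str = "127.0.0.1", port: int = 8983, interval: float = 60.0) -> None:
    """Print a timestamped line whenever the port changes between UP and DOWN."""
    previous = None
    while True:
        up = solr_port_open(host, port)
        if up != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} port {port}: {'UP' if up else 'DOWN'}", flush=True)
            previous = up
        time.sleep(interval)


if __name__ == "__main__":
    watch()
```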
>> >>
>> >> ________________________________
>> >> From: Ryan W <rya...@gmail.com>
>> >> Sent: Thursday, June 11, 2020 2:13 AM
>> >> To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
>> >> Subject: Re: How to determine why solr stops running?
>> >>
>> >> Hi all,
>> >>
>> >> People keep suggesting I check the logs for errors.  What do those errors
>> >> look like?  Does anyone have examples of the text of a Solr oom error?  Or
>> >> the text of any other errors I should be looking for the next time solr
>> >> fails?  Are there phrases I should grep for in the logs?  Should I be
>> >> looking in the Solr logs for an OOM error, or in the Apache logs?
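As a starting point for that grep: a Java OutOfMemoryError usually appears in a log as the class name java.lang.OutOfMemoryError followed by a reason, such as "Java heap space" or "unable to create new native thread". The kernel oom-killer, by contrast, writes to the kernel log (dmesg), not to Solr's or Apache's logs. Searching for the full class name also avoids false hits on the -XX:OnOutOfMemoryError option in the GC log's CommandLine flags line. The following is a minimal Python sketch. It assumes the log directory /opt/solr/server/logs and file names beginning with solr. scan_for_oome is a hypothetical name. Finding nothing does not rule out an OOME, because one is not guaranteed to be logged.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# The full class name avoids matching "-XX:OnOutOfMemoryError=..." in GC log headers.
OOME = re.compile(r"java\.lang\.OutOfMemoryError(?::\s*(?P<reason>.*))?")


def scan_for_oome(log_dir: str = "/opt/solr/server/logs") -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, reason) for each OutOfMemoryError found."""
    for path in sorted(Path(log_dir).glob("solr*.log*")):
        if not path.is_file():
            continue
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                match = OOME.search(line)
                if match:
                    yield path.name, number, (match["reason"] or "").strip()


if __name__ == "__main__":
    for name, number, reason in scan_for_oome():
        print(f"{name}:{number}: OutOfMemoryError ({reason or 'no reason given'})")
```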
>> >>
>> >> There is nothing failing on the server except for solr -- at least not that
>> >> I can see.  There is no apparent problem with the hardware or anything else
>> >> on the server.  The OS is Red Hat Enterprise Linux. The server has 16 GB of
>> >> RAM and hosts one website that does not get a huge amount of traffic.
>> >>
>> >> When the start command is given to solr, does it first check to see if solr
>> >> is running, or does it always start solr whether it is already running or
>> >> not?
>> >>
>> >> Many thanks!
>> >> Ryan
>> >>
>> >>
>> >> On Tue, Jun 9, 2020 at 7:58 AM Erick Erickson <erickerick...@gmail.com> wrote:
>> >>
>> >>> To add to what Dave said, if you have a particular machine that’s prone to
>> >>> suddenly stopping, that’s usually a red flag that you should seriously
>> >>> think about hardware issues.
>> >>>
>> >>> If the problem strikes different machines, then I agree with Shawn that
>> >>> the first thing I’d be suspicious of is OOM errors.
>> >>>
>> >>> FWIW,
>> >>> Erick
>> >>>
>> >>>> On Jun 9, 2020, at 6:05 AM, Dave <hastings.recurs...@gmail.com> wrote:
>> >>>>
>> >>>> I’ll add that whenever I’ve had a Solr instance shut down, for me it’s
>> >>>> been a hardware failure.  Either the RAM or the disk got a “glitch”.  Both
>> >>>> of these are relatively fragile, wear-and-tear parts of the machine and
>> >>>> should be expected to fail and be replaced from time to time.  Solr is
>> >>>> pretty aggressive with its logging, so there are always a lot of writes
>> >>>> happening, and of course reads.  If the disk or the memory has any
>> >>>> issues, it can lock Solr up and bring it down, more so if you have any
>> >>>> spellcheck dictionaries or suggesters being built on startup.
>> >>>>
>> >>>> Just my experience with this, could be wrong (most likely wrong), but we
>> >>>> always have extra drives and memory around the server room for this
>> >>>> reason.  At least once or twice a year we will have a disk failure in the
>> >>>> RAID and need to swap in a new one.
>> >>>>
>> >>>> Good luck though.  Also, Solr should be logging its failures, so it would
>> >>>> be good to look there too.
>> >>>>
>> >>>>> On Jun 9, 2020, at 2:35 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>> >>>>>
>> >>>>> On 5/14/2020 7:22 AM, Ryan W wrote:
>> >>>>>> I manage a site where solr has stopped running a couple times in the
>> >>>>>> past week. The server hasn't been rebooted, so that's not the reason.
>> >>>>>> What else causes solr to stop running?  How can I investigate why this
>> >>>>>> is happening?
>> >>>>>
>> >>>>> Any situation where Solr stops running and nobody requested the stop is
>> >>>>> a result of a serious problem that must be thoroughly investigated.  I
>> >>>>> think it's a bad idea for Solr to automatically restart when it stops
>> >>>>> unexpectedly.  Chances are that whatever caused the crash is going to
>> >>>>> simply make the crash happen again until the problem is solved.
>> >>>>> Automatically restarting could hide problems from the system
>> >>>>> administrator.
>> >>>>>
>> >>>>> The only way a Solr auto-restart would be acceptable to me is if it
>> >>>>> sends a high priority alert to the sysadmin EVERY time it executes an
>> >>>>> auto-restart.  It really is that bad of a problem.
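If restarts are automated anyway, that rule can be made explicit: send the alert first, every single time, and only then restart. The following is a minimal Python sketch. The check, alert, and restart actions are passed in as callables, because how to page a sysadmin or start Solr depends on the installation. supervise_once and its parameters are hypothetical, not part of Solr.

```python
from typing import Callable


def supervise_once(
    is_running: Callable[[], bool],
    alert: Callable[[str], None],
    restart: Callable[[], None],
) -> bool:
    """Run one supervision step; return True if a restart was performed.

    The alert is sent before the restart and is never suppressed or rate
    limited, so an automatic restart cannot hide a crash from the sysadmin.
    """
    if is_running():
        return False
    alert("HIGH PRIORITY: Solr stopped unexpectedly and is being restarted. "
          "Investigate the cause.")
    restart()
    return True
```

It would be called periodically, for example from cron. The important property is that no code path restarts Solr without first sending the alert.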
>> >>>>>
>> >>>>> The causes of Solr crashes (that I can think of) include the following.
>> >>>>> I believe I have listed these four options from most likely to least
>> >>>>> likely:
>> >>>>>
>> >>>>> * Java OutOfMemoryError exceptions.  On non-Windows systems, the
>> >>>>> "bin/solr" script starts Solr with an option that results in Solr's
>> >>>>> death any time one of these exceptions occurs.  We do this because
>> >>>>> program operation is indeterminate and completely unpredictable when
>> >>>>> OOME occurs, so it's far safer to stop running.  That exception can be
>> >>>>> caused by several things, some of which actually do not involve memory
>> >>>>> at all.  If you're running on Windows via the bin\solr.cmd command, then
>> >>>>> this will not happen ... but OOME could still cause a crash, because as
>> >>>>> I already mentioned, program operation is unpredictable when OOME occurs.
>> >>>>>
>> >>>>> * The OS kills Solr because system memory is completely exhausted and
>> >>>>> Solr is the process using the most memory.  Linux calls this the
>> >>>>> "oom-killer" ... I am pretty sure something like it exists on most
>> >>>>> operating systems.
>> >>>>>
>> >>>>> * Corruption somewhere in the system.  Could be in Java, the OS, Solr,
>> >>>>> or data used by any of those.
>> >>>>>
>> >>>>> * A very serious bug in Solr's code that we haven't discovered yet.
>> >>>>>
>> >>>>> I included that last one simply for completeness.  A bug that causes a
>> >>>>> crash *COULD* exist, but as of right now, we have not seen any
>> >>>>> supporting evidence.
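The first two causes on that list leave different evidence behind, which suggests an order for checking them. The following is a minimal Python sketch of that triage. It assumes the three observations have already been collected, either by hand or with checks like those discussed earlier in this thread. Evidence, likely_cause, and the field names are hypothetical, and the mapping is a rough heuristic rather than a diagnosis.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    oom_kill_log_present: bool  # oom_solr.sh wrote its log, so the JVM hit an OOME
    oome_in_solr_log: bool      # java.lang.OutOfMemoryError appears in solr.log
    kernel_killed_java: bool    # dmesg shows the oom-killer choosing a java process


def likely_cause(e: Evidence) -> str:
    """Map collected evidence onto the causes listed above, most specific first."""
    if e.oom_kill_log_present or e.oome_in_solr_log:
        return "Java OutOfMemoryError: find the full exception to see what ran out"
    if e.kernel_killed_java:
        return "Kernel oom-killer: system memory was exhausted"
    return ("No OOM evidence: an unlogged OOME is still possible; otherwise "
            "consider hardware or other corruption, and only then a Solr bug")
```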
>> >>>>>
>> >>>>> My guess is that Java OutOfMemoryError is the cause here, but I can't
>> >>>>> be certain.  If that is happening, then some resource (which might not
>> >>>>> be memory) is fully depleted.  We would need to see the full
>> >>>>> OutOfMemoryError exception in order to determine why it is happening.
>> >>>>> Sometimes the exception is logged in solr.log, sometimes it isn't.  We
>> >>>>> cannot predict what part of the code will be running when OOME occurs,
>> >>>>> so it would be nearly impossible for us to guarantee logging.  OOME can
>> >>>>> happen ANYWHERE - even in code that the compiler thinks is immune to
>> >>>>> exceptions.
>> >>>>>
>> >>>>> Side note to fellow committers:  I wonder if we should implement an
>> >>>>> uncaught exception handler in Solr.  I have found in my own programs
>> >>>>> that it helps figure out thorny problems.  And while I am on the subject
>> >>>>> of handlers that might not be general knowledge, I didn't find a
>> >>>>> shutdown hook or a security manager outside of tests.
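The value of an uncaught exception handler is that an exception nobody catches still gets recorded before its thread dies. In Java this is Thread.setDefaultUncaughtExceptionHandler, and shutdown hooks are registered with Runtime.getRuntime().addShutdownHook. The sketch below is not Solr code. For consistency with the other examples, it uses the analogous Python hooks, threading.excepthook (Python 3.8+) and atexit, purely to illustrate the idea. The logger name is an arbitrary choice.

```python
import atexit
import logging
import threading

log = logging.getLogger("uncaught")


def _log_uncaught(args) -> None:
    """threading.excepthook replacement: log any exception escaping a thread."""
    thread_name = args.thread.name if args.thread is not None else "unknown"
    log.critical("Uncaught exception in thread %s", thread_name,
                 exc_info=(args.exc_type, args.exc_value, args.exc_traceback))


def _on_exit() -> None:
    # Runs at normal interpreter shutdown. Like a JVM shutdown hook, it cannot
    # run if the process is killed with SIGKILL.
    log.warning("Process shutting down")


def install_handlers() -> None:
    threading.excepthook = _log_uncaught
    atexit.register(_on_exit)
```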
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Shawn
>> >>>
>> >>>
>> >>
>>
>>
