Hi,

Maybe https://github.com/sematext/solr-diagnostics can be of use?

Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



On Mon, Jun 29, 2020 at 3:46 PM Erick Erickson <erickerick...@gmail.com>
wrote:

> Really look at your cache size settings.
>
> This is to eliminate this scenario:
> - your cache sizes are very large
> - when you looked and the memory was 9G, you also had a lot of cache
> entries
> - there was a commit, which threw out the old cache and reduced your cache
> size
>
> This is frankly kind of unlikely, but worth checking.
>
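> For reference, the filterCache lives in solrconfig.xml. This is only a
> sketch showing the shipped defaults, not a recommendation, and the cache
> class name varies by Solr version:
>
> <query>
>   <filterCache class="solr.FastLRUCache"
>                size="512"
>                initialSize="512"
>                autowarmCount="0"/>
> </query>
>
> Each entry can take up to maxDoc/8 bytes, so size * maxDoc/8 is a rough
> upper bound on that one cache’s heap use.
>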
> The other option is that you haven’t been hitting OOMs at all and that’s a
> complete red herring. Let’s say that in actuality you only need an 8G heap
> or even smaller. By overallocating memory, garbage will simply accumulate
> for a long time, and when it is eventually collected, _lots_ of memory
> will be reclaimed at once.
>
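> If you want to check, the GC log path is already in your startup flags; a
> rough look at recent pauses (the exact log format depends on your JVM
> flags):
>
> grep -i pause /opt/solr/server/logs/solr_gc.log | tail -20
>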
> Another rather unlikely scenario, but again worth checking.
>
> Best,
> Erick
>
> > On Jun 29, 2020, at 3:27 PM, Ryan W <rya...@gmail.com> wrote:
> >
> > On Mon, Jun 29, 2020 at 3:13 PM Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> ps aux | grep solr
> >>
> >
> > [solr@faspbsy0002 database-backups]$ ps aux | grep solr
> > solr      72072  1.6 33.4 22847816 10966476 ?   Sl   13:35   1:36 java
> > -server -Xms16g -Xmx16g -XX:+UseG1GC -XX:+ParallelRefProcEnabled
> > -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages
> > -XX:+AggressiveOpts -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails
> > -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
> > -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
> > -Xloggc:/opt/solr/server/logs/solr_gc.log -XX:+UseGCLogFileRotation
> > -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
> > -Dsolr.log.dir=/opt/solr/server/logs -Djetty.port=8983 -DSTOP.PORT=7983
> > -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/opt/solr/server
> > -Dsolr.solr.home=/opt/solr/server/solr -Dsolr.data.home=
> > -Dsolr.install.dir=/opt/solr
> > -Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
> > -Xss256k -Dsolr.jetty.https.port=8983 -Dsolr.log.muteconsole
> > -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
> > -jar start.jar --module=http
> >
> >
> >
> >> should show you all the parameters Solr is running with, as would the
> >> admin screen. You should see something like:
> >>
> >> -XX:OnOutOfMemoryError=your_solr_directory/bin/oom_solr.sh
> >>
> >> And there should be some logs laying around if that was the case
> >> similar to:
> >> $SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$NOW.log
> >>
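> >> For example, something along these lines (paths taken from your setup)
> >> shows whether the killer script is configured and whether it has ever
> >> fired:
> >>
> >> ps aux | grep solr | grep -o 'OnOutOfMemoryError=[^ ]*'
> >> ls -l /opt/solr/server/logs/solr_oom_killer-*.log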
> >
> > This log is not being written, even though oom_solr.sh does appear to
> > write a solr_oom_killer-$SOLR_PORT-$NOW.log to the logs directory.
> > There are some log files in /opt/solr/server/logs, and they are indeed
> > being written to.  There are fresh entries in the logs, but no sign of
> > any problem.  If I grep for oom in the logs directory, the only
> > references I see are benign... just a few entries that list all the
> > flags, and oom_solr.sh is among the settings visible in the entry.  And
> > someone did a search for "Mushroom," so there's another instance of oom
> > from that search.
> >
> >
> >> As for memory, It Depends (tm). There are configurations
> >> you can make choices about that will affect the heap requirements.
> >> You can’t really draw comparisons between different projects. Your
> >> Drupal + Solr app has how many documents? Indexed how? Searched
> >> how? vs. this one.
> >>
> >> The usual suspects for configuration settings that are responsible
> >> include:
> >>
> >> - filterCache size too large. Each filterCache entry is bounded by
> >> maxDoc/8 bytes. I’ve seen people set this to over 1M…
> >>
> >> - using non-docValues for fields used for sorting, grouping, function
> >> queries or faceting. Solr will uninvert the field on the heap, whereas
> >> if you have specified docValues=true, the memory is out in OS memory
> >> space rather than heap (see the schema sketch after this list).
> >>
> >> - People just putting too many docs in a collection in a single JVM in
> >> aggregate.
> >> All replicas in the same instance are using part of the heap.
> >>
> >> - Having unnecessary options on your fields, although that’s more MMap
> >> space than
> >> heap.
> >>
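> >> On the docValues point, a minimal schema sketch (the field name is just
> >> an example; recent default configsets already enable docValues on most
> >> non-text field types):
> >>
> >> <field name="manufacturer" type="string" indexed="true" stored="true"
> >>        docValues="true"/>
> >>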
> >> The problem basically is that all of Solr’s access is essentially
> >> random, so for performance reasons lots of stuff has to be in memory.
> >>
> >> That said, Solr hasn’t been as careful as it should be about using up
> >> memory; improving that is ongoing.
> >>
> >> If you really want to know what’s using up memory, throw a heap analysis
> >> tool
> >> at it. That’ll give you a clue what’s hogging memory and you can go from
> >> there.
> >>
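> >> For instance, jmap from the JDK can produce a dump that Eclipse MAT or
> >> a similar analyzer can read (the PID here is the one from your ps
> >> output):
> >>
> >> jmap -dump:live,format=b,file=/tmp/solr-heap.hprof 72072
> >>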
> >>> On Jun 29, 2020, at 1:48 PM, David Hastings <
> >> hastings.recurs...@gmail.com> wrote:
> >>>
> >>> little nit picky note here, use 31gb, never 32.
> >>>
> >>> On Mon, Jun 29, 2020 at 1:45 PM Ryan W <rya...@gmail.com> wrote:
> >>>
> >>>> It figures it would happen again a couple hours after I suggested the
> >>>> issue might be resolved.  Just now, Solr stopped running.  I cleared
> >>>> the cache in my app a couple times around the time that it happened,
> >>>> so perhaps that was somehow too taxing for the server.  However, I've
> >>>> never allocated so much RAM to a website before, so it's odd that I'm
> >>>> getting these failures.  My colleagues were astonished when I said
> >>>> people on the solr-user list were telling me I might need 32GB just
> >>>> for Solr.
> >>>>
> >>>> I manage another project that uses Drupal + Solr, and we have a total
> >>>> of 8GB of RAM on that server, and Solr never, ever stops.  I've been
> >>>> managing that site for years and never seen a Solr outage.  On that
> >>>> project, Drupal + Solr is OK with 8GB, but somehow this other project
> >>>> needs 64 GB or more?
> >>>>
> >>>> "The thing that’s unsettling about this is that assuming you were
> >> hitting
> >>>> OOMs, and were running the OOM-killer script, you _should_ have had
> very
> >>>> clear evidence that that was the cause."
> >>>>
> >>>> How do I know if I'm running the OOM-killer script?
> >>>>
> >>>> Thank you.
> >>>>
> >>>> On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson <
> >> erickerick...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> The thing that’s unsettling about this is that assuming you were
> >>>>> hitting OOMs, and were running the OOM-killer script, you _should_
> >>>>> have had very clear evidence that that was the cause.
> >>>>>
> >>>>> If you were not running the killer script, then apologies for not
> >>>>> asking about that in the first place. Java’s performance is
> >>>>> unpredictable when OOMs happen, which is the point of the killer
> >>>>> script: at least Solr stops rather than do something inexplicable.
> >>>>>
> >>>>> Best,
> >>>>> Erick
> >>>>>
> >>>>>> On Jun 29, 2020, at 11:52 AM, David Hastings <
> >>>>> hastings.recurs...@gmail.com> wrote:
> >>>>>>
> >>>>>> sometimes just throwing money/ram/ssd at the problem is the best
> >>>>>> answer.
> >>>>>>
> >>>>>> On Mon, Jun 29, 2020 at 11:38 AM Ryan W <rya...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Thanks everyone. Just to give an update on this issue, I bumped
> >>>>>>> the RAM available to Solr up to 16GB a couple weeks ago, and
> >>>>>>> haven’t had any problem since.
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Jun 16, 2020 at 1:00 PM David Hastings <
> >>>>>>> hastings.recurs...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> me personally, around 290gb.  as much as we could shove into them
> >>>>>>>>
> >>>>>>>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson <
> >>>>> erickerick...@gmail.com
> >>>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> How much physical RAM? A rule of thumb is that you should
> >>>>>>>>> allocate no more than 25-50 percent of the total physical RAM to
> >>>>>>>>> Solr. That's cumulative, i.e. the sum of the heap allocations
> >>>>>>>>> across all your JVMs should be below that percentage. See Uwe
> >>>>>>>>> Schindler's MMapDirectory blog...
> >>>>>>>>>
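> >>>>>>>>> For example, on a machine with 32GB of physical RAM and a single
> >>>>>>>>> Solr JVM, that works out to a heap somewhere in the 8-16GB range,
> >>>>>>>>> with the rest left to the OS for the page cache.
> >>>>>>>>>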
> >>>>>>>>> Shot in the dark...
> >>>>>>>>>
> >>>>>>>>> On Tue, Jun 16, 2020, 11:51 David Hastings <
> >>>>>>> hastings.recurs...@gmail.com
> >>>>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> To add to this, I generally have Solr start with this:
> >>>>>>>>>> -Xms31000m -Xmx31000m
> >>>>>>>>>>
> >>>>>>>>>> and the only other thing that runs on them are MariaDB Galera
> >>>>>>>>>> cluster nodes that are not in use (aside from replication)
> >>>>>>>>>>
> >>>>>>>>>> the 31gb is not an accident either, you don't want 32gb.
> >>>>>>>>>>
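> >>>>>>>>>> (the usual reason given: below ~32gb the JVM can still use
> >>>>>>>>>> compressed object pointers; above that, every reference doubles
> >>>>>>>>>> in size)
> >>>>>>>>>>
> >>>>>>>>>> if you set the heap via solr.in.sh it's just:
> >>>>>>>>>>
> >>>>>>>>>> SOLR_HEAP="31g"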
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey <
> >> apa...@elyograg.org
> >>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> On 6/11/2020 11:52 AM, Ryan W wrote:
> >>>>>>>>>>>>> I will check "dmesg" first, to find out any hardware error
> >>>>>>>>>>>>> message.
> >>>>>>>>>>>
> >>>>>>>>>>> <snip>
> >>>>>>>>>>>
> >>>>>>>>>>>> [1521232.781801] Out of memory: Kill process 117529 (httpd)
> >>>>>>>>>>>> score 9 or sacrifice child
> >>>>>>>>>>>> [1521232.782908] Killed process 117529 (httpd), UID 48,
> >>>>>>>>>>>> total-vm:675824kB, anon-rss:181844kB, file-rss:0kB,
> >>>>>>>>>>>> shmem-rss:0kB
> >>>>>>>>>>>>
> >>>>>>>>>>>> Is this a relevant "Out of memory" message?  Does this
> >>>>>>>>>>>> suggest an OOM situation is the culprit?
> >>>>>>>>>>>
> >>>>>>>>>>> Because this was in the "dmesg" output, it indicates that it
> >>>>>>>>>>> is the operating system killing programs because the *system*
> >>>>>>>>>>> doesn't have any memory left.  It wasn't Java that did this,
> >>>>>>>>>>> and it wasn't Solr that was killed.  It very well could have
> >>>>>>>>>>> been Solr that was killed at another time, though.
> >>>>>>>>>>>
> >>>>>>>>>>> The process that it killed this time is named httpd ... which
> >>>>>>>>>>> is most likely the Apache webserver.  Because the UID is 48,
> >>>>>>>>>>> this is probably an OS derived from Redhat, where the "apache"
> >>>>>>>>>>> user has UID and GID 48 by default.  Apache with its default
> >>>>>>>>>>> config can be VERY memory hungry when it gets busy.
> >>>>>>>>>>>
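> >>>>>>>>>>> A quick way to see every kill the OS has done, and whether a
> >>>>>>>>>>> java/Solr process was ever among the victims (-T just adds
> >>>>>>>>>>> human-readable timestamps):
> >>>>>>>>>>>
> >>>>>>>>>>> dmesg -T | egrep -i "killed process|out of memory"
> >>>>>>>>>>>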
> >>>>>>>>>>>> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
> >>>>>>>>>>>
> >>>>>>>>>>> This says that you started Solr with the default 512MB heap.
> >>>>>>>>>>> Which is VERY VERY small.  The default is small so that Solr
> >>>>>>>>>>> will start on virtually any hardware.  Almost every user must
> >>>>>>>>>>> increase the heap size.  And because the OS is killing
> >>>>>>>>>>> processes, it is likely that the system does not have enough
> >>>>>>>>>>> memory installed for what you have running on it.
> >>>>>>>>>>>
> >>>>>>>>>>> It is generally not a good idea to share the server hardware
> >>>>>>>>>>> between Solr and other software, unless the system has a lot
> >>>>>>>>>>> of spare resources, memory in particular.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Shawn
>
>
