Hi,

Maybe https://github.com/sematext/solr-diagnostics can be of use?
Otis
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

On Mon, Jun 29, 2020 at 3:46 PM Erick Erickson <erickerick...@gmail.com> wrote:

> Really look at your cache size settings.
>
> This is to eliminate this scenario:
> - your cache sizes are very large
> - when you looked and the memory was 9G, you also had a lot of cache entries
> - there was a commit, which threw out the old cache and reduced your cache size
>
> This is frankly kind of unlikely, but worth checking.
>
> The other option is that you haven’t been hitting OOMs at all and that’s
> a complete red herring. Let’s say in actuality, you only need an 8G heap
> or even smaller. By overallocating memory, garbage will simply accumulate
> for a long time, and when it is eventually collected, _lots_ of memory
> will be collected.
>
> Another rather unlikely scenario, but again worth checking.
>
> Best,
> Erick
>
>> On Jun 29, 2020, at 3:27 PM, Ryan W <rya...@gmail.com> wrote:
>>
>> On Mon, Jun 29, 2020 at 3:13 PM Erick Erickson <erickerick...@gmail.com> wrote:
>>
>>> ps aux | grep solr
>>
>> [solr@faspbsy0002 database-backups]$ ps aux | grep solr
>> solr 72072 1.6 33.4 22847816 10966476 ? Sl 13:35 1:36 java
>> -server -Xms16g -Xmx16g -XX:+UseG1GC -XX:+ParallelRefProcEnabled
>> -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages
>> -XX:+AggressiveOpts -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails
>> -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
>> -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime
>> -Xloggc:/opt/solr/server/logs/solr_gc.log -XX:+UseGCLogFileRotation
>> -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
>> -Dsolr.log.dir=/opt/solr/server/logs -Djetty.port=8983 -DSTOP.PORT=7983
>> -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/opt/solr/server
>> -Dsolr.solr.home=/opt/solr/server/solr -Dsolr.data.home=
>> -Dsolr.install.dir=/opt/solr
>> -Dsolr.default.confdir=/opt/solr/server/solr/configsets/_default/conf
>> -Xss256k -Dsolr.jetty.https.port=8983 -Dsolr.log.muteconsole
>> -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs
>> -jar start.jar --module=http
>>
>>> should show you all the parameters Solr is running with, as would the
>>> admin screen. You should see something like:
>>>
>>> -XX:OnOutOfMemoryError=your_solr_directory/bin/oom_solr.sh
>>>
>>> And there should be some logs laying around if that was the case,
>>> similar to:
>>> $SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$NOW.log
>>
>> This log is not being written, even though oom_solr.sh does appear to be
>> set up to write a solr_oom_killer-$SOLR_PORT-$NOW.log to the logs
>> directory. There are some log files in /opt/solr/server/logs, and they
>> are indeed being written to. There are fresh entries in the logs, but no
>> sign of any problem. If I grep for oom in the logs directory, the only
>> references I see are benign... just a few entries that list all the
>> flags, with oom_solr.sh among the settings visible in the entry. And
>> someone did a search for "Mushroom," so there's another instance of oom
>> from that search.
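The check Ryan describes can be scripted. A minimal sketch, assuming the default log location from this thread (set SOLR_LOGS_DIR for a different install):

```shell
# Sketch: look for evidence that Solr's OOM-killer script has ever fired.
# It writes solr_oom_killer-<port>-<date>.log into the Solr logs directory.
SOLR_LOGS_DIR="${SOLR_LOGS_DIR:-/opt/solr/server/logs}"
if ls "$SOLR_LOGS_DIR"/solr_oom_killer-*.log >/dev/null 2>&1; then
    result="OOM-killer has fired at least once"
else
    result="no solr_oom_killer logs found"
fi
echo "$result (checked $SOLR_LOGS_DIR)"
```

If no such logs exist but Solr still dies, the kernel OOM killer (visible in dmesg) is the more likely suspect than a Java-level OOM.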
>>> As for memory, It Depends (tm). There are configurations
>>> you can make choices about that will affect the heap requirements.
>>> You can’t really draw comparisons between different projects. Your
>>> Drupal + Solr app has how many documents? Indexed how? Searched
>>> how? vs. this one.
>>>
>>> The usual suspects for configuration settings that are responsible
>>> include:
>>>
>>> - filterCache size too large. Each filterCache entry is bounded by
>>> maxDoc/8 bytes. I’ve seen people set this to over 1M…
>>>
>>> - using non-docValues for fields used for sorting, grouping, function
>>> queries or faceting. Solr will uninvert the field on the heap, whereas
>>> if you have specified docValues=true, the memory is out in OS memory
>>> space rather than heap.
>>>
>>> - people just putting too many docs in a collection in a single JVM in
>>> aggregate. All replicas in the same instance are using part of the heap.
>>>
>>> - having unnecessary options on your fields, although that’s more MMap
>>> space than heap.
>>>
>>> The problem basically is that all of Solr’s access is essentially
>>> random, so for performance reasons lots of stuff has to be in memory.
>>>
>>> That said, Solr hasn’t been as careful as it should be about using up
>>> memory; that’s ongoing.
>>>
>>> If you really want to know what’s using up memory, throw a heap
>>> analysis tool at it. That’ll give you a clue what’s hogging memory and
>>> you can go from there.
>>>
>>>> On Jun 29, 2020, at 1:48 PM, David Hastings <hastings.recurs...@gmail.com> wrote:
>>>>
>>>> little nit-picky note here: use 31gb, never 32.
>>>>
>>>> On Mon, Jun 29, 2020 at 1:45 PM Ryan W <rya...@gmail.com> wrote:
>>>>
>>>>> It figures it would happen again a couple hours after I suggested the
>>>>> issue might be resolved. Just now, Solr stopped running.
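Erick's maxDoc/8 bound makes the filterCache easy to sanity-check with back-of-envelope arithmetic. A sketch with hypothetical numbers (not from this thread):

```shell
# Worst-case filterCache heap use, per Erick's bound of maxDoc/8 bytes
# per cache entry. Both numbers below are made up for illustration.
maxdoc=20000000                        # docs in the core
entries=512                            # filterCache size setting
bytes_per_entry=$((maxdoc / 8))        # 2.5 MB per cached filter
total_mb=$(( entries * bytes_per_entry / 1024 / 1024 ))
echo "worst-case filterCache: ${total_mb} MB of heap"
# prints: worst-case filterCache: 1220 MB of heap
```

A 20M-doc core with a 512-entry filterCache can pin over a gigabyte of heap by itself, which is why a size "over 1M" entries is so dangerous.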
>>>>> I cleared the cache in my app a couple times around the time that it
>>>>> happened, so perhaps that was somehow too taxing for the server.
>>>>> However, I've never allocated so much RAM to a website before, so
>>>>> it's odd that I'm getting these failures. My colleagues were
>>>>> astonished when I said people on the solr-user list were telling me I
>>>>> might need 32GB just for Solr.
>>>>>
>>>>> I manage another project that uses Drupal + Solr, and we have a total
>>>>> of 8GB of RAM on that server and Solr never, ever stops. I've been
>>>>> managing that site for years and never seen a Solr outage. On that
>>>>> project, Drupal + Solr is OK with 8GB, but somehow this other project
>>>>> needs 64GB or more?
>>>>>
>>>>> "The thing that’s unsettling about this is that assuming you were
>>>>> hitting OOMs, and were running the OOM-killer script, you _should_
>>>>> have had very clear evidence that that was the cause."
>>>>>
>>>>> How do I know if I'm running the OOM-killer script?
>>>>>
>>>>> Thank you.
>>>>>
>>>>> On Mon, Jun 29, 2020 at 12:12 PM Erick Erickson <erickerick...@gmail.com> wrote:
>>>>>
>>>>>> The thing that’s unsettling about this is that assuming you were
>>>>>> hitting OOMs, and were running the OOM-killer script, you _should_
>>>>>> have had very clear evidence that that was the cause.
>>>>>>
>>>>>> If you were not running the killer script, then apologies for not
>>>>>> asking about that in the first place. Java’s performance is
>>>>>> unpredictable when OOMs happen, which is the point of the killer
>>>>>> script: at least Solr stops rather than do something inexplicable.
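Ryan's question ("how do I know if I'm running the OOM-killer script?") comes down to whether the running JVM has the -XX:OnOutOfMemoryError hook. A sketch that parses a command line for it; the sample string here is taken from the ps output earlier in the thread, and in practice you would feed in something like the output of `ps -o args -C java`:

```shell
# Detect the OOM-killer hook in a Solr JVM command line.
cmdline='java -server -Xms16g -Xmx16g -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /opt/solr/server/logs -jar start.jar'
hook=$(printf '%s\n' "$cmdline" | grep -o 'OnOutOfMemoryError=[^ ]*')
if [ -n "$hook" ]; then
    echo "OOM-killer script configured: ${hook#OnOutOfMemoryError=}"
else
    echo "no OOM-killer hook; the JVM will limp along after an OOM"
fi
```

Since Ryan's ps output does show the hook, the killer script is configured; the absence of its log files then suggests a Java-level OOM never actually happened.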
>>>>>> Best,
>>>>>> Erick
>>>>>>
>>>>>> On Jun 29, 2020, at 11:52 AM, David Hastings <hastings.recurs...@gmail.com> wrote:
>>>>>>
>>>>>>> sometimes just throwing money/ram/ssd at the problem is just the
>>>>>>> best answer.
>>>>>>>
>>>>>>> On Mon, Jun 29, 2020 at 11:38 AM Ryan W <rya...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks everyone. Just to give an update on this issue, I bumped
>>>>>>>> the RAM available to Solr up to 16GB a couple weeks ago, and
>>>>>>>> haven’t had any problem since.
>>>>>>>>
>>>>>>>> On Tue, Jun 16, 2020 at 1:00 PM David Hastings <hastings.recurs...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> me personally, around 290gb. as much as we could shove into them
>>>>>>>>>
>>>>>>>>> On Tue, Jun 16, 2020 at 12:44 PM Erick Erickson <erickerick...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> How much physical RAM? A rule of thumb is that you should
>>>>>>>>>> allocate no more than 25-50 percent of the total physical RAM to
>>>>>>>>>> Solr. That's cumulative, i.e. the sum of the heap allocations
>>>>>>>>>> across all your JVMs should be below that percentage. See Uwe
>>>>>>>>>> Schindler's MMapDirectory blog...
>>>>>>>>>>
>>>>>>>>>> Shot in the dark...
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 16, 2020, 11:51 David Hastings <hastings.recurs...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> To add to this, i generally have solr start with this:
>>>>>>>>>>> -Xms31000m -Xmx31000m
>>>>>>>>>>>
>>>>>>>>>>> and the only other thing that runs on them are MariaDB Galera
>>>>>>>>>>> cluster nodes that are not in use (aside from replication)
>>>>>>>>>>>
>>>>>>>>>>> the 31gb is not an accident either, you don't want 32gb.
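Erick's 25-50 percent rule of thumb is simple enough to compute. A sketch using a hypothetical 64GB box like the one under discussion (substitute your own total):

```shell
# Heap budget from Erick's rule: the SUM of all Solr JVM heaps on the box
# should land in the 25-50% band of physical RAM, leaving the rest for the
# OS page cache that Lucene's memory-mapped index files rely on.
total_gb=64                            # hypothetical physical RAM
low=$(( total_gb * 25 / 100 ))
high=$(( total_gb * 50 / 100 ))
echo "with ${total_gb}GB RAM, keep summed Solr heaps between ${low}GB and ${high}GB"
# prints: with 64GB RAM, keep summed Solr heaps between 16GB and 32GB
```

The reason for the ceiling is that index reads go through the OS page cache, so RAM handed to the heap is RAM taken away from the index itself.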
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 16, 2020 at 11:26 AM Shawn Heisey <apa...@elyograg.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 6/11/2020 11:52 AM, Ryan W wrote:
>>>>>>>>>>>>>> I will check "dmesg" first, to find out any hardware error
>>>>>>>>>>>>>> message.
>>>>>>>>>>>>
>>>>>>>>>>>> <snip>
>>>>>>>>>>>>
>>>>>>>>>>>>> [1521232.781801] Out of memory: Kill process 117529 (httpd)
>>>>>>>>>>>>> score 9 or sacrifice child
>>>>>>>>>>>>> [1521232.782908] Killed process 117529 (httpd), UID 48,
>>>>>>>>>>>>> total-vm:675824kB, anon-rss:181844kB, file-rss:0kB,
>>>>>>>>>>>>> shmem-rss:0kB
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is this a relevant "Out of memory" message? Does this suggest
>>>>>>>>>>>>> an OOM situation is the culprit?
>>>>>>>>>>>>
>>>>>>>>>>>> Because this was in the "dmesg" output, it indicates that it
>>>>>>>>>>>> is the operating system killing programs because the *system*
>>>>>>>>>>>> doesn't have any memory left. It wasn't Java that did this,
>>>>>>>>>>>> and it wasn't Solr that was killed. It very well could have
>>>>>>>>>>>> been Solr that was killed at another time, though.
>>>>>>>>>>>>
>>>>>>>>>>>> The process that it killed this time is named httpd ... which
>>>>>>>>>>>> is most likely the Apache webserver. Because the UID is 48,
>>>>>>>>>>>> this is probably an OS derived from Red Hat, where the
>>>>>>>>>>>> "apache" user has UID and GID 48 by default. Apache with its
>>>>>>>>>>>> default config can be VERY memory hungry when it gets busy.
>>>>>>>>>>>>
>>>>>>>>>>>>> -XX:InitialHeapSize=536870912 -XX:MaxHeapSize=536870912
>>>>>>>>>>>>
>>>>>>>>>>>> This says that you started Solr with the default 512MB heap,
>>>>>>>>>>>> which is VERY VERY small.
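Shawn's diagnosis (the kernel, not Java, killed a process, and this time the victim was httpd) can be pulled straight out of a dmesg line. A sketch that parses the sample line from this thread; against a live box you would pipe `dmesg` into the same sed expressions:

```shell
# Extract the victim PID and process name from a kernel OOM-killer line.
# The sample line is the one Ryan posted; on a real system, iterate over
# the matching lines of `dmesg` instead.
line="[1521232.781801] Out of memory: Kill process 117529 (httpd) score 9 or sacrifice child"
pid=$(printf '%s\n' "$line" | sed -n 's/.*Kill process \([0-9]*\).*/\1/p')
proc=$(printf '%s\n' "$line" | sed -n 's/.*(\([^)]*\)).*/\1/p')
echo "kernel OOM-killed PID $pid ($proc) -- not necessarily Solr"
# prints: kernel OOM-killed PID 117529 (httpd) -- not necessarily Solr
```

Seeing httpd here, as Shawn notes, means the system as a whole is out of memory; on another day the same mechanism could just as easily pick the Solr JVM.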
>>>>>>>>>>>> The default is small so that Solr will start on virtually any
>>>>>>>>>>>> hardware. Almost every user must increase the heap size. And
>>>>>>>>>>>> because the OS is killing processes, it is likely that the
>>>>>>>>>>>> system does not have enough memory installed for what you have
>>>>>>>>>>>> running on it.
>>>>>>>>>>>>
>>>>>>>>>>>> It is generally not a good idea to share the server hardware
>>>>>>>>>>>> between Solr and other software, unless the system has a lot
>>>>>>>>>>>> of spare resources, memory in particular.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Shawn
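Raising the default heap Shawn describes is normally done in the Solr include script (solr.in.sh on Linux installs). A minimal sketch; the 8g figure is an assumption for illustration, not a recommendation from this thread -- size it to your workload and keep the sum of heaps well under half of physical RAM:

```shell
# Hypothetical solr.in.sh fragment: replace the 512MB default heap.
# SOLR_HEAP sets both -Xms and -Xmx; 8g here is an illustrative value.
SOLR_HEAP="8g"
```

Per the earlier advice in the thread, stay below roughly 31GB per JVM so the heap keeps compressed object pointers.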