Hey Dwane,

Thanks for your email. Gah, I should have mentioned that I had already applied the patches from the 8.x branches onto the exporter (such as the fixed thread pooling that you mentioned).

I still haven't gotten to the bottom of the "IndexReader is closed" issue. I found that if it was present on an instance, even calling just http://ip.address:port/solr/admin/metrics would return that error and 0 metrics. But if I added the following parameter to the call, it was all fine:

&regex=^(?!SEARCHER).*

I'm trying to wrap my head around the relationship between a Solr core and an index searcher/reader in the code, but it's quite complicated. Similarly, I'm trying to understand how I could replicate this for testing purposes. So if you have any guidance/advice on that area, it would be greatly appreciated.
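In case it helps anyone else hitting this, here's a quick sketch of what that workaround is doing: the negative lookahead drops every SEARCHER metric before its gauge is ever evaluated. (The metric names, host, and port below are illustrative placeholders, not real output from our cluster.)

```python
import re
from urllib.parse import urlencode

# The negative lookahead matches any metric name NOT starting with
# "SEARCHER", so the SEARCHER gauges (the ones that touch the
# IndexReader) are skipped entirely.
pattern = re.compile(r"^(?!SEARCHER).*")

# Illustrative metric names only.
names = ["SEARCHER.searcher.indexVersion", "CORE.coreName", "INDEX.sizeInBytes"]
kept = [n for n in names if pattern.match(n)]
print(kept)  # ['CORE.coreName', 'INDEX.sizeInBytes']

# The full call, with the regex URL-encoded (host/port are placeholders):
url = "http://localhost:8983/solr/admin/metrics?" + urlencode(
    {"group": "core", "wt": "json", "regex": "^(?!SEARCHER).*"})
print(url)
```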
Cheers,

On Wed, 6 May 2020 at 21:36, Dwane Hall <dwaneh...@hotmail.com> wrote:

> Hey Richard,
>
> I noticed this issue with the exporter in the 7.x branch. If you look
> through the release notes for Solr since then, there have been quite a few
> improvements to the exporter, particularly around thread safety and
> concurrency (and the number of nodes it can monitor). The exporter can run
> independently of your Solr version, so my advice would be to download the
> most recent Solr version, check and modify the exporter start script for
> its library dependencies, extract these files to a separate location, and
> run this version against your 7.x instance. If you have the capacity to
> upgrade your Solr version, this will save you having to maintain the
> exporter separately. Since making this change the exporter has not missed
> a beat, and we monitor around 100 Solr nodes.
>
> Good luck,
>
> Dwane
> ------------------------------
> *From:* Richard Goodman <richa...@brandwatch.com>
> *Sent:* Tuesday, 5 May 2020 10:22 PM
> *To:* solr-user@lucene.apache.org <solr-user@lucene.apache.org>
> *Subject:* solr core metrics & prometheus exporter - indexreader is closed
>
> Hi there,
>
> I've been playing with the prometheus exporter for Solr, and have created
> my config and deployed it. So far, all groups were running fine (node,
> jetty, jvm); however, I'm repeatedly getting an issue with the core group:
>
> WARN - 2020-05-05 12:01:24.812; org.apache.solr.prometheus.scraper.Async; Error occurred during metrics collection
> java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:8083/solr: Server Error
>
> request: http://127.0.0.1:8083/solr/admin/metrics?group=core&wt=json&version=2.2
>   at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_141]
>   at
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) ~[?:1.8.0_141]
>   at org.apache.solr.prometheus.scraper.Async.lambda$null$1(Async.java:45) ~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
>   at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) ~[?:1.8.0_141]
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[?:1.8.0_141]
>   at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) ~[?:1.8.0_141]
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[?:1.8.0_141]
>   at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[?:1.8.0_141]
>   at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) ~[?:1.8.0_141]
>   at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) ~[?:1.8.0_141]
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_141]
>   at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) ~[?:1.8.0_141]
>   at org.apache.solr.prometheus.scraper.Async.lambda$waitForAllSuccessfulResponses$3(Async.java:43) ~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
>   at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) ~[?:1.8.0_141]
>   at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) ~[?:1.8.0_141]
>   at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_141]
>   at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595) ~[?:1.8.0_141]
>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) ~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
> Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:8083/solr: Server Error
>
> request: http://127.0.0.1:8083/solr/admin/metrics?group=core&wt=json&version=2.2
>   at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643) ~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
>   at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
>   at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
>   at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260) ~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
>   at org.apache.solr.prometheus.scraper.SolrScraper.request(SolrScraper.java:102) ~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
>   at org.apache.solr.prometheus.scraper.SolrCloudScraper.lambda$metricsForAllHosts$6(SolrCloudScraper.java:121) ~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
>   at org.apache.solr.prometheus.scraper.SolrScraper.lambda$null$0(SolrScraper.java:81) ~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
>   at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[?:1.8.0_141]
>
> Because of this, I believe the exporter is then reporting the following failure:
>
> WARN - 2020-05-05 12:01:24.825; org.apache.solr.prometheus.scraper.Async; Error occurred during metrics collection
> java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError: org/jcodings/Encoding
>   at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_141]
>   at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) ~[?:1.8.0_141]
>   at org.apache.solr.prometheus.scraper.Async.lambda$null$1(Async.java:45) ~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
>   at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) ~[?:1.8.0_141]
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175) ~[?:1.8.0_141]
>   at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374) ~[?:1.8.0_141]
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) ~[?:1.8.0_141]
>   at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) ~[?:1.8.0_141]
>   at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) ~[?:1.8.0_141]
>   at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) ~[?:1.8.0_141]
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_141]
>   at
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418) ~[?:1.8.0_141]
>   at org.apache.solr.prometheus.scraper.Async.lambda$waitForAllSuccessfulResponses$3(Async.java:43) ~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
>   at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870) ~[?:1.8.0_141]
>   at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852) ~[?:1.8.0_141]
>   at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) ~[?:1.8.0_141]
>   at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595) ~[?:1.8.0_141]
>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) ~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
>   at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
> Caused by: java.lang.NoClassDefFoundError: org/jcodings/Encoding
>
> And when I hit the API directly, I'm met with the following:
>
> {
>   "responseHeader":{
>     "status":500,
>     "QTime":44},
>   "error":{
>     "msg":"this IndexReader is closed",
>     "trace":"org.apache.lucene.store.AlreadyClosedException: this IndexReader is closed\n\tat
> org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:257)\n\tat
> org.apache.lucene.index.StandardDirectoryReader.getVersion(StandardDirectoryReader.java:339)\n\tat
> org.apache.lucene.index.FilterDirectoryReader.getVersion(FilterDirectoryReader.java:127)\n\tat
> org.apache.lucene.index.FilterDirectoryReader.getVersion(FilterDirectoryReader.java:127)\n\tat
> org.apache.solr.search.SolrIndexSearcher.lambda$initializeMetrics$13(SolrIndexSearcher.java:2268)\n\tat
> org.apache.solr.metrics.SolrMetricManager$GaugeWrapper.getValue(SolrMetricManager.java:683)\n\tat
> org.apache.solr.util.stats.MetricUtils.convertGauge(MetricUtils.java:488)\n\tat
> org.apache.solr.util.stats.MetricUtils.convertMetric(MetricUtils.java:274)\n\tat
> org.apache.solr.util.stats.MetricUtils.lambda$toMaps$4(MetricUtils.java:213)\n\tat
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)\n\tat
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)\n\tat
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)\n\tat
> java.util.TreeMap$KeySpliterator.forEachRemaining(TreeMap.java:2746)\n\tat
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)\n\tat
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)\n\tat
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)\n\tat
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)\n\tat
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)\n\tat
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)\n\tat
> org.apache.solr.util.stats.MetricUtils.toMaps(MetricUtils.java:211)\n\tat
> org.apache.solr.handler.admin.MetricsHandler.handleRequest(MetricsHandler.java:121)\n\tat
> org.apache.solr.handler.admin.MetricsHandler.handleRequestBody(MetricsHandler.java:101)\n\tat
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
> org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)\n\tat
> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)\n\tat
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)\n\tat
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)\n\tat
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)\n\tat
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\tat
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
> org.eclipse.jetty.server.Server.handle(Server.java:502)\n\tat
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)\n\tat
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)\n\tat
> org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
> org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\tat
> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\n\tat
> org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)\n\tat
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)\n\tat
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)\n\tat
> java.lang.Thread.run(Thread.java:748)\n",
>     "code":500}}
>
> Because of these errors, when I then go to the endpoint the prometheus
> exporter is running for the core metrics, I get 0 metrics back. I have yet
> to investigate the exporter further to determine whether, when some metrics
> cannot be gathered and throw an error, no metrics are recorded at all.
>
> Whilst I was working on SOLR-14325
> <https://issues.apache.org/jira/browse/SOLR-14325>, Andrzej noted that the
> core-level metrics are only reported if there is an open SolrIndexSearcher.
> I started looking at the code for this, but wanted to know if anyone else
> has encountered this issue before. It seems to be very frequent with this
> cluster I am testing (96 instances, each instance having around 450GB of
> indexes on disk w/ 3-way replication).
>
> I guess it also brings up the question of having a better response, rather
> than a 500 status error, if no metrics are available?
>
> Kind regards,
>
> --
>
> Richard Goodman

--
Richard Goodman | Data Infrastructure engineer
richa...@brandwatch.com