Hey Richard,

I noticed this issue with the exporter in the 7.x branch. If you look through 
the release notes for Solr since then there have been quite a few improvements 
to the exporter particularly around thread safety and concurrency (and the 
number of nodes it can monitor).  The version of the exporter can run 
independently to your Solr version so my advice would be to download the most 
recent Solr version, check and modify the exporter start script for its library 
dependencies, extract these files to a separate location, and run this version 
against your 7.x instance. If you have the capacity to upgrade your Solr 
version this will save you having to maintain the exporter separately. Since 
making this change the exporter has not missed a beat and we monitor around 100 
Solr nodes.

Good luck,

Dwane
________________________________
From: Richard Goodman <richa...@brandwatch.com>
Sent: Tuesday, 5 May 2020 10:22 PM
To: solr-user@lucene.apache.org <solr-user@lucene.apache.org>
Subject: solr core metrics & prometheus exporter - indexreader is closed

Hi there,

I've been playing with the prometheus exporter for solr, and have created
my config and have deployed it, so far, all groups were running fine (node,
jetty, jvm), however, I'm repeatedly getting an issue with the core group;

WARN  - 2020-05-05 12:01:24.812; org.apache.solr.prometheus.scraper.Async;
Error occurred during metrics collection
java.util.concurrent.ExecutionException:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://127.0.0.1:8083/solr: Server Error

request:
http://127.0.0.1:8083/solr/admin/metrics?group=core&wt=json&version=2.2
        at
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
~[?:1.8.0_141]
        at
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
~[?:1.8.0_141]
        at
org.apache.solr.prometheus.scraper.Async.lambda$null$1(Async.java:45)
~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT
e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
        at
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
~[?:1.8.0_141]
        at
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
~[?:1.8.0_141]
        at
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
~[?:1.8.0_141]
        at
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
~[?:1.8.0_141]
        at
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
~[?:1.8.0_141]
        at
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
~[?:1.8.0_141]
        at
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
~[?:1.8.0_141]
        at
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
~[?:1.8.0_141]
        at
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
~[?:1.8.0_141]
        at
org.apache.solr.prometheus.scraper.Async.lambda$waitForAllSuccessfulResponses$3(Async.java:43)
~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT
e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
        at
java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
~[?:1.8.0_141]
        at
java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
~[?:1.8.0_141]
        at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
~[?:1.8.0_141]
        at
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595)
~[?:1.8.0_141]
        at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT
e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_141]
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_141]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
Caused by:
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://127.0.0.1:8083/solr: Server Error

request:
http://127.0.0.1:8083/solr/admin/metrics?group=core&wt=json&version=2.2
        at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643)
~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT
e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
        at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT
e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
        at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT
e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
        at
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260)
~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT
e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
        at
org.apache.solr.prometheus.scraper.SolrScraper.request(SolrScraper.java:102)
~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT
e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
        at
org.apache.solr.prometheus.scraper.SolrCloudScraper.lambda$metricsForAllHosts$6(SolrCloudScraper.java:121)
~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT
e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
        at
org.apache.solr.prometheus.scraper.SolrScraper.lambda$null$0(SolrScraper.java:81)
~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT
e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
        at
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
~[?:1.8.0_141]


Because of this, I believe the exporter is then reporting the following
failure:

WARN  - 2020-05-05 12:01:24.825; org.apache.solr.prometheus.scraper.Async;
Error occurred during metrics collection
java.util.concurrent.ExecutionException: java.lang.NoClassDefFoundError:
org/jcodings/Encoding
        at
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
~[?:1.8.0_141]
        at
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
~[?:1.8.0_141]
        at
org.apache.solr.prometheus.scraper.Async.lambda$null$1(Async.java:45)
~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT
e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
        at
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
~[?:1.8.0_141]
        at
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
~[?:1.8.0_141]
        at
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
~[?:1.8.0_141]
        at
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
~[?:1.8.0_141]
        at
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
~[?:1.8.0_141]
        at
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
~[?:1.8.0_141]
        at
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
~[?:1.8.0_141]
        at
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
~[?:1.8.0_141]
        at
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
~[?:1.8.0_141]
        at
org.apache.solr.prometheus.scraper.Async.lambda$waitForAllSuccessfulResponses$3(Async.java:43)
~[solr-prometheus-exporter-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT
e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:03]
        at
java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
~[?:1.8.0_141]
        at
java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
~[?:1.8.0_141]
        at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
~[?:1.8.0_141]
        at
java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1595)
~[?:1.8.0_141]
        at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
~[solr-solrj-7.7.2-SNAPSHOT.jar:7.7.2-SNAPSHOT
e5d04ab6a061a02e47f9e6df62a3cfa69632987b - jenkins - 2019-11-22 16:23:11]
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_141]
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_141]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
Caused by: java.lang.NoClassDefFoundError: org/jcodings/Encoding


And when I hit the api directly, I'm met with the following;

{
  "responseHeader":{
    "status":500,
    "QTime":44},
  "error":{
    "msg":"this IndexReader is closed",
    "trace":"org.apache.lucene.store.AlreadyClosedException: this
IndexReader is closed\n\tat
org.apache.lucene.index.IndexReader.ensureOpen(IndexReader.java:257)\n\tat
org.apache.lucene.index.StandardDirectoryReader.getVersion(StandardDirectoryReader.java:339)\n\tat
org.apache.lucene.index.FilterDirectoryReader.getVersion(FilterDirectoryReader.java:127)\n\tat
org.apache.lucene.index.FilterDirectoryReader.getVersion(FilterDirectoryReader.java:127)\n\tat
org.apache.solr.search.SolrIndexSearcher.lambda$initializeMetrics$13(SolrIndexSearcher.java:2268)\n\tat
org.apache.solr.metrics.SolrMetricManager$GaugeWrapper.getValue(SolrMetricManager.java:683)\n\tat
org.apache.solr.util.stats.MetricUtils.convertGauge(MetricUtils.java:488)\n\tat
org.apache.solr.util.stats.MetricUtils.convertMetric(MetricUtils.java:274)\n\tat
org.apache.solr.util.stats.MetricUtils.lambda$toMaps$4(MetricUtils.java:213)\n\tat
java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)\n\tat
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)\n\tat
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)\n\tat
java.util.TreeMap$KeySpliterator.forEachRemaining(TreeMap.java:2746)\n\tat
java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)\n\tat
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)\n\tat
java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)\n\tat
java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)\n\tat
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)\n\tat
java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)\n\tat
org.apache.solr.util.stats.MetricUtils.toMaps(MetricUtils.java:211)\n\tat
org.apache.solr.handler.admin.MetricsHandler.handleRequest(MetricsHandler.java:121)\n\tat
org.apache.solr.handler.admin.MetricsHandler.handleRequestBody(MetricsHandler.java:101)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:736)\n\tat
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:717)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:395)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:341)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:502)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)\n\tat
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)\n\tat
java.lang.Thread.run(Thread.java:748)\n",
    "code":500}}


Because of these errors, when I actually then go to the endpoint the
prometheus exporter is running for the core metrics, I get 0 metrics back
because of this, I am yet to further investigate the prometheus exporter
to determine if in the scenario some metrics can not be gathered and throw
an error, if no metrics are recorded at all.

Whilst I was working on SOLR-14325
<https://issues.apache.org/jira/browse/SOLR-14325>, Andrzej noted that the
core level metrics are only reported if there is an open SolrIndexSearcher.
I started looking at the code for this, but wanted to know if anyone else
has encountered this issue before, it seems to be very frequent with this
cluster I am testing (96 instances, each instance having around 450GB of
indexes on disk w/ 3 way replication).

I guess also, would it bring up a question of having a better response
rather than a 500 status error if no metrics are available?

Kind regards,

--

Richard Goodman

Reply via email to