Hmmm, I didn’t really look carefully at the end of your e-mail. There not being 
an /overseer znode _looks_ like one or more of your Solr nodes isn’t connecting 
to the proper ZooKeeper ensemble.

bq. All of the instances are able to talk to zookeeper (they are
displayed as active in the SolrCloud view, so they must be able to
connect, right?).

Well, maybe or maybe not. The particular Solr node that you’re working on can 
see ZK, true. But are all of them looking at the _same_ ensemble? Are any of 
the Solr nodes somehow running with embedded ZooKeeper through a typo or 
something? And since that’s in the ZooKeeper log, is the ensemble properly 
configured? For troubleshooting _only_, I might go back to a single ZK 
instance just long enough to eliminate that possibility.
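
For what it's worth, here is a rough Python sketch of the "same ensemble" check. It assumes each node's system info endpoint (/solr/admin/info/system) reports a zkHost entry when running in SolrCloud mode, which is worth verifying on your version. The node URLs you pass in are up to you, and the function names are just made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_zk_host(base_url, timeout=10):
    """Ask one Solr node which ZooKeeper connect string it was started with.

    Assumption: the system info response contains a "zkHost" key in
    SolrCloud mode. Returns None if the key is absent.
    """
    url = base_url.rstrip("/") + "/admin/info/system?wt=json"
    with urlopen(url, timeout=timeout) as resp:
        info = json.load(resp)
    return info.get("zkHost")


def normalize_zk_host(zk_host):
    """Turn 'h1:2181,h2:2181/chroot' into (set of hosts, '/chroot').

    Host order and stray spaces are ignored so that equivalent connect
    strings compare equal.
    """
    if not zk_host:
        return (frozenset(), "")
    hosts, slash, chroot = zk_host.partition("/")
    host_set = frozenset(h.strip() for h in hosts.split(",") if h.strip())
    return (host_set, slash + chroot)


def group_by_ensemble(zk_hosts_by_node):
    """Group node names by the normalized ensemble they point at."""
    groups = {}
    for node, zk_host in zk_hosts_by_node.items():
        groups.setdefault(normalize_zk_host(zk_host), []).append(node)
    return groups


def check_nodes(base_urls):
    """Fetch zkHost from every node; more than one group means a mismatch."""
    reported = {url: fetch_zk_host(url) for url in base_urls}
    return group_by_ensemble(reported)
```

If check_nodes() comes back with more than one group, the nodes disagree about the ensemble (or the chroot). A node that accidentally started with embedded ZooKeeper would typically land in a group of its own, pointing at something like localhost:9983 (Solr port + 1000 by default).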

bq. o.a.s.s.SolrDispatchFilter Could not consume full client request
Early EOF

This usually indicates either massive requests or a misconfigured Jetty such 
that the request size exceeds the max allowed. There are a few settings that 
can be extended, but this is pretty unusual. Unless you have lots and lots and 
lots of nodes, the request size should be reasonably small.

Hmmm, do you have any massive files in your config (schema, solrconfig, 
synonym files, etc.)? ZooKeeper has a 1M default limit on the size of files; 
perhaps you’re exceeding that. One test would be to use a minimal configset to 
see if that encounters the same issue.
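
A quick way to look for oversized config files before uploading is sketched below in Python. It only walks a local copy of the configset directory (the path is whatever you have on disk) and treats 1M as 1024 * 1024 bytes, which is an approximation of the default limit, so take anything close to that size as suspect too.

```python
import os

# Approximation of the 1M default limit mentioned above (an assumption:
# the exact byte count may differ slightly).
DEFAULT_LIMIT_BYTES = 1024 * 1024


def find_oversized_files(config_dir, limit=DEFAULT_LIMIT_BYTES):
    """Return (relative_path, size_in_bytes) for files at or over the limit.

    Results are sorted largest first so the worst offenders show up on top.
    """
    oversized = []
    for root, _dirs, files in os.walk(config_dir):
        for name in files:
            path = os.path.join(root, name)
            size = os.path.getsize(path)
            if size >= limit:
                oversized.append((os.path.relpath(path, config_dir), size))
    return sorted(oversized, key=lambda item: item[1], reverse=True)
```

Calling find_oversized_files() on your configset's conf directory and getting an empty list back would suggest file size is not the problem, at least for that configset.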


> On Jun 10, 2019, at 11:51 AM, Софія Строчик <> wrote:
> Hi Erick, thanks for your reply!
> I didn't mention it but we have tried async requests. Then it does not time
> out of course, but instead appears to run indefinitely, with REQUESTSTATUS
> response like this:
> {
>  "responseHeader":{
>    "status":0,
>    "QTime":1},
>  "status":{
>    "state":"submitted",
>    "msg":"found [123] in submitted tasks"}}
> These requests then pile up in zookeeper's collection-queue-work without
> ever moving to the completed or failed status.
> While I guess some operations are expensive and can run for a long time, it
> doesn't seem likely that all of these have to take hours (without high load
> on any of the servers!)
> Maybe you have any other suggestions because this one doesn't seem to be
> the case :(
> Mon, Jun 10, 2019 at 21:14 Erick Erickson <> wrote:
>> Certainly at times some things just take a long time. The 180
>> second timeout is fairly arbitrary.
>> GC pauses, creating a zillion replicas etc. can cause timeouts like
>> this to be exceeded.
>> Rather than rely on lengthening some magic timeout value and hoping, I
>> suggest you use
>> the async option, see:
>> Then you need to periodically check the status of that job to see the
>> completion status.
>> Do note this bit in particular:
>> As of now, REQUESTSTATUS does not automatically clean up the tracking
>> data structures...
>> in the link above.
>> Best,
>> Erick
>> On Mon, Jun 10, 2019 at 11:07 AM Софія Строчик <> wrote:
>>> Hi everyone,
>>> recently when trying to delete a collection we have noticed that all calls
>>> to the Collections API time out after 180s.
>>> Something similar is described here
>>> <
>>> however
>>> restarting the instance or the server does not help.
>>> *This is what the response to the API call looks like:*
>>> {
>>>  "responseHeader":{
>>>    "status":500,
>>>    "QTime":180163},
>>>  "error":{
>>>    "metadata":[
>>>      "error-class","org.apache.solr.common.SolrException",
>>>      "root-error-class","org.apache.solr.common.SolrException"],
>>>    "msg":"overseerstatus the collection time out:180s",
>>>    "trace":"org.apache.solr.common.SolrException: overseerstatus the
>>> collection time out:180s\n\tat
>>> org.apache.solr.handler.admin.CollectionsHandler.sendToOCPQueue(\n\tat
>>> org.apache.solr.handler.admin.CollectionsHandler.invokeAction(\n\tat
>>> org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(\n\tat
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(\n\tat
>>> org.apache.solr.servlet.HttpSolrCall.handleAdmin(\n\tat
>>> org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(\n\tat
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(\n\tat
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(\n\tat
>>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(\n\tat
>>> org.eclipse.jetty.servlet.ServletHandler.doHandle(\n\tat
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(\n\tat
>>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(\n\tat
>>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(\n\tat
>>> org.eclipse.jetty.server.session.SessionHandler.doHandle(\n\tat
>>> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(\n\tat
>>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(\n\tat
>>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(\n\tat
>>> org.eclipse.jetty.servlet.ServletHandler.doScope(\n\tat
>>> org.eclipse.jetty.server.session.SessionHandler.doScope(\n\tat
>>> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(\n\tat
>>> org.eclipse.jetty.server.handler.ContextHandler.doScope(\n\tat
>>> org.eclipse.jetty.server.handler.ScopedHandler.handle(\n\tat
>>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(\n\tat
>>> org.eclipse.jetty.server.handler.HandlerCollection.handle(\n\tat
>>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(\n\tat
>>> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(\n\tat
>>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(\n\tat
>>> org.eclipse.jetty.server.Server.handle(\n\tat
>>> org.eclipse.jetty.server.HttpChannel.handle(\n\tat
>>> org.eclipse.jetty.server.HttpConnection.onFillable(\n\tat
>>> .AbstractConnection$ReadCallback.succeeded(\n\tat
>>> .ChannelEndPoint$\n\tat
>>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(\n\tat
>>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(\n\tat
>>> org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(\n\tat
>>> org.eclipse.jetty.util.thread.ReservedThreadExecutor$\n\tat
>>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(\n\tat
>>> org.eclipse.jetty.util.thread.QueuedThreadPool$\n\tat
>>>    "code":500}}
>>> *The errors look like this in the logs:*
>>> 2019-06-10 15:37:19.446 ERROR (qtp315932542-5748) [   ]
>>> o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: reload the
>>> collection time out:180s
>>> at org.apache.solr.handler.admin.CollectionsHandler.sendToOCPQueue(
>>> at org.apache.solr.handler.admin.CollectionsHandler.invokeAction(
>>> at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(
>>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(
>>> at org.apache.solr.servlet.HttpSolrCall.handleAdmin(
>>> at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(
>>> at
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>>> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(
>>> at org.eclipse.jetty.servlet.ServletHandler.doHandle(
>>> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
>>> at
>>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
>>> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(
>>> at org.eclipse.jetty.server.session.SessionHandler.doHandle(
>>> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(
>>> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(
>>> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(
>>> at org.eclipse.jetty.servlet.ServletHandler.doScope(
>>> at org.eclipse.jetty.server.session.SessionHandler.doScope(
>>> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(
>>> at org.eclipse.jetty.server.handler.ContextHandler.doScope(
>>> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
>>> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
>>> at org.eclipse.jetty.server.handler.HandlerCollection.handle(
>>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
>>> at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(
>>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
>>> at org.eclipse.jetty.server.Server.handle(
>>> at org.eclipse.jetty.server.HttpChannel.handle(
>>> at org.eclipse.jetty.server.HttpConnection.onFillable(
>>> at .AbstractConnection$ReadCallback.succeeded(
>>> at
>>> at$
>>> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(
>>> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(
>>> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(
>>> at
>>> at org.eclipse.jetty.util.thread.ReservedThreadExecutor$
>>> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
>>> at org.eclipse.jetty.util.thread.QueuedThreadPool$
>>> at
>>> 2019-06-10 15:37:19.446 INFO  (qtp315932542-5748) [   ]
>>> o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/collections
>>> params={name=collection1&action=RELOAD} status=500 QTime=180132
>>> 2019-06-10 15:37:19.446 ERROR (qtp315932542-5748) [   ]
>>> o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: reload the
>>> collection time out:180s
>>> at org.apache.solr.handler.admin.CollectionsHandler.sendToOCPQueue(
>>> at org.apache.solr.handler.admin.CollectionsHandler.invokeAction(
>>> at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(
>>> at org.apache.solr.handler.RequestHandlerBase.handleRequest(
>>> at org.apache.solr.servlet.HttpSolrCall.handleAdmin(
>>> at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(
>>> at
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>>> at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>>> at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(
>>> at org.eclipse.jetty.servlet.ServletHandler.doHandle(
>>> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
>>> at
>>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
>>> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(
>>> at org.eclipse.jetty.server.session.SessionHandler.doHandle(
>>> at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(
>>> at org.eclipse.jetty.server.handler.ContextHandler.doHandle(
>>> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(
>>> at org.eclipse.jetty.servlet.ServletHandler.doScope(
>>> at org.eclipse.jetty.server.session.SessionHandler.doScope(
>>> at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(
>>> at org.eclipse.jetty.server.handler.ContextHandler.doScope(
>>> at org.eclipse.jetty.server.handler.ScopedHandler.handle(
>>> at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(
>>> at org.eclipse.jetty.server.handler.HandlerCollection.handle(
>>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
>>> at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(
>>> at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
>>> at org.eclipse.jetty.server.Server.handle(
>>> at org.eclipse.jetty.server.HttpChannel.handle(
>>> at org.eclipse.jetty.server.HttpConnection.onFillable(
>>> at .AbstractConnection$ReadCallback.succeeded(
>>> at
>>> at$
>>> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(
>>> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(
>>> at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(
>>> at
>>> at org.eclipse.jetty.util.thread.ReservedThreadExecutor$
>>> at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
>>> at org.eclipse.jetty.util.thread.QueuedThreadPool$
>>> at
>>> The same thing happens regardless of the node which makes the request or
>>> the node which is specified in the request.
>>> All the Solr nodes are up and running (except the one which we wanted to
>>> delete, but it has been removed manually now). The zookeeper is running as
>>> well. All of the instances are able to talk to zookeeper (they are
>>> displayed as active in the SolrCloud view, so they must be able to connect,
>>> right?). We are also able to ping or view status of one node from the other
>>> so this doesn't look like an issue with firewall. Both zookeeper timeouts
>>> and Jetty request/response header size are increased and this doesn't seem
>>> to help.
>>> Some of the commands which time out are:
>>> There are also commands which do not time out, for example:
>>> LIST
>>> I don't see anything useful in the logs right now, apart from these
>>> suspicious messages on the central node:
>>> o.a.s.s.SolrDispatchFilter Could not consume full client request
>>> Early EOF
>>> and these ones in zookeeper.log:
>>> Error:KeeperErrorCode = NodeExists for /overseer
>>> but not sure if they are related to this issue at all.
>>> Can anyone give me any pointers to where can I get more information about
>>> this problem, or what can I try to fix this?
>>> Thanks,
>>> Sofiya
