Re: overseer queue clogged

Gary Yngve Fri, 15 Mar 2013 18:49:14 -0700

I restarted the Overseer node and another node took over; the queues are empty now.

The server with core production_things_shard1_2 is logging these errors:


shard update error RetryNode: http://10.104.59.189:8883/solr/production_things_shard11_replica1/:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://10.104.59.189:8883/solr/production_things_shard11_replica1

These errors are for shard11, even though this server hosts a shard1 core!

I also got some strange errors on the restarted node. It makes me wonder whether there is a string-matching bug that confuses shard1 with shard11.

SEVERE: :org.apache.solr.common.SolrException: Error getting leader from zk
  at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:771)
  at org.apache.solr.cloud.ZkController.register(ZkController.java:683)
  at org.apache.solr.cloud.ZkController.register(ZkController.java:634)
  at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:890)
  at org.apache.solr.core.CoreContainer.registerCore(CoreContainer.java:874)
  at org.apache.solr.core.CoreContainer.register(CoreContainer.java:823)
  at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:633)
  at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: There is conflicting information about the leader of shard: shard1 our state says:http://10.104.59.189:8883/solr/collection1/ but zookeeper says:http://10.217.55.151:8883/solr/collection1/
  at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:756)

INFO: Releasing directory:/vol/ubuntu/talemetry_match_solr/solr_server/solr/production_things_shard11_replica1/data/index
Mar 15, 2013 5:52:34 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher
  at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1423)
  at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1535)

SEVERE: org.apache.solr.common.SolrException: I was asked to wait on state recovering for 10.76.31.67:8883_solr but I still do not see the requested state. I see state: active live:true
  at org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:948)
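The suspected shard1 vs shard11 confusion can be illustrated with a hypothetical sketch. It is not taken from the Solr source and does not claim that Solr 4.1 matches shard names this way; it only shows how a bare prefix test on core names would let "shard1" claim the production_things_shard11_replica1 core, and how requiring a delimiter after the shard name avoids that. The class and method names are invented for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not Solr's actual shard-matching code.
class ShardNameMatch {

    // Prefix-only test: any core whose name starts with "<collection>_<shard>"
    // is treated as part of that shard, so "shard1" also matches "shard11" cores.
    static boolean prefixMatch(String coreName, String collection, String shard) {
        return coreName.startsWith(collection + "_" + shard);
    }

    // Delimited test: the shard name must be followed by '_' or end the string,
    // so "shard1" no longer matches "shard11".
    static boolean delimitedMatch(String coreName, String collection, String shard) {
        String prefix = collection + "_" + shard;
        if (!coreName.startsWith(prefix)) {
            return false;
        }
        return coreName.length() == prefix.length()
                || coreName.charAt(prefix.length()) == '_';
    }

    // Returns the cores assigned to a shard under either matching rule.
    static List<String> coresFor(List<String> cores, String collection,
                                 String shard, boolean strict) {
        return cores.stream()
                .filter(c -> strict ? delimitedMatch(c, collection, shard)
                                    : prefixMatch(c, collection, shard))
                .collect(Collectors.toList());
    }
}
```

With the two core names from the logs above, `coresFor(cores, "production_things", "shard1", false)` returns both cores, while the strict variant returns only production_things_shard1_2.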

On Fri, Mar 15, 2013 at 5:05 PM, Mark Miller <markrmil...@gmail.com> wrote:

> Strange - we hardened that loop in 4.1 - so I'm not sure what happened
> here.
>
> Can you take a stack dump on the Overseer node and check whether an
> Overseer thread is running? Or just post the results?
>
> To recover, you should be able to just restart the Overseer node and have
> someone else take over - they should pick up processing the queue.
>
> Any logs you might be able to share could be useful too.
>
> - Mark
>
> On Mar 15, 2013, at 7:51 PM, Gary Yngve <gary.yn...@gmail.com> wrote:
>
> > Also, looking at overseer_elect, everything looks fine. The node is valid
> > and live.
> >
> >
> > On Fri, Mar 15, 2013 at 4:47 PM, Gary Yngve <gary.yn...@gmail.com> wrote:
> >
> >> Sorry, I should have specified: 4.1.
> >>
> >>
> >>
> >>
> >> On Fri, Mar 15, 2013 at 4:33 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>
> >>> What Solr version? 4.0, 4.1, 4.2?
> >>>
> >>> - Mark
> >>>
> >>> On Mar 15, 2013, at 7:19 PM, Gary Yngve <gary.yn...@gmail.com> wrote:
> >>>
> >>>> My SolrCloud cluster had been running fine for weeks, but about a week
> >>>> ago it stopped dequeueing from the Overseer queue, and now there are
> >>>> thousands of tasks on the queue, most of which look like this:
> >>>>
> >>>> {
> >>>> "operation":"state",
> >>>> "numShards":null,
> >>>> "shard":"shard3",
> >>>> "roles":null,
> >>>> "state":"recovering",
> >>>> "core":"production_things_shard3_2",
> >>>> "collection":"production_things",
> >>>> "node_name":"10.31.41.59:8883_solr",
> >>>> "base_url":"http://10.31.41.59:8883/solr"}
> >>>>
> >>>> I'm trying to create a new collection through the Collections API, and,
> >>>> obviously, nothing is happening...
> >>>>
> >>>> Any suggestions on how to fix this? Should I drop the queue in ZooKeeper?
> >>>>
> >>>> How could it have gotten into this state in the first place?
> >>>>
> >>>> thanks,
> >>>> gary
> >>>
> >>>
> >>
>
>
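Mark's suggestion to take a stack dump and look for a running Overseer thread can be carried out with the JDK's `jstack <pid>` against the Solr process. The helper below is a hypothetical sketch that scans the text of such a dump for thread names containing a given fragment. The class name is invented, and the assumption that the Overseer thread's name contains "Overseer" should be checked against the actual dump.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for scanning jstack output; the Overseer thread name
// used by Solr 4.1 is an assumption, so pass whatever fragment the dump shows.
class ThreadDumpScan {

    // In jstack output, each thread section begins with a header line that
    // starts with the thread name in double quotes, e.g.  "main" #1 prio=5 ...
    static List<String> threadNames(String dump) {
        List<String> names = new ArrayList<>();
        for (String line : dump.split("\\R")) {
            if (line.startsWith("\"")) {
                int end = line.indexOf('"', 1);
                if (end > 1) {
                    names.add(line.substring(1, end));
                }
            }
        }
        return names;
    }

    // Names of all threads whose name contains the given fragment.
    static List<String> matching(String dump, String fragment) {
        List<String> result = new ArrayList<>();
        for (String name : threadNames(dump)) {
            if (name.contains(fragment)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

Reading the saved `jstack` output into a string and calling `ThreadDumpScan.matching(dump, "Overseer")` lists candidate threads; an empty result would suggest that no Overseer thread is alive in that JVM, which fits a queue that has stopped draining.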
