Re: overseer queue clogged

Mark Miller Thu, 01 May 2014 08:26:43 -0700

What version are you running? This was fixed in a recent release. In older 
versions it can happen if you click Add Core on the admin page with the 
default values.


-- 
Mark Miller
about.me/markrmiller

On May 1, 2014 at 11:19:54 AM, ryan.cooke (ryan.co...@gmail.com) wrote:

I also saw the overseer queue clog because of a bad message in the queue.
Unfortunately this went unnoticed until there were 130K messages in the
overseer queue. Since it was a production system, we could not simply stop
everything and delete all ZooKeeper data, so we deleted the messages manually
by issuing commands directly through the zkCli.sh tool. After all the
messages had been cleared, some nodes were in the wrong state (e.g. 'down'
when they should have been 'active'). Restarting the 'down' or 'recovery
failed' nodes brought the whole cluster back to a stable and healthy state.
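
The manual cleanup described above could be scripted roughly as follows.
This is a minimal sketch, not the commands that were actually run: it assumes
the queue lives at /overseer/queue and that its entries are named with a
"qn-" prefix (check both with ls in zkCli.sh first), and delete_commands is a
hypothetical helper. It only builds zkCli.sh delete commands from a list of
child names; it does not connect to ZooKeeper.

```python
# Sketch only: the queue path and the "qn-" child prefix are assumptions
# about the ZooKeeper layout; verify them with `ls` in zkCli.sh first.
QUEUE_PATH = "/overseer/queue"


def delete_commands(children, queue_path=QUEUE_PATH, prefix="qn-"):
    """Return one zkCli.sh `delete` command per queue entry, oldest first.

    `children` is the list of child node names under the queue path.
    Names that do not start with `prefix` are left alone.
    """
    # Sequence numbers are zero-padded, so a plain sort orders them by age.
    entries = sorted(name for name in children if name.startswith(prefix))
    return [f"delete {queue_path}/{name}" for name in entries]


if __name__ == "__main__":
    sample = ["qn-0000000002", "qn-0000000001", "unrelated-node"]
    for command in delete_commands(sample):
        print(command)
```

The generated lines can be reviewed before being pasted into a zkCli.sh
session, which keeps a human in the loop on a production cluster.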

Since it can take some digging to spot a backlog in the overseer queue,
here are some of the symptoms we saw:
- The Overseer throwing an exception like "Path must not end with / character"
- Random nodes throwing an exception like "ClusterState says we are the
  leader, but locally we don't think so"
- New replicas timing out on startup while attempting to fetch their shard id
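
One way to check the backlog directly, rather than waiting for these
symptoms, is to read the numChildren field that zkCli.sh prints for
"stat <path>". The sketch below parses that text. The function names and the
threshold of 1000 are hypothetical, and the choice of which path to stat
(for example /overseer/queue) is an assumption to verify on your own cluster.

```python
import re


def queue_backlog(stat_output):
    """Extract numChildren from the text printed by zkCli.sh `stat <path>`.

    Returns the child count as an int, or None if the field is missing.
    """
    match = re.search(r"numChildren\s*=\s*(\d+)", stat_output)
    return int(match.group(1)) if match else None


def is_backlogged(stat_output, threshold=1000):
    """True when the queue holds more entries than `threshold`.

    The default threshold is an arbitrary illustrative value.
    """
    count = queue_backlog(stat_output)
    return count is not None and count > threshold


if __name__ == "__main__":
    sample = "cZxid = 0x2\nnumChildren = 130000\n"
    print(queue_backlog(sample), is_backlogged(sample))
```

Run periodically from a monitoring job, a check like this would flag a
growing queue long before it reaches the size described above.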



--  
View this message in context: 
http://lucene.472066.n3.nabble.com/overseer-queue-clogged-tp4047878p4134129.html
  
Sent from the Solr - User mailing list archive at Nabble.com.
