On 5/31/2019 10:57 AM, Chuck Reynolds wrote:
Hey guys, I’m trying to do a backup of my Solr cloud cluster, but it never 
starts.

When I execute the async backup command it returns quickly like I would expect 
with the following response

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">11</int></lst>
<str name="requestid">1234</str>
</response>


But the backup never starts.

My reply is a total shot in the dark. It might turn out that SolrCloud doesn't work the way I am thinking, making what I'm about to say worthless. If that's the case, I hope somebody who has intimate knowledge of the backup code can smack me and let me know I'm giving bad info.

I wonder if you're in a situation where the overseer is stuck on one of the messages in its queue and never gets around to even noticing your backup request.
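One way to test that theory might be to ask the Collections API for the status of the async request. Below is a rough Python sketch of that check, not something I have run against your cluster. The base URL is an assumption (a node on localhost:8983), the helper names are made up for illustration, and 1234 is the request ID from your response. REQUESTSTATUS itself is the standard Collections API action for async calls.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: point this at any live node in your cluster.
SOLR_BASE = "http://localhost:8983/solr"


def status_url(request_id, base=SOLR_BASE):
    """Build the Collections API REQUESTSTATUS URL for an async request ID."""
    params = urlencode(
        {"action": "REQUESTSTATUS", "requestid": request_id, "wt": "json"}
    )
    return f"{base}/admin/collections?{params}"


def extract_state(body):
    """Pull status.state out of the JSON response, or 'unknown' if absent."""
    return json.loads(body).get("status", {}).get("state", "unknown")


def fetch_state(request_id, base=SOLR_BASE):
    """Query Solr over HTTP and return the reported state of the request."""
    with urlopen(status_url(request_id, base), timeout=10) as resp:
        return extract_state(resp.read().decode("utf-8"))
```

Calling fetch_state("1234") should return one of the states Solr reports for async requests: submitted, running, completed, failed, or notfound. If it stays at submitted indefinitely, that would be consistent with the overseer never picking the request up, though it does not prove it.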

If I am right, then the way I would fix it would be to clear the overseer queue, which lives in ZooKeeper at /overseer/queue ... and restart all the Solr nodes. Then try the backup again.

You could look at your overseer queue in ZK and see whether it has items stacked up (other than the backup request) that are not clearing out. That's easy to do within the admin UI -- go to the Cloud tab and click on Tree. Then open the overseer node and the queue node under that. A healthy and responsive system would not have many items in the queue. I can't tell you what's normal ... I don't have a live system running that I can look at. After starting the cloud example, it has no entries under queue -- the node can't even be expanded.

Are there any error messages in the logs of any of your servers before your attempt at a backup? My first thought would be to look for something related to the overseer.

Thanks,
Shawn
