On 5/31/2019 10:57 AM, Chuck Reynolds wrote:
Hey guys, I'm trying to do a backup of my SolrCloud cluster, but it never
starts.
When I execute the async backup command, it returns quickly, as I would expect,
with the following response:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">11</int></lst>
<str name="requestid">1234</str>
</response>
But the backup never starts.
My reply is a total shot in the dark. It might turn out that SolrCloud
doesn't work the way I am thinking, making what I'm about to say
worthless. If that's the case, I hope somebody who has intimate
knowledge of the backup code can smack me and let me know I'm giving bad
info.
I wonder if you're in a situation where the overseer is stuck on one of
the messages in its queue and never gets around to even noticing your
backup request.
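One way to test that theory is to ask the Collections API what it thinks happened to the async request. Here is a minimal Python sketch, assuming Solr is reachable at a base URL like http://localhost:8983/solr (that URL and the function names are my own placeholders, not anything from your setup). It calls the REQUESTSTATUS action with the request id 1234 from your response. I believe a request that stays in the "submitted" state and never moves to "running" would be consistent with the overseer not picking it up, but treat that reading as a guess.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_status_url(solr_base_url, request_id):
    """Build the Collections API REQUESTSTATUS URL for an async request id."""
    query = urlencode(
        {"action": "REQUESTSTATUS", "requestid": request_id, "wt": "json"}
    )
    return f"{solr_base_url.rstrip('/')}/admin/collections?{query}"


def parse_state(response_text):
    """Pull the task state out of a REQUESTSTATUS JSON response.

    Returns "unknown" if the response has no status/state entry.
    """
    body = json.loads(response_text)
    return body.get("status", {}).get("state", "unknown")


def fetch_state(solr_base_url, request_id, timeout=10):
    """Query Solr and return the state of the async request."""
    url = build_status_url(solr_base_url, request_id)
    with urlopen(url, timeout=timeout) as resp:
        return parse_state(resp.read().decode("utf-8"))


# Example (needs a running Solr; the base URL is an assumption):
# print(fetch_state("http://localhost:8983/solr", "1234"))
```

Running fetch_state a few times, a minute or so apart, should show whether the request ever changes state.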
If I am right, then the way I would fix it would be to clear the
overseer queue, which lives in ZooKeeper at /overseer/queue ... and
restart all the Solr nodes. Then try the backup again.
You could look at your overseer queue in ZK and see whether it has items
stacked up (other than the backup request) that are not clearing out.
That's easy to do within the admin UI -- go to the Cloud tab and click
on Tree. Then open the overseer node and the queue node under that. A
healthy and responsive system would not have many items in the queue. I
can't tell you what's normal ... I don't have a live system running that
I can look at. After starting the cloud example, it has no entries
under queue -- the node can't even be expanded.
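If you would rather check the queue from a script than from the admin UI, something like the following Python sketch could do it. It assumes the third-party kazoo ZooKeeper client is installed and that your ZK ensemble is at an address like localhost:2181 (both are assumptions on my part, as are the function names). If your Solr uses a ZK chroot, the path needs that prefix. I believe collection-level operations such as backups may be queued under a sibling node like /overseer/collection-queue-work rather than /overseer/queue, so the path is a parameter.

```python
def queue_depth(zk, path="/overseer/queue"):
    """Return (number of queued items, first five item names) for a ZK node.

    `zk` is any object with a get_children(path) method, such as a
    started kazoo KazooClient.
    """
    children = sorted(zk.get_children(path))
    return len(children), children[:5]


def check_overseer_queue(zk_hosts, path="/overseer/queue"):
    """Connect to ZooKeeper, read the queue depth, and disconnect."""
    # Imported here so queue_depth stays usable without kazoo installed.
    from kazoo.client import KazooClient  # third-party: pip install kazoo

    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    try:
        return queue_depth(zk, path)
    finally:
        zk.stop()


# Example (needs a reachable ZooKeeper; the address is an assumption):
# count, sample = check_overseer_queue("localhost:2181")
# print(count, sample)
```

A count that stays above zero and keeps showing the same item names across several runs would suggest the queue is not draining.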
Are there any error messages in the logs of any of your servers before
your attempt at a backup? My first thought would be to look for
something related to the overseer.
Thanks,
Shawn