Hi all,
My apologies if this message is a duplicate. I sent it yesterday, but then
realized I wasn’t fully subscribed to the list so I don’t think it went
through. I haven’t seen it appear in the archives.
I’m running a cluster consisting of Solr 8.11.1 (4 nodes) and ZooKeeper 3.7.0
(3 nodes). I have a cron job that calls
$SOLR_BASE_URL/admin/collections?action=BACKUP&name=${collection}&collection=${collection}&maxNumBackupPoints=14&location=/data/backup
for each collection every day at 6pm. The backup usually works just fine. But
every 10 days or so, I end up with a busted backup. Each backup directory has
the right number of zk_backup_* directories and backup_*.properties files, but
the index directory is empty – the current backup failed and all previous
backups are wiped out.
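
One way to spot this state automatically would be a check along the following lines, run after each backup. It is only a sketch: it assumes the layout described above (zk_backup_* directories, backup_*.properties files, and a subdirectory literally named index), and the function names are made up for illustration rather than taken from Solr.

```python
from pathlib import Path


def summarize_backup_dir(backup_dir):
    """Count the entries mentioned above inside one backup directory."""
    root = Path(backup_dir)
    index_dir = root / "index"  # assumption: the index directory is named "index"
    return {
        "zk_backups": sum(1 for p in root.glob("zk_backup_*") if p.is_dir()),
        "properties": sum(1 for p in root.glob("backup_*.properties") if p.is_file()),
        "index_files": sum(1 for _ in index_dir.iterdir()) if index_dir.is_dir() else 0,
    }


def looks_busted(backup_dir):
    """True when backup metadata exists but the index directory is empty or missing."""
    summary = summarize_backup_dir(backup_dir)
    return summary["properties"] > 0 and summary["index_files"] == 0
```

Calling looks_busted() on each backup directory right after the cron job finishes would at least flag the failure on the day it happens, rather than when a restore is needed.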
I’ve taken to creating a dated tarball of the backup directory every night so
that I can restore the last known good backup in case of this kind of
catastrophic backup failure, but backing up my backup really feels like
something that shouldn’t be necessary. I’ve increased my log preservation time,
so hopefully I can catch the log output from the next time this failure
happens. But in the meantime, is there anything that might explain why Solr
would behave like this? Anything I can look for in my configuration, or some
way to try to reproduce the issue? (All I’ve tried so far is calling BACKUP
repeatedly, hoping that one of the calls would fail in this way, but no luck.)
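
For anyone wanting the same safety net, the nightly dated tarball can be approximated with the sketch below. It is not the exact script in use here: the function name, the archive naming scheme, and the archive_dir argument are all assumptions chosen for illustration.

```python
import tarfile
from datetime import date
from pathlib import Path


def archive_backup(backup_dir, archive_dir, today=None):
    """Write <name>-YYYY-MM-DD.tar.gz containing backup_dir and return its path."""
    src = Path(backup_dir)
    stamp = (today or date.today()).isoformat()
    dest = Path(archive_dir) / f"{src.name}-{stamp}.tar.gz"  # hypothetical naming scheme
    dest.parent.mkdir(parents=True, exist_ok=True)
    with tarfile.open(dest, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the backup directory name
        tar.add(src, arcname=src.name)
    return dest
```

For example, archive_backup("/data/backup/" + collection, archive_dir) would write one archive per collection per day, where archive_dir is whatever separate location holds the tarballs.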
Thanks,
Michael Klein
------
Michael B. Klein (he/him)
Software Development Tech Lead
Repository & Digital Curation
Northwestern University Library
[email protected]