Hello,
I was going to open a Solr bug, but I saw the message saying I should discuss
this via another channel first. I have been attempting to use the incremental
backup API on Solr 8.9.0, but while testing in our product we would
occasionally get into a state where all subsequent backup attempts would fail.
After some triage we found that it was happening to any collection which had
undergone a shard split operation. If we did a backup, completed a shard split
operation, then attempted another backup, the second backup would fail with a
NoSuchFileException whose message references the backup ID of the second
backup.
Steps to reproduce:
* Create a new collection with no associated backups
* Run a backup for this collection
* /admin/collections?action=BACKUP&name=myBackupName&collection=myCollectionName&location=/path/to/my/shared/drive
* Run a shard split operation
* /admin/collections?action=SPLITSHARD&collection=name&shard=shardID
* Attempt another backup
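For anyone who wants to script the reproduction, the steps above can be sketched roughly as follows. This only builds the three request URLs in order (backup, shard split, backup again). The base address, the helper names, and the example shard name are placeholders chosen for illustration, not values from our environment.

```python
from urllib.parse import urlencode

# Assumed placeholder; replace with the address of a real Solr node.
SOLR_BASE = "http://localhost:8983/solr"


def collections_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{SOLR_BASE}/admin/collections?{query}"


def repro_urls(collection, shard, backup_name, location):
    """Return the requests from the reproduction steps, in order."""
    backup = collections_url(
        "BACKUP", name=backup_name, collection=collection, location=location
    )
    split = collections_url("SPLITSHARD", collection=collection, shard=shard)
    # The same BACKUP request is issued before and after the shard split.
    return [backup, split, backup]


if __name__ == "__main__":
    # "shard1" is a hypothetical shard name used only for illustration.
    for url in repro_urls(
        "myCollectionName", "shard1", "myBackupName", "/path/to/my/shared/drive"
    ):
        print(url)
```

Each URL can then be requested in sequence (for example with `urllib.request.urlopen`), waiting for the shard split to complete before sending the final backup request.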
Expected Outcome:
* If this operation is being blocked intentionally, then I would expect an
informative error message explaining why it failed. Otherwise I would expect
the backup to complete successfully.
Actual Outcome:
* The backup operation fails with a NoSuchFileException.
NOTE: In the exception message below, the number in the name of the missing
file (in this case zk_backup_1) corresponds to the backup that is currently
being attempted.
{
"responseHeader":{
"status":500,
"QTime":54},
"failure":{
"MYIPADDRESS:31018_solr":"org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException:Error
from server at null: Error handling 'BACKUPCORE' action"},
"Operation backup caused
exception:":"java.nio.file.NoSuchFileException:java.nio.file.NoSuchFileException:
/opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1",
"exception":{
"msg":"/opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1",
"rspCode":-1},
"error":{
"metadata":[
"error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.common.SolrException"],
"msg":"/opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1",
"trace":"org.apache.solr.common.SolrException:
/opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1\n\tat
org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:65)\n\tat
org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:301)\n\tat
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:257)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)\n\tat
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:836)\n\tat
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:800)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:545)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357)\n\tat
org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)\n\tat
org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\n\tat
org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:516)\n\tat
org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)\n\tat
org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\n\tat
org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)\n\tat
java.lang.Thread.run(Thread.java:748)\n",
"code":500}}
I tried two workarounds, but after each of them I still wasn’t able to run
another backup for the collection.
Workaround attempt 1:
* Used the API to delete the backup
* Used the API to purge unused backup files
* Restarted Solr
* Attempted another backup
* Encountered the same failure
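As a rough sketch, the two API calls in this workaround can be built like this. It assumes the Collections API DELETEBACKUP action with its `backupId` and `purgeUnused` parameters. The base address, the helper name, and the backup ID of 0 are placeholders for illustration, and the exact parameters used in the original attempt are not recorded here.

```python
from urllib.parse import urlencode

# Assumed placeholder; replace with the address of a real Solr node.
SOLR_BASE = "http://localhost:8983/solr"


def delete_backup_url(backup_name, location, **selector):
    """Build a DELETEBACKUP URL.

    `selector` carries the one parameter that says what to delete,
    e.g. backupId=0 or purgeUnused="true".
    """
    params = {
        "action": "DELETEBACKUP",
        "name": backup_name,
        "location": location,
        **selector,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    name, location = "myBackupName", "/path/to/my/shared/drive"
    # Delete one backup point (ID 0 is a hypothetical example).
    print(delete_backup_url(name, location, backupId=0))
    # Purge files no longer referenced by any backup point.
    print(delete_backup_url(name, location, purgeUnused="true"))
```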
Workaround attempt 2:
* Deleted all files in my Solr backup mount location
* Restarted Solr
* Attempted another backup
* Encountered the same failure