I didn’t develop this feature, but I know something of how it was designed and 
developed, and I don’t believe omitting support for backups after a shard split 
was intentional. I think it may simply have been overlooked as a use case.

I’m going to guess that the cause is that the shard names change during the 
split. Since the new incremental backup copies one replica from each shard, it 
needs to track the shard names in addition to the collection name. After a 
shard split, the parent shard is replaced by sub-shards with new names, so how 
can it know that “shard1” is now “shard1_0” and “shard1_1”? If this is the 
cause, I agree that the error message is not helpful.
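To make that guess concrete, here is a toy Python model. It is not Solr’s actual backup code: it assumes per-shard backup state is keyed by shard name, and the `split_shard` and `shards_without_prior_backup` helpers are hypothetical. It shows that after splitting shard1, neither sub-shard finds a matching entry from the earlier backup.

```python
# Hypothetical toy model: this is NOT Solr's incremental-backup code.
# It only illustrates the guess that per-shard backup state keyed by shard
# name has nothing to match once SPLITSHARD replaces "shard1" with
# "shard1_0" and "shard1_1".


def split_shard(shards: list[str], parent: str) -> list[str]:
    """Return the active shard names after splitting `parent` in two.

    The sub-shards are named <parent>_0 and <parent>_1; the parent becomes
    inactive, so it is dropped from the returned list.
    """
    if parent not in shards:
        raise ValueError(f"unknown shard: {parent}")
    active: list[str] = []
    for name in shards:
        if name == parent:
            active.extend([f"{parent}_0", f"{parent}_1"])
        else:
            active.append(name)
    return active


def shards_without_prior_backup(previous: dict[str, str], current: list[str]) -> list[str]:
    """Return the current shards that have no entry in the previous backup.

    `previous` maps shard name -> a placeholder for that shard's backup
    state (hypothetical; Solr's real on-disk layout is not modeled here).
    """
    return [name for name in current if name not in previous]


# State recorded by the first backup (placeholder values).
previous_backup = {"shard1": "state-of-shard1", "shard2": "state-of-shard2"}

after_split = split_shard(["shard1", "shard2"], "shard1")
print(after_split)  # ['shard1_0', 'shard1_1', 'shard2']
print(shards_without_prior_backup(previous_backup, after_split))  # ['shard1_0', 'shard1_1']
```

In this model only the split shard loses its match; shard2 keeps its name and still lines up with the earlier backup.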

If you specify `incremental=false`, does the backup succeed? I know that 
defeats the purpose here, but I’m wondering whether it unblocks you. If you’re 
blocked on backups for this collection no matter what, that would be helpful 
to know too.
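For reference, a minimal Python sketch that builds the non-incremental request. The base URL `http://localhost:8983/solr` and the `backup_url` helper are assumptions for illustration; the other parameters mirror the BACKUP call in the reproduction steps quoted below.

```python
from urllib.parse import urlencode


def backup_url(base_url: str, collection: str, name: str, location: str,
               incremental: bool = False) -> str:
    """Build a Collections API BACKUP URL with an explicit `incremental` flag.

    `backup_url` is a hypothetical helper, not part of any Solr client.
    """
    params = {
        "action": "BACKUP",
        "name": name,
        "collection": collection,
        "location": location,
        # Sent as the literal string "true" or "false".
        "incremental": "true" if incremental else "false",
    }
    # safe="/" keeps the location path readable instead of encoding it as %2F.
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params, safe='/')}"


url = backup_url("http://localhost:8983/solr", "myCollectionName",
                 "myBackupName", "/path/to/my/shared/drive")
print(url)
```

Against a live cluster, the URL could then be sent with `urllib.request.urlopen(url)`.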

I also think you should go ahead and file this in Jira as a bug. And thank you 
for the nicely detailed explanation of the problem.

Cassandra
On Sep 20, 2021, 2:17 PM -0500, Jordan Diehl wrote:
> Hello,
>
> I was going to open a Solr bug, but I saw the message saying I should discuss 
> this via another channel first. I have been attempting to use the incremental 
> backup API on Solr 8.9.0, but while testing in our product we would 
> occasionally get into a state where all subsequent backup attempts would 
> fail. After some triage we found that it was happening to any collection 
> that had undergone a shard split. If we ran a backup, completed a shard split, 
> and then attempted another backup, the second backup would fail with a 
> NoSuchFileException whose message is the path of a file named after the 
> second backup's ID.
>
> Steps to reproduce:
>
> * Create a new collection with no associated backups
> * Run a backup for this collection
>   * /admin/collections?action=BACKUP&name=myBackupName&collection=myCollectionName&location=/path/to/my/shared/drive
> * Run a shard split operation
>   * /admin/collections?action=SPLITSHARD&collection=myCollectionName&shard=shardID
> * Attempt another backup
>
> Expected Outcome:
>
> * If this operation is being blocked intentionally, then I would expect an 
> informative error message explaining why it failed. Otherwise I would expect 
> the backup to complete successfully.
>
>
> Actual Outcome:
>
> * The backup operation fails with a NoSuchFileException.
>
> NOTE: In the exception message below, the number in the name of the missing 
> file (zk_backup_1 in this case) corresponds to the backup currently being 
> attempted.
>
> {
>   "responseHeader":{
>     "status":500,
>     "QTime":54},
>   "failure":{
>     "MYIPADDRESS:31018_solr":"org.apache.solr.client.solrj.impl.BaseHttpSolrClient$RemoteSolrException:Error from server at null: Error handling 'BACKUPCORE' action"},
>   "Operation backup caused exception:":"java.nio.file.NoSuchFileException:java.nio.file.NoSuchFileException: /opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1",
>   "exception":{
>     "msg":"/opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1",
>     "rspCode":-1},
>   "error":{
>     "metadata":[
>       "error-class","org.apache.solr.common.SolrException",
>       "root-error-class","org.apache.solr.common.SolrException"],
>     "msg":"/opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1",
>     "trace":"org.apache.solr.common.SolrException: /opt/hci/solrBackups/reproCollectionBackup/reproCollection/zk_backup_1\n\tat
>       org.apache.solr.client.solrj.SolrResponse.getException(SolrResponse.java:65)\n\tat
>       org.apache.solr.handler.admin.CollectionsHandler.invokeAction(CollectionsHandler.java:301)\n\tat
>       org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:257)\n\tat
>       org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)\n\tat
>       org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:836)\n\tat
>       org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:800)\n\tat
>       org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:545)\n\tat
>       org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)\n\tat
>       org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357)\n\tat
>       org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201)\n\tat
>       org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)\n\tat
>       org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)\n\tat
>       org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat
>       org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602)\n\tat
>       org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>       org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat
>       org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)\n\tat
>       org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n\tat
>       org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435)\n\tat
>       org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n\tat
>       org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)\n\tat
>       org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)\n\tat
>       org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n\tat
>       org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350)\n\tat
>       org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat
>       org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\n\tat
>       org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)\n\tat
>       org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n\tat
>       org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>       org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)\n\tat
>       org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat
>       org.eclipse.jetty.server.Server.handle(Server.java:516)\n\tat
>       org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)\n\tat
>       org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)\n\tat
>       org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)\n\tat
>       org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)\n\tat
>       org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat
>       org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\n\tat
>       org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)\n\tat
>       org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n\tat
>       org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n\tat
>       org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n\tat
>       org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n\tat
>       org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383)\n\tat
>       org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)\n\tat
>       org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)\n\tat
>       java.lang.Thread.run(Thread.java:748)\n",
>   "code":500}}
>
> I tried a couple of workarounds, but even after going through the steps 
> below I still couldn’t run another backup for the collection.
>
> Workaround attempt 1:
>
> * Used the API to delete the backup
> * Used the API to purge unused backup files
> * Restarted Solr
> * Attempted another backup
> * Encountered the same failure
>
> Workaround attempt 2:
>
> * Deleted all files in my Solr backup mount location
> * Restarted Solr
> * Attempted another backup
> * Encountered the same failure
