Dieter De Paepe created HBASE-28461:
---------------------------------------

             Summary: Timing issue in Incremental Backup or 
TestIncrementalBackup
                 Key: HBASE-28461
                 URL: https://issues.apache.org/jira/browse/HBASE-28461
             Project: HBase
          Issue Type: Bug
          Components: backup&restore
    Affects Versions: 2.6.0, 4.0.0-alpha-1
         Environment: HBase master - commit 
298c550c804305f2c57029a563039eefcbb4af40

20.04.1-Ubuntu
            Reporter: Dieter De Paepe


While working on tests for HBASE-28412, I noticed that `TestIncrementalBackup` 
always fails on my computer and the computer of a colleague. However, I manage 
to get this test working 100% when setting breakpoints on following locations 
(accidentally discovered while trying to debug this issue):
 * `IncrementalTableBackupClient` L269 (`newTimestamps = 
((IncrementalBackupManager) backupManager).getIncrBackupLogFileMap();`)
 * `IncrementalBackupManager` L96 (`logList = 
getLogFilesForNewBackup(previousTimestampMins, newTimestamps, conf, 
savedStartCode);`)
 * `WALInputFormat` L356 (`logList = 
getLogFilesForNewBackup(previousTimestampMins, newTimestamps, conf, 
savedStartCode);`)

This test fails with the following error:

 
{code:java}
java.io.IOException: java.io.FileNotFoundException: File 
hdfs://localhost:46787/user/dieter/test-data/b615664f-1cde-fc27-1752-33ab359a931c/WALs/localhost,38309,1711553427191/localhost%2C38309%2C1711553427191.localhost%2C38309%2C1711553427191.regiongroup-0.1711553471361
 does not exist.    at 
org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:289)
    at 
org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:603)
    at 
org.apache.hadoop.hbase.backup.TestIncrementalBackup.TestIncBackupRestore(TestIncrementalBackup.java:169)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
    at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.junit.runners.Suite.runChild(Suite.java:128)
    at org.junit.runners.Suite.runChild(Suite.java:27)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.FileNotFoundException: File 
hdfs://localhost:46787/user/dieter/test-data/b615664f-1cde-fc27-1752-33ab359a931c/WALs/localhost,38309,1711553427191/localhost%2C38309%2C1711553427191.localhost%2C38309%2C1711553427191.regiongroup-0.1711553471361
 does not exist.
    at 
org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1282)
    at 
org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1256)
    at 
org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1201)
    at 
org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1197)
    at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at 
org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1215)
    at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2230)
    at 
org.apache.hadoop.hbase.mapreduce.WALInputFormat.getFiles(WALInputFormat.java:356)
    at 
org.apache.hadoop.hbase.mapreduce.WALInputFormat.getSplits(WALInputFormat.java:321)
    at 
org.apache.hadoop.hbase.mapreduce.WALInputFormat.getSplits(WALInputFormat.java:301)
    at 
org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:310)
    at 
org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:327)
    at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1678)
    at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1675)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
    at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1675)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1696)
    at org.apache.hadoop.hbase.mapreduce.WALPlayer.run(WALPlayer.java:423)
    at 
org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.walToHFiles(IncrementalTableBackupClient.java:406)
    at 
org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.convertWALsToHFiles(IncrementalTableBackupClient.java:378)
    at 
org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:282)
    ... 34 more {code}
 

 

I suspect the issue is that the WAL files are not yet fully written to HDFS 
when the WALPlayer tries to convert them to HFiles.

Since I'm not sure about the exact cause yet, I'm also not sure if this is a 
problem with the test (which uses a dedicated 
`IncrementalTableBackupClientForTest`) or the actual backup code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to