Will Berkeley created KUDU-2809:
-----------------------------------

             Summary: Incremental backup / diff scan does not handle rows that are inserted and deleted between two incrementals correctly
                 Key: KUDU-2809
                 URL: https://issues.apache.org/jira/browse/KUDU-2809
             Project: Kudu
          Issue Type: Bug
          Components: backup
    Affects Versions: 1.9.0
            Reporter: Will Berkeley

I did the following sequence of operations:
# Insert 100 million rows
# Update 1 out of every 11 rows
# Make a full backup
# Insert 100 million more rows, after the original rows in keyspace
# Delete 1 out of every 23 rows
# Make an incremental backup

Restore failed to apply the incremental backup, failing with an error like
{noformat}
java.lang.RuntimeException: failed to write 1000 rows from DataFrame to Kudu; sample errors:
{noformat}
Due to another bug, there are no sample errors, but after hacking around that bug, I found that the incremental contained a row with a DELETE action for a key that is not present in the full backup. That's because the row was inserted in step 4 and deleted in step 5, between backups.

We could fix this by either
# Making diff scan not return a DELETE for such a row
# Implementing and using DELETE IGNORE in the restore job

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
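
To illustrate the failure described in the report and the two proposed fixes, here is a minimal toy model in Python. It does not use the Kudu client or the Spark backup/restore jobs: `toy_diff_scan`, `toy_restore`, and the flags `skip_unseen_deletes` and `delete_ignore` are hypothetical names invented for this sketch, and tables are modelled as plain dicts. It assumes the diff scan reports one final action per key changed between the two backups.

```python
from typing import Dict, List, Optional, Set, Tuple

# (action, key, value); all names here are hypothetical, not Kudu APIs.
Mutation = Tuple[str, int, Optional[str]]


def toy_diff_scan(keys_at_start: Set[int], ops: List[Mutation],
                  skip_unseen_deletes: bool = False) -> List[Mutation]:
    """Collapse the mutations made between two backups to one action per key."""
    last: Dict[int, Mutation] = {}
    for action, key, value in ops:
        last[key] = (action, key, value)  # later ops overwrite earlier ones

    result: List[Mutation] = []
    for key in sorted(last):
        action, _, value = last[key]
        if action == "DELETE":
            # Fix option 1: the row did not exist at the start of the window,
            # so there is nothing for the restore side to delete.
            if skip_unseen_deletes and key not in keys_at_start:
                continue
            result.append(("DELETE", key, None))
        else:
            result.append(("UPSERT", key, value))
    return result


def toy_restore(full: Dict[int, str], incremental: List[Mutation],
                delete_ignore: bool = False) -> Dict[int, str]:
    """Apply an incremental on top of a restored full backup."""
    table = dict(full)
    for action, key, value in incremental:
        if action == "DELETE":
            if key not in table:
                if delete_ignore:
                    continue  # Fix option 2: tolerate deletes of missing keys
                raise KeyError(f"key not found: {key}")
            del table[key]
        else:
            table[key] = value
    return table


# Rows 1-3 are in the full backup; rows 4 and 5 are inserted afterwards,
# then rows 2 and 5 are deleted before the incremental is taken.
full = {1: "a", 2: "b", 3: "c"}
ops = [("INSERT", 4, "d"), ("INSERT", 5, "e"),
       ("DELETE", 2, None), ("DELETE", 5, None)]
```

In this model, restoring the output of `toy_diff_scan(set(full), ops)` with the default flags raises `KeyError` on key 5, which mirrors the DELETE for a key absent from the full backup. Enabling either flag lets the restore complete, and both produce the same final table. The first option keeps the incremental smaller; the second leaves the diff scan unchanged and makes the restore job tolerant.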