[ https://issues.apache.org/jira/browse/CASSANDRA-13153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcus Eriksson reassigned CASSANDRA-13153:
-------------------------------------------

    Assignee: Stefan Podkowinski

> Reappeared Data when Mixing Incremental and Full Repairs
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-13153
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13153
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction, Tools
>         Environment: Apache Cassandra 2.2
>            Reporter: Amanda Debrot
>            Assignee: Stefan Podkowinski
>              Labels: Cassandra
>         Attachments: log-Reappeared-Data.txt, Step-by-Step-Simulate-Reappeared-Data.txt
>
>
> This happens for both LeveledCompactionStrategy and SizeTieredCompactionStrategy. I have only tested it on Cassandra version 2.2, but it most likely also affects all Cassandra versions after 2.2 that still perform anticompaction as part of full repair.
> When mixing incremental and full repairs, there are a few scenarios where the Data SSTable is marked as unrepaired while the Tombstone SSTable is marked as repaired. If gc_grace has then passed, and the tombstone and data have been compacted out on the other replicas, the next incremental repair will push the data to the other replicas without the tombstone.
> Simplified scenario:
> 3 node cluster with RF=3
> Initial config:
>         Node 1 has data and tombstone in separate SSTables.
>         Node 2 has data and no tombstone.
>         Node 3 has data and tombstone in separate SSTables.
> Incremental repair (nodetool repair -pr) is run every day, so now we have the tombstone on each node.
> Some minor compactions have happened since, so data and tombstone get merged into 1 SSTable on Nodes 1 and 3.
>         Node 1 had a minor compaction that merged data with tombstone. 1 SSTable with tombstone.
>         Node 2 has data and tombstone in separate SSTables.
>         Node 3 had a minor compaction that merged data with tombstone. 1 SSTable with tombstone.
> Incremental repairs keep running every day.
> Full repairs run weekly (nodetool repair -full -pr).
> Now there are 2 scenarios where the Data SSTable will get marked as "Unrepaired" while the Tombstone SSTable will get marked as "Repaired".
> Scenario 1:
>         Since the Data and Tombstone SSTables have been marked as "Repaired" and anticompacted, they have had minor compactions with other SSTables containing keys from other ranges. During full repair, if the last node to run it doesn't own this particular key in its partitioner range, the Data and Tombstone SSTables will get anticompacted and marked as "Unrepaired". In the next incremental repair, if the Data SSTable is involved in a minor compaction during the repair but the Tombstone SSTable is not, the resulting compacted SSTable will be marked "Unrepaired" while the Tombstone SSTable stays marked "Repaired".
> Scenario 2:
>         Only the Data SSTable had a minor compaction with other SSTables containing keys from other ranges after being marked as "Repaired". The Tombstone SSTable was never involved in a minor compaction, so all keys in that SSTable belong to 1 particular partitioner range. During full repair, if the last node to run it doesn't own this particular key in its partitioner range, the Data SSTable will get anticompacted and marked as "Unrepaired". The Tombstone SSTable stays marked as "Repaired".
> Then gc_grace passes. Since Nodes 1 and 3 each have only 1 SSTable for that key, the tombstone will get compacted out.
>         Node 1 has nothing.
>         Node 2 has data (in an unrepaired SSTable) and tombstone (in a repaired SSTable) in separate SSTables.
>         Node 3 has nothing.
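> A rough way to confirm this mismatched state on Node 2 (the keyspace "ks", table "tbl" and the key below are placeholders, not names from this report) is to list the SSTables that contain the partition and check their repaired flag with sstablemetadata:
>
>     # which SSTables on this node contain the partition
>     nodetool getsstables ks tbl <key>
>
>     # for each path printed above, check the repaired flag;
>     # "Repaired at: 0" means unrepaired, a non-zero timestamp means repaired
>     sstablemetadata <path-to-Data.db> | grep "Repaired at"
>
> In the state above, the SSTable holding only the tombstone would show a non-zero "Repaired at" value while the SSTable holding the data would show 0, which is exactly the split that makes the next incremental repair ignore the tombstone.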
> Now when the next incremental repair runs, it will only use the Data SSTable to build the merkle tree, since the Tombstone SSTable is flagged as repaired and the Data SSTable is marked as unrepaired. The data will then get repaired to the other two nodes.
>         Node 1 has data.
>         Node 2 has data and tombstone in separate SSTables.
>         Node 3 has data.
> If a read request hits Nodes 1 and 3, it will return the data. If it hits Nodes 1 and 2, or 2 and 3, however, it will return no data.
> Tested this with single range tokens for simplicity.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)