Xiaolin Ha created HBASE-25322:
----------------------------------

             Summary: Redundant Reference file in bottom region of split
                 Key: HBASE-25322
                 URL: https://issues.apache.org/jira/browse/HBASE-25322
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 3.0.0-alpha-1
            Reporter: Xiaolin Ha
            Assignee: Xiaolin Ha

When we split a region that covers the key range (,), the bottom region should contain the keys in (, split key), and the top region should contain the keys in [split key, ).

Currently, if we perform the following operations:
# put rowkeys 100,101,102,103,104,105 into a table, and flush the memstore to create an hfile with rowkeys 100,101,102,103,104,105;
# put rowkeys 200,201,202,203,204,205 into the table, and flush the memstore to create an hfile with rowkeys 200,201,202,203,204,205;
# split the table region, using split key 200;
# then the bottom region will have two Reference files, while the top region has only one.

But we expect the bottom region to have only one Reference file, just like the top region.

That's because, when generating Reference files in the child regions, the bottom region uses the `PrivateCellUtil.createLastOnRow(splitRow)` cell to compare against the first keys of the hfiles, while the top region uses the `PrivateCellUtil.createFirstOnRow(splitRow)` cell to compare against the last keys of the hfiles.

`LastOnRow(splitRow)` is the maximum key that can be generated from the split row, while `FirstOnRow(splitRow)` is the minimum key that can be generated from the split row. In the example above, the first key of the second hfile is on row 200, so it sorts before `LastOnRow(splitRow)`, and the bottom region creates a Reference to that hfile even though none of its rows are below the split key.

The split row should be in the top region, so we should use `FirstOnRow(splitRow)` when comparing against the hfile first and last keys in both the bottom and the top region.

Though the redundant Reference file will not be read by the bottom region, compacting it will produce an empty file if this redundant Reference file is the only file that participates in the compaction.
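
The comparison described above can be illustrated with a small self-contained model. This is only a sketch and does not use the HBase API: the `Key` and `HFile` classes, the rank constants, and the counting helpers are hypothetical stand-ins, with a rank of -1 playing the role of `FirstOnRow(splitRow)` and a rank of 1 playing the role of `LastOnRow(splitRow)`. It assumes that a child region creates a Reference unless the split key lies entirely outside the hfile's key range.

```java
import java.util.Arrays;
import java.util.List;

public class SplitReferenceSketch {
    // Hypothetical ranks that order keys within a single row.
    static final int FIRST_ON_ROW = -1; // sorts before every real cell on the row
    static final int REAL_CELL = 0;     // an ordinary cell stored in an hfile
    static final int LAST_ON_ROW = 1;   // sorts after every real cell on the row

    // Simplified stand-in for a cell key: ordered by row, then by rank.
    static final class Key implements Comparable<Key> {
        final int row;
        final int rank;

        Key(int row, int rank) {
            this.row = row;
            this.rank = rank;
        }

        @Override
        public int compareTo(Key other) {
            int byRow = Integer.compare(row, other.row);
            return byRow != 0 ? byRow : Integer.compare(rank, other.rank);
        }
    }

    // Simplified stand-in for an hfile: only its first and last keys matter here.
    static final class HFile {
        final Key first;
        final Key last;

        HFile(int firstRow, int lastRow) {
            this.first = new Key(firstRow, REAL_CELL);
            this.last = new Key(lastRow, REAL_CELL);
        }
    }

    // The two hfiles from the scenario in this issue.
    static List<HFile> exampleFiles() {
        return Arrays.asList(new HFile(100, 105), new HFile(200, 205));
    }

    // Bottom region: skip an hfile only when the split key sorts before its first key.
    static int countBottomReferences(List<HFile> files, int splitRow, int splitRank) {
        Key splitKey = new Key(splitRow, splitRank);
        int count = 0;
        for (HFile f : files) {
            if (splitKey.compareTo(f.first) >= 0) {
                count++;
            }
        }
        return count;
    }

    // Top region: skip an hfile only when the split key sorts after its last key.
    static int countTopReferences(List<HFile> files, int splitRow) {
        Key splitKey = new Key(splitRow, FIRST_ON_ROW);
        int count = 0;
        for (HFile f : files) {
            if (splitKey.compareTo(f.last) <= 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<HFile> files = exampleFiles();
        System.out.println("bottom with LastOnRow:  "
            + countBottomReferences(files, 200, LAST_ON_ROW));
        System.out.println("bottom with FirstOnRow: "
            + countBottomReferences(files, 200, FIRST_ON_ROW));
        System.out.println("top with FirstOnRow:    "
            + countTopReferences(files, 200));
    }
}
```

In this model the bottom region gets two References when it compares with the last-on-row key and one when it compares with the first-on-row key, while the top region gets one in both cases, which matches the behaviour reported in this issue.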