[ https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HBASE-25972 started by Kadir Ozdemir.
---------------------------------------------
> Single and Multi-version HFiles
> -------------------------------
>
>                 Key: HBASE-25972
>                 URL: https://issues.apache.org/jira/browse/HBASE-25972
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kadir Ozdemir
>            Assignee: Kadir Ozdemir
>            Priority: Major
>
> HBase stores tables row by row in its files, HFiles. An HFile is composed of
> blocks. The number of rows stored in a block depends on the row sizes. The
> number of rows per block drops when rows have more than one version, since
> HBase stores all row versions sequentially in the same HFile after
> compaction. However, applications (e.g., Phoenix) mostly query the most
> recent row versions.
>
> Let us assume that compaction generates two HFiles instead of one. One of
> these files stores only the most recent cell versions. Let us call this a
> single-version HFile. The other HFile stores all the previous cell versions.
> Let us call this a multi-version HFile. The files generated by memstore
> flushes will be of the multi-version type. The major and minor compaction
> processes will generate single-version files as well as multi-version files.
> This means that for queries on the most recent row versions, HBase does not
> need to look into multi-version HFiles that are older than the latest
> single-version HFile.
>
> The blocks of single-version HFiles will generally be denser than the blocks
> of current HFiles, and this will improve query times for queries on the most
> recent row versions.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
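
The split and the read-path rule proposed in the issue can be illustrated with a toy model. This is only a sketch and not HBase code: `ToyHFile`, `compact`, `files_for_latest_read`, and the `seq_id` ordering are all hypothetical names introduced here. It assumes one cell per row, that a larger `seq_id` means a newer file, and that the multi-version file written by a compaction shares the `seq_id` of its single-version companion and so does not count as newer than it.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (row key, timestamp, value); a hypothetical simplification of an HBase cell
Cell = Tuple[str, int, str]


@dataclass
class ToyHFile:
    seq_id: int           # assumed ordering: larger means newer
    single_version: bool  # True for a single-version file
    cells: List[Cell]


def compact(files: List[ToyHFile], seq_id: int) -> Tuple[ToyHFile, ToyHFile]:
    """Merge the input files and split the result into two output files:
    one with the newest version of each row, one with all older versions."""
    by_row: Dict[str, List[Cell]] = {}
    for f in files:
        for cell in f.cells:
            by_row.setdefault(cell[0], []).append(cell)
    latest: List[Cell] = []
    older: List[Cell] = []
    for row in sorted(by_row):
        # Newest timestamp first
        versions = sorted(by_row[row], key=lambda c: c[1], reverse=True)
        latest.append(versions[0])
        older.extend(versions[1:])
    return ToyHFile(seq_id, True, latest), ToyHFile(seq_id, False, older)


def files_for_latest_read(files: List[ToyHFile]) -> List[ToyHFile]:
    """Select the files a latest-version query would need to consult:
    skip multi-version files that are not newer than the newest
    single-version file."""
    single_ids = [f.seq_id for f in files if f.single_version]
    if not single_ids:
        return list(files)  # no single-version file yet: read everything
    newest_single = max(single_ids)
    return [f for f in files if f.single_version or f.seq_id > newest_single]


# Two flushes (multi-version by definition), one compaction, one later flush.
flush1 = ToyHFile(1, False, [("r1", 10, "a"), ("r2", 10, "b")])
flush2 = ToyHFile(2, False, [("r1", 20, "c")])
single, multi = compact([flush1, flush2], seq_id=2)
flush3 = ToyHFile(3, False, [("r2", 30, "d")])
selected = files_for_latest_read([single, multi, flush3])
```

In this example the query consults only the single-version file and the newer flush file. The multi-version compaction output, which holds the superseded version of `r1`, is skipped.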