[ 
https://issues.apache.org/jira/browse/HBASE-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-25972 started by Kadir Ozdemir.
---------------------------------------------
> Single and Multi-version HFiles
> -------------------------------
>
>                 Key: HBASE-25972
>                 URL: https://issues.apache.org/jira/browse/HBASE-25972
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kadir Ozdemir
>            Assignee: Kadir Ozdemir
>            Priority: Major
>
> HBase stores tables row by row in its files, HFiles. An HFile is composed of 
> blocks. The number of rows stored in a block depends on the row sizes. The 
> number of rows per block gets lower when the rows have more than one version 
> since HBase stores all row versions sequentially in the same HFile after 
> compaction. However, applications (e.g., Phoenix) mostly query the most 
> recent row versions.
> Let us assume that the compaction generates two HFiles instead of one. One of 
> these files stores only the most recent cell versions. Let’s call this a 
> single-version HFile. The other HFile stores all the previous cell versions. 
> Let’s call this a multi-version HFile. The files that are generated by memstore 
> flushes will be of the multi-version type. The major and minor compaction 
> processes will generate single-version files as well as multi-version files. 
> This means for the queries on the most recent row versions, HBase does not 
> need to look into multi-version HFiles that are older than the latest 
> single-version HFiles.
> The blocks of single-version HFiles will be denser than the current HFiles in 
> general and this will improve the query times for most recent row version 
> queries. 
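
The proposal in the description above can be illustrated with a small sketch. This is not HBase code. The names Cell, StoreFile, split_by_version and files_for_latest_version_query are hypothetical, and the sequence_id field is an assumed ordering in which a higher value means a newer file. The sketch also assumes that the multi-version file written by a compaction shares the sequence id of its companion single-version file. Delete markers, TTLs and max-versions settings are ignored.

```python
from collections import defaultdict
from typing import NamedTuple


class Cell(NamedTuple):
    """Hypothetical stand-in for an HBase cell."""
    row: bytes
    column: bytes
    timestamp: int
    value: bytes


class StoreFile(NamedTuple):
    """Hypothetical stand-in for an HFile in a store."""
    name: str
    single_version: bool
    sequence_id: int  # assumed ordering: higher means newer


def split_by_version(cells):
    """Split compaction input into (single_version, multi_version) cell lists.

    The newest cell of every (row, column) goes to the single-version
    output; all older versions go to the multi-version output.
    """
    by_coordinate = defaultdict(list)
    for cell in cells:
        by_coordinate[(cell.row, cell.column)].append(cell)

    single, multi = [], []
    for key in sorted(by_coordinate):
        versions = sorted(
            by_coordinate[key], key=lambda c: c.timestamp, reverse=True
        )
        single.append(versions[0])
        multi.extend(versions[1:])
    return single, multi


def files_for_latest_version_query(files):
    """Return the files a most-recent-version query still has to read.

    Multi-version files that are not newer than the latest single-version
    file are skipped. Without any single-version file, nothing is skipped.
    """
    latest_single = max(
        (f.sequence_id for f in files if f.single_version), default=None
    )
    if latest_single is None:
        return list(files)
    return [
        f for f in files
        if f.single_version or f.sequence_id > latest_single
    ]
```

For example, with the files sv-10 (single-version, id 10), mv-10 and mv-5 (multi-version, ids 10 and 5) and flush-11 (a memstore flush, multi-version, id 11), a most-recent-version query would read only sv-10 and flush-11.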



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
