[ https://issues.apache.org/jira/browse/HBASE-20636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Purtell updated HBASE-20636:
-----------------------------------
    Comment: was deleted

(was: | (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s{color} | {color:red} HBASE-20636 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.7.0/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HBASE-20636 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12925536/HBASE-20636.master.003.patch |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14464/console |
| Powered by | Apache Yetus 0.7.0 http://yetus.apache.org |

This message was automatically generated.
)

> Introduce two bloom filter type : ROWPREFIX and ROWPREFIX_DELIMITED
> -------------------------------------------------------------------
>
>                 Key: HBASE-20636
>                 URL: https://issues.apache.org/jira/browse/HBASE-20636
>             Project: HBase
>          Issue Type: New Feature
>          Components: HFile, regionserver, scan
>            Reporter: Guangxu Cheng
>            Assignee: Guangxu Cheng
>            Priority: Major
>         Attachments: HBASE-20636.master.001.patch,
> HBASE-20636.master.002.patch, HBASE-20636.master.003.patch,
> HBASE-20636.master.004.patch
>
>
> As we all know, HBase uses Bloom filters (ROW and ROWCOL) to skip
> unnecessary files and improve read performance, but they support only Get,
> not Scan.
> At our company (Tencent), many users, such as Tencent Game, need to scan
> all rows with the same prefix. Operational records for each game user are
> written to HBase; each user has many records, and the rowkey is constructed
> as userid+'#'+timestamps, so we can scan all records for a given user over
> a specified period.
> For this scenario, we designed the prefix Bloom filter. If the startRow and
> stopRow of the Scan have a valid common prefix, the scan is allowed to use
> the Bloom filter to skip files, which improves scan performance.
> Now, this feature has been running on our cluster for over a year, and scan
> performance for this scenario has improved by more than 100%.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
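
To make the common-prefix idea in the issue description concrete, here is a minimal sketch in plain Java. It is not the code from the attached patches and does not use any HBase API: the class PrefixBloomSketch, its method names, the '#' delimiter handling, and the toy BitSet-based filter are all assumptions made for illustration. The sketch indexes the part of each rowkey before the first delimiter, then, for a scan, takes the common prefix of startRow and stopRow and consults the filter only when that common prefix still contains the delimiter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

/** Toy per-file prefix Bloom filter; hypothetical, not HBase's implementation. */
final class PrefixBloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;
    private final byte delimiter;

    PrefixBloomSketch(int numBits, int numHashes, byte delimiter) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.delimiter = delimiter;
    }

    /** Bytes before the first delimiter, or null if the key has no delimiter. */
    static byte[] delimitedPrefix(byte[] key, byte delimiter) {
        for (int i = 0; i < key.length; i++) {
            if (key[i] == delimiter) {
                return Arrays.copyOf(key, i);
            }
        }
        return null;
    }

    /** Longest run of leading bytes shared by both scan boundaries. */
    static byte[] commonPrefix(byte[] startRow, byte[] stopRow) {
        int n = Math.min(startRow.length, stopRow.length);
        int i = 0;
        while (i < n && startRow[i] == stopRow[i]) {
            i++;
        }
        return Arrays.copyOf(startRow, i);
    }

    /** Prefix a scan can look up in the filter, or null if there is none. */
    static byte[] scanPrefix(byte[] startRow, byte[] stopRow, byte delimiter) {
        return delimitedPrefix(commonPrefix(startRow, stopRow), delimiter);
    }

    /** Double hashing: the i-th bit index is derived from two base hashes. */
    private int index(byte[] key, int i) {
        int h1 = Arrays.hashCode(key);
        int h2 = (31 * h1 + key.length + 0x9E3779B9) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Called for every row written to the file; rows without a delimiter are skipped. */
    void addRow(byte[] row) {
        byte[] prefix = delimitedPrefix(row, delimiter);
        if (prefix == null) {
            return;
        }
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(prefix, i));
        }
    }

    /** False means the file cannot hold rows in the range; true means it must be read. */
    boolean mayContainScan(byte[] startRow, byte[] stopRow) {
        byte[] prefix = scanPrefix(startRow, stopRow, delimiter);
        if (prefix == null) {
            return true; // no usable common prefix, so the filter cannot prune
        }
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(prefix, i))) {
                return false;
            }
        }
        return true;
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        PrefixBloomSketch file = new PrefixBloomSketch(1024, 3, (byte) '#');
        file.addRow(utf8("u42#1527000000"));
        file.addRow(utf8("u42#1527000500"));
        // Scan one user's records over a time window.
        System.out.println(file.mayContainScan(utf8("u42#1527000000"), utf8("u42#1527999999")));
        // A different user: usually pruned, though false positives are possible.
        System.out.println(file.mayContainScan(utf8("u99#1527000000"), utf8("u99#1527999999")));
    }
}
```

The check is safe because every row that sorts between two boundaries sharing a prefix must itself start with that prefix, so a negative answer from the filter can never hide a matching row. When the boundaries diverge before the delimiter (for example, a scan spanning several users), the sketch falls back to reading the file.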