[ https://issues.apache.org/jira/browse/HBASE-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
fuqiang cao reassigned HBASE-8980:
----------------------------------

    Assignee: fuqiang cao  (was: chunhui shen)

> Assistant Store ----------- An Index Store of HRegion
> -----------------------------------------------------
>
>                 Key: HBASE-8980
>                 URL: https://issues.apache.org/jira/browse/HBASE-8980
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: chunhui shen
>            Assignee: fuqiang cao
>         Attachments: 8980-94.patch
>
>
> *Background*
> a. Generally, we want several organizations of the same data. E.g., a
> secondary index sorts the data by a non-primary key.
> b. Currently, scanning data in HBase with a condition, such as ValueFilter,
> is inefficient.
> c. We could create an Assistant Store that stores the data of an HRegion in
> another organization.
> *Assistant Store*
> a. It is a store of an HRegion, like HStore, and can be created by the user
> by adding a ColumnFamily.
> b. Data in the Assistant Store is a copy of the data in the HRegion, but in
> another organization. The exception is that its row may fall outside the
> range of the HRegion, and its value is the row of the original KeyValue.
> For example,
> the region (range: 'row001'~'row999') includes the following KVs in the
> Store cf:
> row001/cf:q1/val001
> row002/cf:q1/val002
> row003/cf:q1/val003
> We could create an Assistant Store(named as) for the region, which includes
> the following KVs:
> val001/cf:q1/row001
> val002/cf:q1/row002
> val003/cf:q1/row003
> c. We could use a local region transaction to ensure atomicity and
> consistency.
> e. The regionserver puts data into the Assistant Store automatically, but
> the user must read the data from the Assistant Store explicitly.
> *Example of Using Assistant Store*
> a. Suppose there is an empty table named t1 with a column family named c1.
> It has only one region (the region's range is from EMPTY_START_ROW to
> EMPTY_END_ROW).
> b. Add an Assistant Store to the table by adding a new column family
> named c2.
> c. The user puts the following data into the table:
> r1/c1:q1/v1
> r2/c1:q1/v2
> r3/c1:q1/v1
> r4/c1:q1/v2
> r5/c1:q1/v1
> r6/c1:q1/v2
> d. The region then holds the following data:
> r1/c1:q1/v1
> r2/c1:q1/v2
> r3/c1:q1/v1
> r4/c1:q1/v2
> r5/c1:q1/v1
> r6/c1:q1/v2
> v1/c2:q1/r1
> v1/c2:q1/r3
> v1/c2:q1/r5
> v2/c2:q1/r2 (Generated by Assistant, Stored in Assistant Store)
> v2/c2:q1/r4
> v2/c2:q1/r6
> e. Split the region into daughter_a and daughter_b at the split point
> 'r4'.
> daughter_a then has the following data:
> r1/c1:q1/v1
> r2/c1:q1/v2
> r3/c1:q1/v1
> v1/c2:q1/r1
> v1/c2:q1/r3 (Data in Assistant Store)
> v2/c2:q1/r2
> daughter_b has the following data:
> r4/c1:q1/v2
> r5/c1:q1/v1
> r6/c1:q1/v2
> v1/c2:q1/r5
> v2/c2:q1/r4 (Data in Assistant Store)
> v2/c2:q1/r6
> f. From the above, we can see that the data in the Assistant Store always
> corresponds to the original data in the region, and that it is maintained
> by the regionserver.
> g. How do we use the data in the Assistant Store?
> Suppose we want to do a scan from 'r1' to 'r7' with the ValueFilter value =
> 'v2'.
> Without an Assistant Store, we must scan the whole table.
> With one, we can speed up the scan:
> scan the Assistant Store from 'v2' to 'v2+' and get the following
> result:
> v2/c2:q1/r2
> v2/c2:q1/r4
> v2/c2:q1/r6
> Unfortunately, the scan result may not be ordered by row or by value, but
> it can be made to be ordered by value.
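As an illustration of steps c through g above, here is a minimal in-memory sketch in Java. It is not taken from 8980-94.patch and uses no HBase classes. The class AssistantRegionSketch and its methods (put, split, scanByValue, assistantCells) are hypothetical names chosen for this example, the single column q1 is hard-coded as in the example data, and a synchronized method only stands in for the local region transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of one region: a primary store (c1) plus an Assistant Store (c2). */
class AssistantRegionSketch {
    // Primary store: row -> value, for the single column c1:q1.
    private final TreeMap<String, String> primary = new TreeMap<>();
    // Assistant Store: value -> rows holding that value (the inverted copy, c2:q1).
    private final TreeMap<String, TreeSet<String>> assistant = new TreeMap<>();

    /** Writes a cell and its inverted copy together (stand-in for a region transaction). */
    synchronized void put(String row, String value) {
        String old = primary.put(row, value);
        if (old != null) {
            // The row was overwritten: drop the stale inverted entry first.
            TreeSet<String> oldRows = assistant.get(old);
            oldRows.remove(row);
            if (oldRows.isEmpty()) {
                assistant.remove(old);
            }
        }
        assistant.computeIfAbsent(value, v -> new TreeSet<>()).add(row);
    }

    /** Splits by original row: rows below splitRow go to daughter 0, the rest to daughter 1. */
    synchronized AssistantRegionSketch[] split(String splitRow) {
        AssistantRegionSketch a = new AssistantRegionSketch();
        AssistantRegionSketch b = new AssistantRegionSketch();
        for (Map.Entry<String, String> e : primary.entrySet()) {
            (e.getKey().compareTo(splitRow) < 0 ? a : b).put(e.getKey(), e.getValue());
        }
        return new AssistantRegionSketch[] {a, b};
    }

    /** Returns rows in [startRow, stopRow) with the given value, reading only the Assistant Store. */
    synchronized List<String> scanByValue(String value, String startRow, String stopRow) {
        TreeSet<String> rows = assistant.get(value);
        if (rows == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(rows.subSet(startRow, stopRow));
    }

    /** Renders the Assistant Store as value/c2:q1/row lines, in store order. */
    synchronized List<String> assistantCells() {
        List<String> cells = new ArrayList<>();
        for (Map.Entry<String, TreeSet<String>> e : assistant.entrySet()) {
            for (String row : e.getValue()) {
                cells.add(e.getKey() + "/c2:q1/" + row);
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        AssistantRegionSketch region = new AssistantRegionSketch();
        String[][] data = {
            {"r1", "v1"}, {"r2", "v2"}, {"r3", "v1"},
            {"r4", "v2"}, {"r5", "v1"}, {"r6", "v2"}
        };
        for (String[] kv : data) {
            region.put(kv[0], kv[1]);
        }
        // Step g: rows in ['r1', 'r7') whose value is 'v2'.
        System.out.println(region.scanByValue("v2", "r1", "r7")); // [r2, r4, r6]
        // Step e: split at 'r4' and list each daughter's Assistant Store.
        AssistantRegionSketch[] daughters = region.split("r4");
        System.out.println(daughters[0].assistantCells());
        System.out.println(daughters[1].assistantCells());
    }
}
```

In this sketch the split is decided by the original row, which the Assistant Store holds as its value. That is why each daughter keeps only the index entries for its own rows, as in the daughter_a and daughter_b listings above.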
> From the code view, I designed the scan on the Assistant Store as follows:
> {code}
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.client.ResultScanner;
> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.util.Bytes;
>
> // Limit the scan range by row
> Scan scan = new Scan();
> scan.setStartRow(Bytes.toBytes("r1"));
> scan.setStopRow(Bytes.toBytes("r7"));
> // Do the scan on the Assistant Store, from 'v2' to 'v2' + 0x00
> Scan assistantScan = new Scan()
>     .setStartRow(Bytes.toBytes("v2"))
>     .setStopRow(Bytes.add(Bytes.toBytes("v2"), new byte[] { 0x00 }));
> // After setting this, the region will run the scan with the assistant Scan
> // (setAssistantScan is the method proposed in this issue)
> scan.setAssistantScan(assistantScan);
> ResultScanner scanner = htable.getScanner(scan);
> for (Result result : scanner) {
>   // output:
>   // v2/c2:q1/r2
>   // v2/c2:q1/r4
>   // v2/c2:q1/r6
> }
> {code}
> *Implementation Dependency*
> a. Split the StoreFile by value. (Currently, we only split the file by row.)
> b. Support multi-row transactions in a region. (Already implemented)
> An initial patch against version 0.94 is provided.
> What do you think about such a Store?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)