[ https://issues.apache.org/jira/browse/HBASE-8980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fuqiang cao reassigned HBASE-8980:
----------------------------------

    Assignee: fuqiang cao  (was: chunhui shen)

> Assistant Store ----------- An Index Store of HRegion
> -----------------------------------------------------
>
>                 Key: HBASE-8980
>                 URL: https://issues.apache.org/jira/browse/HBASE-8980
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: chunhui shen
>            Assignee: fuqiang cao
>         Attachments: 8980-94.patch
>
>
> *Background*
> a. Generally, we would like to keep several organizations of the same data,
> e.g. a secondary index sorts the data by a non-primary key.
> b. Currently, scanning data on HBase with a condition such as a ValueFilter
> is inefficient, because every KeyValue in the scanned range must be read and
> checked against the filter.
> c. We could create an Assistant Store that holds the HRegion's data under a
> different organization.
> *Assistant Store*
> a. It is a store of an HRegion, like HStore, and can be created by the user
> by adding a column family.
> b. Data in the Assistant Store is a copy of the data in the HRegion, but
> organized differently. The exception is that its row may fall outside the
> HRegion's range, and its value is the row of the original KeyValue.
> For example, the region (range 'row001' to 'row999') includes the following
> KVs in store cf:
> row001/cf:q1/val001
> row002/cf:q1/val002
> row003/cf:q1/val003
> We could create an Assistant Store for the region that includes the
> following KVs:
> val001/cf:q1/row001
> val002/cf:q1/row002
> val003/cf:q1/row003
> c. Local region transactions (multi-row transactions within the region) can
> be used to ensure atomicity and consistency.
> d. The regionserver puts data into the Assistant Store automatically, but
> users must read the data from the Assistant Store themselves.
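The row/value inversion in item b can be modeled in a few lines of plain Java. This is a minimal illustrative sketch, not code from the attached patch: the class `AssistantKvSketch`, its `KV` type, and `toAssistant` are hypothetical names, and a KeyValue is reduced to four strings printed in the `row/family:qualifier/value` notation used above (no timestamp or type).

```java
import java.util.ArrayList;
import java.util.List;

public class AssistantKvSketch {

  // Hypothetical, simplified KeyValue: four strings, no timestamp or type.
  static final class KV {
    final String row, family, qualifier, value;

    KV(String row, String family, String qualifier, String value) {
      this.row = row;
      this.family = family;
      this.qualifier = qualifier;
      this.value = value;
    }

    @Override
    public String toString() {
      return row + "/" + family + ":" + qualifier + "/" + value;
    }
  }

  // Builds the Assistant Store copy of an original KV: the original value
  // becomes the row, and the original row becomes the value.
  static KV toAssistant(KV original, String assistantFamily) {
    return new KV(original.value, assistantFamily, original.qualifier, original.row);
  }

  public static void main(String[] args) {
    List<KV> store = new ArrayList<>();
    store.add(new KV("row001", "cf", "q1", "val001"));
    store.add(new KV("row002", "cf", "q1", "val002"));
    store.add(new KV("row003", "cf", "q1", "val003"));
    for (KV kv : store) {
      // Prints val001/cf:q1/row001, val002/cf:q1/row002, val003/cf:q1/row003
      System.out.println(toAssistant(kv, "cf"));
    }
  }
}
```

The assistant family is a parameter because the example above reuses `cf`, while the walkthrough that follows uses a separate family `c2`.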
> *Example of Using Assistant Store*
> a. Suppose there is an empty table t1 with a column family c1 and only one
> region, whose range is from EMPTY_START_ROW to EMPTY_END_ROW.
> b. Add an Assistant Store to the table by adding a new column family
> named c2.
> c. The user puts the following data into the table:
> r1/c1:q1/v1
> r2/c1:q1/v2
> r3/c1:q1/v1
> r4/c1:q1/v2
> r5/c1:q1/v1
> r6/c1:q1/v2
> d. The region then contains the following data:
> r1/c1:q1/v1
> r2/c1:q1/v2
> r3/c1:q1/v1
> r4/c1:q1/v2
> r5/c1:q1/v1
> r6/c1:q1/v2
> v1/c2:q1/r1
> v1/c2:q1/r3
> v1/c2:q1/r5
> v2/c2:q1/r2 (Generated by Assistant, Stored in Assistant Store)
> v2/c2:q1/r4
> v2/c2:q1/r6
> e. Split the region into daughter_a and daughter_b at split point 'r4'.
> daughter_a then has the following data:
> r1/c1:q1/v1
> r2/c1:q1/v2
> r3/c1:q1/v1
> v1/c2:q1/r1
> v1/c2:q1/r3  (Data in Assistant Store)
> v2/c2:q1/r2
> daughter_b has the following data:
> r4/c1:q1/v2
> r5/c1:q1/v1
> r6/c1:q1/v2
> v1/c2:q1/r5
> v2/c2:q1/r4  (Data in Assistant Store)
> v2/c2:q1/r6
> f. As the above shows, the data in the Assistant Store always corresponds to
> the original data in the region, and it is maintained by the regionserver.
> g. How is the data in the Assistant Store used?
> Suppose we want to scan from 'r1' to 'r7' with a ValueFilter for value 'v2'.
> Without the Assistant Store, we must scan the whole table.
> With the Assistant Store, we can speed up the scan:
> scan the Assistant Store from 'v2' to 'v2' followed by a 0x00 byte (the
> smallest row after 'v2', so only the row 'v2' is covered), which returns:
> v2/c2:q1/r2
> v2/c2:q1/r4
> v2/c2:q1/r6
> Unfortunately, the scan results may be ordered neither by row nor by value,
> although they can be made ordered by value.
> In code, I designed the scan on the Assistant Store as follows:
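> {code}
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.client.ResultScanner;
> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.util.Bytes;
>
> // Limit the scan range by row (htable is an HTable opened on t1)
> Scan scan = new Scan();
> scan.setStartRow(Bytes.toBytes("r1"));
> scan.setStopRow(Bytes.toBytes("r7"));
> // Scan on the Assistant Store: ["v2", "v2" + 0x00) covers only the row "v2"
> Scan assistantScan = new Scan()
>     .setStartRow(Bytes.toBytes("v2"))
>     .setStopRow(Bytes.add(Bytes.toBytes("v2"), new byte[] { 0x00 }));
> // setAssistantScan is the new method proposed by this patch; once it is
> // set, the region runs the scan through the Assistant Store
> scan.setAssistantScan(assistantScan);
> ResultScanner scanner = htable.getScanner(scan);
> try {
>   for (Result result : scanner) {
>     // Output:
>     // v2/c2:q1/r2
>     // v2/c2:q1/r4
>     // v2/c2:q1/r6
>   }
> } finally {
>   scanner.close();
> }
> {code}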
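To make the walkthrough concrete, the following self-contained sketch simulates the single region of t1 in memory: `put` writes the c1 KV and, under the same lock, the inverted c2 KV; `valueFilterScan` is the baseline that reads every KV in the row range; and `assistantScan` reads only the requested Assistant Store range, then applies the row range to each hit's value. All names are hypothetical, the `synchronized` methods merely stand in for the region-local transaction, and `TreeSet`s stand in for HStores; none of this is HBase's or the patch's actual API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class AssistantRegionSketch {

  // Hypothetical, simplified KeyValue (no timestamp or type).
  static final class KV {
    final String row, family, qualifier, value;

    KV(String row, String family, String qualifier, String value) {
      this.row = row;
      this.family = family;
      this.qualifier = qualifier;
      this.value = value;
    }

    @Override
    public String toString() {
      return row + "/" + family + ":" + qualifier + "/" + value;
    }
  }

  // Store order: row, then qualifier, then value. For the short ASCII keys
  // used here, String order matches HBase's lexicographic byte order.
  static final Comparator<KV> ORDER = Comparator.comparing((KV kv) -> kv.row)
      .thenComparing(kv -> kv.qualifier)
      .thenComparing(kv -> kv.value);

  final String dataFamily;
  final String assistantFamily;
  final TreeSet<KV> dataStore = new TreeSet<>(ORDER);
  final TreeSet<KV> assistantStore = new TreeSet<>(ORDER);

  AssistantRegionSketch(String dataFamily, String assistantFamily) {
    this.dataFamily = dataFamily;
    this.assistantFamily = assistantFamily;
  }

  // Write path: the region, not the client, adds the inverted copy. The lock
  // stands in for the region-local transaction keeping both stores consistent.
  synchronized void put(String row, String qualifier, String value) {
    dataStore.add(new KV(row, dataFamily, qualifier, value));
    assistantStore.add(new KV(value, assistantFamily, qualifier, row));
  }

  // Smallest possible KV for a row, used as a range bound.
  static KV bound(String row) {
    return new KV(row, "", "", "");
  }

  // Baseline: every KV in [startRow, stopRow) is read and its value compared.
  synchronized List<KV> valueFilterScan(String startRow, String stopRow, String value) {
    List<KV> result = new ArrayList<>();
    for (KV kv : dataStore.subSet(bound(startRow), true, bound(stopRow), false)) {
      if (kv.value.equals(value)) {
        result.add(kv);
      }
    }
    return result;
  }

  // Only [assistantStart, assistantStop) of the Assistant Store is read; a hit
  // is kept if its value (the original row) lies in [startRow, stopRow).
  synchronized List<KV> assistantScan(String startRow, String stopRow,
      String assistantStart, String assistantStop) {
    List<KV> result = new ArrayList<>();
    for (KV kv : assistantStore.subSet(bound(assistantStart), true, bound(assistantStop), false)) {
      if (kv.value.compareTo(startRow) >= 0 && kv.value.compareTo(stopRow) < 0) {
        result.add(kv);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    AssistantRegionSketch region = new AssistantRegionSketch("c1", "c2");
    String[][] puts = {{"r1", "v1"}, {"r2", "v2"}, {"r3", "v1"},
                       {"r4", "v2"}, {"r5", "v1"}, {"r6", "v2"}};
    for (String[] p : puts) {
      region.put(p[0], "q1", p[1]);
    }
    // "v2" followed by a 0x00 char is the smallest key after "v2".
    for (KV kv : region.assistantScan("r1", "r7", "v2", "v2\u0000")) {
      System.out.println(kv); // v2/c2:q1/r2, then v2/c2:q1/r4, then v2/c2:q1/r6
    }
  }
}
```

The two ranges do different jobs: the main scan's row range bounds which original rows may appear, while the Assistant Store range bounds which index rows are read. With this sketch, `assistantScan("r3", "r5", "v2", "v2\u0000")` returns only `v2/c2:q1/r4`.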
> *Implementation Dependency*
> a. Splitting the StoreFile by value (currently, files are split only by row).
> b. Support for multi-row transactions in a region (already implemented).
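Dependency a is what keeps the daughters in the split example consistent: each Assistant Store KV must follow its value (the original row), not its own row. The sketch below models only that routing rule on in-memory lists, assuming, as in the example, that daughter_a receives keys below the split point and daughter_b the rest. `SplitByValueSketch`, `Daughter`, `split`, and `exampleSplit` are hypothetical names, and real StoreFile splitting is not shown.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitByValueSketch {

  // Hypothetical, simplified KeyValue (no timestamp or type).
  static final class KV {
    final String row, family, qualifier, value;

    KV(String row, String family, String qualifier, String value) {
      this.row = row;
      this.family = family;
      this.qualifier = qualifier;
      this.value = value;
    }

    @Override
    public String toString() {
      return row + "/" + family + ":" + qualifier + "/" + value;
    }
  }

  // A daughter region holding data KVs and Assistant Store KVs.
  static final class Daughter {
    final List<KV> data = new ArrayList<>();
    final List<KV> assistant = new ArrayList<>();
  }

  // daughter_a gets keys < splitRow, daughter_b gets keys >= splitRow.
  // Data KVs are routed by row; Assistant Store KVs are routed by value,
  // because an Assistant Store KV's value is the row of the original KV.
  static Daughter[] split(List<KV> data, List<KV> assistant, String splitRow) {
    Daughter a = new Daughter();
    Daughter b = new Daughter();
    for (KV kv : data) {
      Daughter target = kv.row.compareTo(splitRow) < 0 ? a : b;
      target.data.add(kv);
    }
    for (KV kv : assistant) {
      Daughter target = kv.value.compareTo(splitRow) < 0 ? a : b;
      target.assistant.add(kv);
    }
    return new Daughter[] {a, b};
  }

  // Rebuilds the t1 region from the example and splits it at 'r4'.
  static Daughter[] exampleSplit() {
    List<KV> data = new ArrayList<>();
    List<KV> assistant = new ArrayList<>();
    String[][] rows = {{"r1", "v1"}, {"r2", "v2"}, {"r3", "v1"},
                       {"r4", "v2"}, {"r5", "v1"}, {"r6", "v2"}};
    for (String[] r : rows) {
      data.add(new KV(r[0], "c1", "q1", r[1]));      // original KV
      assistant.add(new KV(r[1], "c2", "q1", r[0])); // its Assistant Store copy
    }
    return split(data, assistant, "r4");
  }

  public static void main(String[] args) {
    Daughter[] d = exampleSplit();
    System.out.println("daughter_a: " + d[0].data + " " + d[0].assistant);
    System.out.println("daughter_b: " + d[1].data + " " + d[1].assistant);
  }
}
```

With this rule daughter_a receives v1/c2:q1/r1, v1/c2:q1/r3 and v2/c2:q1/r2, matching the example. Because an Assistant Store file is sorted by its own row, its KVs cannot simply be cut at one key, which is presumably why a value-based split is listed as a dependency.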
> An initial patch for the 0.94 version is attached (8980-94.patch).
> What do you think about such a Store?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
