[ 
https://issues.apache.org/jira/browse/HBASE-10676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950565#comment-13950565
 ] 

hongliang commented on HBASE-10676:
-----------------------------------

it's great

> Removing ThreadLocal of PrefetchedHeader in HFileBlock.FSReaderV2 makes higher 
> performance of scan
> ------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-10676
>                 URL: https://issues.apache.org/jira/browse/HBASE-10676
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.99.0
>            Reporter: zhaojianbo
>            Assignee: zhaojianbo
>         Attachments: HBASE-10676-0.98-branch-AtomicReferenceV2.patch, 
> HBASE-10676-0.98-branchV2.patch
>
>
> The prefetchedHeader variable in HFileBlock.FSReaderV2 is used to avoid a 
> backward seek operation, as the code comment says:
> {quote}
> we will not incur a backward seek operation if we have already read this 
> block's header as part of the previous read's look-ahead. And we also want to 
> skip reading the header again if it has already been read.
> {quote}
> But that is not what happens. In the 0.98 code, prefetchedHeader is a 
> ThreadLocal for each storefile reader, and during the RegionScanner 
> lifecycle, different RPC handlers serve the scan requests of the same 
> scanner. Even if the handler of the previous scan call prefetched the next 
> block header, the handler of the current scan call still triggers a 
> backward seek operation. The process is like this:
> # rs handler1 serves the scan call, reads block1, and prefetches the header of 
> block2.
> # rs handler2 serves the same scanner's next scan call. Because rs handler2 
> does not know that rs handler1 already prefetched the header of block2, it 
> triggers a backward seek, reads block2, and prefetches the header of block3.
> This is not a sequential read. So I think the ThreadLocal is useless 
> and should be removed. I made the change and evaluated the performance with one 
> client, two clients, and four clients scanning the same region with one 
> storefile. The test environment is:
> # An HDFS cluster with a namenode, a secondary namenode, and a datanode on one 
> machine
> # An HBase cluster with a zk, a master, and a regionserver on the same machine
> # The clients are also on the same machine.
> So all the data is local. The storefile is about 22.7GB from our online data, 
> 18995949 kvs. Caching is set to 1000, and setCacheBlocks(false) is used.
> With the improvement, the total client scan time decreases by 21% for the one 
> client case and by 11% for the two clients case. The four clients case is 
> almost the same. The detailed test data is as follows:
> ||case||client||time(ms)||
> | original | 1 | 306222 |
> | new | 1 | 241313 |
> | original | 2 | 416390 |
> | new | 2 | 369064 |
> | original | 4 | 555986 |
> | new | 4 | 562152 |
> With some modifications (see the comments below), the newest result is:
> ||case||client||time(ms)||case||client||time(ms)||case||client||time(ms)||
> |original|1|306222|new with synchronized|1|239510|new with AtomicReference|1|241243|
> |original|2|416390|new with synchronized|2|365367|new with AtomicReference|2|368952|
> |original|4|555986|new with synchronized|4|540642|new with AtomicReference|4|545715|
> |original|8|854029|new with synchronized|8|852137|new with AtomicReference|8|850401|
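
The handler behaviour described in the issue can be illustrated with a small, self-contained sketch. This is not the attached patch and not HBase code: the class PrefetchedHeaderSketch, the HeaderCache interface, BLOCK_SIZE, and the 33-byte placeholder header are all hypothetical names and values chosen for illustration. Each block is read on a fresh thread to stand in for a different RPC handler serving the next scan call, and a "miss" is counted whenever the reader does not find the header that the previous read prefetched.

```java
import java.util.concurrent.atomic.AtomicReference;

public class PrefetchedHeaderSketch {
    // Hypothetical fixed block size; real HFile blocks vary in size.
    static final int BLOCK_SIZE = 64 * 1024;

    /** Header of the next block, remembered together with that block's offset. */
    static final class PrefetchedHeader {
        final long offset;
        final byte[] header;
        PrefetchedHeader(long offset, byte[] header) {
            this.offset = offset;
            this.header = header;
        }
    }

    /** Minimal cache abstraction so both variants share the same reader logic. */
    interface HeaderCache {
        PrefetchedHeader get();
        void set(PrefetchedHeader h);
    }

    /** One cached header per thread, as in the behaviour the issue criticizes. */
    static HeaderCache threadLocalCache() {
        final ThreadLocal<PrefetchedHeader> tl = new ThreadLocal<>();
        return new HeaderCache() {
            public PrefetchedHeader get() { return tl.get(); }
            public void set(PrefetchedHeader h) { tl.set(h); }
        };
    }

    /** One cached header shared by all threads through an AtomicReference. */
    static HeaderCache sharedCache() {
        final AtomicReference<PrefetchedHeader> ref = new AtomicReference<>();
        return new HeaderCache() {
            public PrefetchedHeader get() { return ref.get(); }
            public void set(PrefetchedHeader h) { ref.set(h); }
        };
    }

    /** Simulates reading one block; returns true if its header was not cached. */
    static boolean readBlock(HeaderCache cache, long offset) {
        PrefetchedHeader cached = cache.get();
        boolean miss = cached == null || cached.offset != offset;
        // Pretend the read also fetched the next block's header (placeholder bytes).
        cache.set(new PrefetchedHeader(offset + BLOCK_SIZE, new byte[33]));
        return miss;
    }

    /** Reads consecutive blocks, each on a fresh thread; counts misses after block 0. */
    static int countHeaderMisses(HeaderCache cache, int blocks) {
        final int[] misses = {0};
        for (int i = 0; i < blocks; i++) {
            final long offset = (long) i * BLOCK_SIZE;
            final boolean first = (i == 0);
            Thread handler = new Thread(() -> {
                if (readBlock(cache, offset) && !first) {
                    misses[0]++;
                }
            });
            handler.start();
            try {
                handler.join(); // one scan call at a time, like a single client
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return misses[0];
    }

    public static void main(String[] args) {
        System.out.println("thread-local misses: " + countHeaderMisses(threadLocalCache(), 4)); // 3
        System.out.println("shared misses: " + countHeaderMisses(sharedCache(), 4));            // 0
    }
}
```

With the thread-local cache every block after the first is a miss, because the header prefetched by the previous thread is invisible to the next one. With the shared reference there are no misses in this single-scanner case. The sketch only models visibility of the cached header; it says nothing about timings or about contention between scanners.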



--
This message was sent by Atlassian JIRA
(v6.2#6252)
