On Sat, Dec 29, 2018 at 8:06 AM ming.liu <[email protected]> wrote:

> Thanks Stack,
>
> I have an impression that Get makes a Scan under the covers. But that
> cannot explain my observation of the performance difference between a Get
> of a single row vs. a Scan of a single row.
>

Here is how the Get gets converted into a Scan:
https://github.com/apache/hbase/blob/branch-1.2/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java#L6920

Maybe try doing the same in your experiment and, if there is still a
difference, file an issue and upload your test code. Explain how you ran
your test (copy/paste from here).

branch-1.2 is old. I'd be interested in trying your test against branch-2
to see if it has the issue you see.

> I assume the difference comes from the blockcache. Get() will first check
> the block cache; if it matches, the call finishes and returns. But Scan
> will not check the block cache; it will go to the memstore and then go to
> the HFile if the row is not in the memstore.
>

We first go to the memstore, and if we have not satisfied the query, we
then go to the hfiles. Hfiles will fetch blocks from the blockcache if
present, else will go to hdfs (and then populate the cache). It should work
this way whether Get or Scan.

Thanks,
S

> My test program does a Get in a loop, for example, 1000 Gets.
> Before the loop, I save the start time, and then after 1000 loops of Get,
> I save the end time. So (end time - start time) / loop-count is the time
> spent in each Get operation.
> I have that same loop, replacing get() with scan(). The scan() has
> startRowKey = endRowKey, so it is just one row.
>
> I ran the test program many times, using HBase 1.2.0. It shows the Scan is
> 2x slower than the Get. So I want to understand the root cause. I assume
> get() will match the row in the blockcache, so it will not go to the
> memstore or HFile. But scan() must go to the HFile, because in my test
> there is no put operation, just pure reads. The row was inserted a long
> time ago, so it should have been flushed into an HFile and no longer be in
> the memstore. But I cannot confirm/verify this. So scan() has to send a
> request to HDFS to read from the HFile, and it is slower than the get()
> operation.
>
> I can paste the test program if the description is still not clear.
>
> I may need to replace Scan with Get wherever possible, if there really is
> a performance difference. But if there is not, I won't bother to modify
> this.
>
> thanks,
> Ming
>
> -----Original Message-----
> From: Stack <[email protected]>
> Sent: Saturday, December 29, 2018 11:50 PM
> To: Hbase-User <[email protected]>
> Subject: Re: Will Scan use blockcache?
>
> A Get is a one-row Scan. Under the covers the Get makes a Scan.
> Scan/Get both have to go to the memstore since it will have the latest
> versions of Cells.
>
> Say more about how you are doing the compare please.
>
> S
>
> On Sat, Dec 29, 2018 at 7:02 AM ming.liu <[email protected]> wrote:
>
> > Hi, all,
> >
> > I recently found that a short scan is slower than a get operation in
> > HBase. That is acceptable, but I really want to understand the reason.
> >
> > My testing table only has one row in it. So both Scan and Get just get
> > one row. Scan is still about 2x slower than the get operation.
> >
> > So I want to understand the difference between get(rowkey) and
> > Scan(rowkey, rowkey).
> >
> > I think Get will first look in the blockcache and, if matched, will
> > return without accessing the HFile/Memstore.
> >
> > Will Scan search in the blockcache as well? Or does it go directly to
> > the memstore/HFile?
> >
> > thanks,
> >
> > Ming
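
The timing loop described in the thread (save a start time, run the read 1000 times, divide the elapsed time by the loop count) could be sketched roughly as below. This is only a minimal, JDK-only sketch and not Ming's actual test program, which was never posted. The names `ReadTiming` and `averageNanos` are hypothetical, and the `TreeMap` lookups are stand-ins for the real reads: in an actual test the two suppliers would presumably wrap the HBase client calls (a `Get` on the table, and a one-row `Scan` whose scanner is opened, read, and closed).

```java
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical helper for comparing the per-call latency of two read operations.
final class ReadTiming {
    // Results are written here so the JIT cannot optimize the reads away.
    static volatile Object sink;

    // Runs op `warmup` times untimed, then `iterations` times timed,
    // and returns the mean elapsed nanoseconds per timed call.
    static double averageNanos(int warmup, int iterations, Supplier<?> op) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            sink = op.get();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = op.get();
        }
        long end = System.nanoTime();
        return (end - start) / (double) iterations;
    }

    public static void main(String[] args) {
        // Stand-in for a one-row table. This is NOT HBase; replace the two
        // suppliers below with the real Get and one-row Scan calls.
        TreeMap<String, String> table = new TreeMap<>();
        table.put("row1", "value1");

        Supplier<Object> pointRead = () -> table.get("row1");
        Supplier<Object> oneRowRange =
                () -> table.subMap("row1", true, "row1", true).firstEntry();

        // Alternate the two operations over several rounds so that neither
        // one is always measured first (cold) or last (warm).
        for (int round = 0; round < 3; round++) {
            double pointNs = averageNanos(1000, 1000, pointRead);
            double rangeNs = averageNanos(1000, 1000, oneRowRange);
            System.out.printf("round %d: point read %.0f ns/op, range read %.0f ns/op%n",
                    round, pointNs, rangeNs);
        }
    }
}
```

The untimed warm-up pass and the alternating rounds are additions that the thread does not mention. They are there so that JIT compilation and cache warm-up are less likely to be charged to whichever operation happens to run first.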
