Re: Poor HBase map-reduce scan performance

Michael Segel Sun, 05 May 2013 20:10:17 -0700

You really don't want to mess around with the block size.

Sure, larger blocks are better for sequential scans, but the minute you do a lot 
of random ad hoc fetches... you're kinda screwed.
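
To put rough numbers on that trade-off, here is a back-of-envelope sketch. It 
assumes a random get has to load at least one whole block, and it ignores the 
block cache, compression and anything HDFS does underneath. The 35k average row 
size and the 128k / 16mb block sizes are the ones quoted elsewhere in this 
thread; the 1 GB of scanned data is an arbitrary figure. The class and method 
names are made up for illustration, none of this is HBase API.

```java
// Back-of-envelope sketch; the cost model is an assumption, not HBase internals.
public class BlockSizeTradeoff {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    /** Bytes loaded per useful byte when a random get must read whole blocks. */
    static double readAmplification(long blockBytes, long rowBytes) {
        // A row larger than one block spans several blocks, so round up.
        long blocksTouched = Math.max(1, (rowBytes + blockBytes - 1) / blockBytes);
        return (double) (blocksTouched * blockBytes) / rowBytes;
    }

    /** Number of blocks a full scan walks through for a given amount of data. */
    static long blocksPerScan(long dataBytes, long blockBytes) {
        return (dataBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long rowBytes = 35 * KB;      // average row size quoted in the thread
        long dataBytes = 1024 * MB;   // 1 GB of data, an arbitrary illustration
        for (long block : new long[] {128 * KB, 16 * MB}) {
            System.out.printf("block=%d KB  amplification=%.1fx  blocks per GB=%d%n",
                    block / KB,
                    readAmplification(block, rowBytes),
                    blocksPerScan(dataBytes, block));
        }
    }
}
```

Under those assumptions a single 35k row costs about 3.7x its size to fetch 
with 128k blocks and about 468x with 16mb blocks, while a scan over 1 GB walks 
8192 blocks versus 64. That is the whole argument in two numbers: fewer blocks 
per scan, far more wasted bytes per random fetch.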



On May 3, 2013, at 2:17 AM, Bryan Keller <brya...@gmail.com> wrote:

> I finally made some progress. I tried a very large HBase block size (16mb), 
> and it significantly improved scan performance. I went from 45-50 min to 24 
> min. Not great but much better. Before I had it set to 128k. Scanning an 
> equivalent sequence file takes 10 min. My random read performance will 
> probably suffer with such a large block size (theoretically), so I probably 
> can't keep it this big. I care about random read performance too. I've read 
> having a block size this big is not recommended, is that correct?
> 
> I haven't dug too deeply into the code, are the block buffers reused or is 
> each new block read a new allocation? Perhaps a buffer pool could help here 
> if there isn't one already. When doing a scan, HBase could reuse previously 
> allocated block buffers instead of allocating a new one for each block. Then 
> block size shouldn't affect scan performance much.
> 
> I'm not using a block encoder. Also, I'm still sifting through the profiler 
> results, I'll see if I can make more sense of it and run some more 
> experiments.
> 
> On May 2, 2013, at 5:46 PM, lars hofhansl <la...@apache.org> wrote:
> 
>> Interesting. If you can, try 0.94.7 (though it probably has not changed 
>> much from 0.94.4).
>> 
>> 
>> Have you enabled one of the block encoders (FAST_DIFF, etc)? If so, try 
>> without. They currently need to reallocate a ByteBuffer for each single KV.
>> (Since you see ScannerV2 rather than EncodedScannerV2 you probably have not 
>> enabled encoding, just checking).
>> 
>> 
>> And do you have a stack trace for the ByteBuffer.allocate()? That is a 
>> strange one since it never came up in my profiling (unless you enabled block 
>> encoding).
>> (You can get traces from VisualVM by creating a snapshot, but you'd have to 
>> drill in to find the allocate()).
>> 
>> 
>> During normal scanning (again, without encoding) there should be no 
>> allocation happening except for blocks read from disk (and they should all 
>> be the same size, thus allocation should be cheap).
>> 
>> -- Lars
>> 
>> 
>> 
>> ________________________________
>> From: Bryan Keller <brya...@gmail.com>
>> To: user@hbase.apache.org 
>> Sent: Thursday, May 2, 2013 10:54 AM
>> Subject: Re: Poor HBase map-reduce scan performance
>> 
>> 
>> I ran one of my regionservers through VisualVM. It looks like the top hot 
>> spots are HFileReaderV2$ScannerV2.getKeyValue() and ByteBuffer.allocate(). 
>> It appears at first glance that memory allocations may be an issue. 
>> Decompression was next below that but less of an issue it seems.
>> 
>> Would changing the block size, either HDFS or HBase, help here?
>> 
>> Also, if anyone has tips on how else to profile, that would be appreciated. 
>> VisualVM can produce a lot of noise that is hard to sift through.
>> 
>> 
>> On May 1, 2013, at 9:49 PM, Bryan Keller <brya...@gmail.com> wrote:
>> 
>>> I used exactly 0.94.4, pulled from the tag in subversion.
>>> 
>>> On May 1, 2013, at 9:41 PM, lars hofhansl <la...@apache.org> wrote:
>>> 
>>>> Hmm... Did you actually use exactly version 0.94.4, or the latest 0.94.7?
>>>> I would be very curious to see profiling data.
>>>> 
>>>> -- Lars
>>>> 
>>>> 
>>>> 
>>>> ----- Original Message -----
>>>> From: Bryan Keller <brya...@gmail.com>
>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>>> Cc: 
>>>> Sent: Wednesday, May 1, 2013 6:01 PM
>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>> 
>>>> I tried running my test with 0.94.4, unfortunately performance was about 
>>>> the same. I'm planning on profiling the regionserver and trying some other 
>>>> things tonight and tomorrow and will report back.
>>>> 
>>>> On May 1, 2013, at 8:00 AM, Bryan Keller <brya...@gmail.com> wrote:
>>>> 
>>>>> Yes I would like to try this, if you can point me to the pom.xml patch 
>>>>> that would save me some time.
>>>>> 
>>>>> On Tuesday, April 30, 2013, lars hofhansl wrote:
>>>>> If you can, try 0.94.4+; it should significantly reduce the amount of 
>>>>> bytes copied around in RAM during scanning, especially if you have wide 
>>>>> rows and/or large key portions. That in turn makes scans scale better 
>>>>> across cores, since RAM is a shared resource between cores (much like disk).
>>>>> 
>>>>> 
>>>>> It's not hard to build the latest HBase against Cloudera's version of 
>>>>> Hadoop. I can send along a simple patch to pom.xml to do that.
>>>>> 
>>>>> -- Lars
>>>>> 
>>>>> 
>>>>> 
>>>>> ________________________________
>>>>>  From: Bryan Keller <brya...@gmail.com>
>>>>> To: user@hbase.apache.org
>>>>> Sent: Tuesday, April 30, 2013 11:02 PM
>>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>>> 
>>>>> 
>>>>> The table has hashed keys so rows are evenly distributed amongst the 
>>>>> regionservers, and load on each regionserver is pretty much the same. I 
>>>>> also have per-table balancing turned on. I get mostly data local mappers 
>>>>> with only a few rack local (maybe 10 of the 250 mappers).
>>>>> 
>>>>> Currently the table is a wide table schema, with lists of data structures 
>>>>> stored as columns with column prefixes grouping the data structures (e.g. 
>>>>> 1_name, 1_address, 1_city, 2_name, 2_address, 2_city). I was thinking of 
>>>>> moving those data structures to protobuf which would cut down on the 
>>>>> number of columns. The downside is I can't filter on one value with that, 
>>>>> but it is a tradeoff I would make for performance. I was also considering 
>>>>> restructuring the table into a tall table.
>>>>> 
>>>>> Something interesting is that my old regionserver machines had five 15k 
>>>>> SCSI drives instead of 2 SSDs, and performance was about the same. Also, 
>>>>> my old network was 1gbit, now it is 10gbit. So neither network nor disk 
>>>>> I/O appear to be the bottleneck. The CPU is rather high for the 
>>>>> regionserver so it seems like the best candidate to investigate. I will 
>>>>> try profiling it tomorrow and will report back. I may revisit compression 
>>>>> on vs off since that is adding load to the CPU.
>>>>> 
>>>>> I'll also come up with a sample program that generates data similar to my 
>>>>> table.
>>>>> 
>>>>> 
>>>>> On Apr 30, 2013, at 10:01 PM, lars hofhansl <la...@apache.org> wrote:
>>>>> 
>>>>>> Your average row is 35k so scanner caching would not make a huge 
>>>>>> difference, although I would have expected some improvements by setting 
>>>>>> it to 10 or 50 since you have a wide 10ge pipe.
>>>>>> 
>>>>>> I assume your table is split sufficiently to touch all RegionServers... 
>>>>>> Do you see the same load/IO on all region servers?
>>>>>> 
>>>>>> A bunch of scan improvements went into HBase since 0.94.2.
>>>>>> I blogged about some of these changes here: 
>>>>>> http://hadoop-hbase.blogspot.com/2012/12/hbase-profiling.html
>>>>>> 
>>>>>> In your case - since you have many columns, each of which carries the 
>>>>>> rowkey - you might benefit a lot from HBASE-7279.
>>>>>> 
>>>>>> In the end HBase *is* slower than straight HDFS for full scans. How 
>>>>>> could it not be?
>>>>>> So I would start by looking at HDFS first. Make sure Nagle's is disabled 
>>>>>> in both HBase and HDFS.
>>>>>> 
>>>>>> And lastly SSDs are somewhat new territory for HBase. Maybe Andy Purtell 
>>>>>> is listening, I think he did some tests with HBase on SSDs.
>>>>>> With rotating media you typically see an improvement with compression. 
>>>>>> With SSDs the added CPU needed for decompression might outweigh the 
>>>>>> benefits.
>>>>>> 
>>>>>> At the risk of starting a larger discussion here, I would posit that 
>>>>>> HBase's LSM based design, which trades random IO for sequential IO, 
>>>>>> might be a bit more questionable on SSDs.
>>>>>> 
>>>>>> If you can, it would be nice to run a profiler against one of the 
>>>>>> RegionServers (or maybe do it with the single RS setup) and see where it 
>>>>>> is bottlenecked.
>>>>>> (And if you send me a sample program to generate some data - not 700g, 
>>>>>> though :) - I'll try to do a bit of profiling during the next days as my 
>>>>>> day job permits, but I do not have any machines with SSDs).
>>>>>> 
>>>>>> -- Lars
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ________________________________
>>>>>> From: Bryan Keller <brya...@gmail.com>
>>>>>> To: user@hbase.apache.org
>>>>>> Sent: Tuesday, April 30, 2013 9:31 PM
>>>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>>>> 
>>>>>> 
>>>>>> Yes, I have tried various settings for setCaching() and I have 
>>>>>> setCacheBlocks(false)
>>>>>> 
>>>>>> On Apr 30, 2013, at 9:17 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>> 
>>>>>>> From http://hbase.apache.org/book.html#mapreduce.example :
>>>>>>> 
>>>>>>> scan.setCaching(500);        // 1 is the default in Scan, which will
>>>>>>>                              // be bad for MapReduce jobs
>>>>>>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>>>>>>> 
>>>>>>> I guess you have used the above setting.
>>>>>>> 
>>>>>>> 0.94.x releases are compatible. Have you considered upgrading to, say
>>>>>>> 0.94.7 which was recently released ?
>>>>>>> 
>>>>>>> Cheers
>>>>>>> 
>>>>>>> On Tue, Apr 30, 2013 at 9:01 PM, Bryan Keller <bryanck@gm
>>>> 
> 
> 

The opinions expressed here are mine, while they may reflect a cognitive 
thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com
