Yes, I have monitored GC, CPU, disk and network IO, and anything else I could think of. Only the CPU usage by the regionserver is on the high side.

I mentioned that data-local tasks generally make up 240 of the 250 mappers (96%) - I get this information from the jobtracker. Does the JMX console give more accurate information?
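For what it's worth, the locality figure can also be pulled programmatically from a regionserver's JMX endpoint rather than read off the web console. A minimal sketch - the class name, port, MBean name and attribute name below are assumptions based on 0.94-era regionserver metrics (RegionServerStatistics / hdfsBlocksLocalityIndex) and may need adjusting for your setup:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LocalityCheck {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";

        // Connect to the regionserver's remote JMX port (10102 is just an example;
        // use whatever -Dcom.sun.management.jmxremote.port the RS was started with).
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":10102/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbean = connector.getMBeanServerConnection();
            // MBean and attribute names assumed from 0.94-era regionserver metrics.
            ObjectName rsStats = new ObjectName(
                "hadoop:service=RegionServer,name=RegionServerStatistics");
            Object locality = mbean.getAttribute(rsStats, "hdfsBlocksLocalityIndex");
            System.out.println(host + " hdfsBlocksLocalityIndex = " + locality);
        } finally {
            connector.close();
        }
    }
}

Whatever that attribute reports per regionserver should line up with the ~96% data-local figure the jobtracker shows.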
On May 1, 2013, at 3:56 AM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:

> @Lars, how have you calculated the 35K/row size? I'm not able to find the same number.
>
> @Bryan, Matt's idea below is good. With the hadoop test you always had data locality. With your HBase test, maybe not. Can you take a look at the JMX console and tell us your locality %? Also, over those 45 minutes, have you monitored the CPU, I/O wait, GC activity, etc. to see if any of those might have impacted the performance?
>
> JM
>
> 2013/5/1 Matt Corgan <mcor...@hotpads.com>
>
>> Not that it's a long-term solution, but try major-compacting before running the benchmark. If the LSM tree is CPU-bound in merging HFiles/KeyValues through the PriorityQueue, then reducing to a single file per region should help. The merging of HFiles during a scan is not heavily optimized yet.
>>
>> On Tue, Apr 30, 2013 at 11:21 PM, lars hofhansl <la...@apache.org> wrote:
>>
>>> If you can, try 0.94.4+; it should significantly reduce the amount of bytes copied around in RAM during scanning, especially if you have wide rows and/or large key portions. That in turn makes scans scale better across cores, since RAM is a shared resource between cores (much like disk).
>>>
>>> It's not hard to build the latest HBase against Cloudera's version of Hadoop. I can send along a simple patch to pom.xml to do that.
>>>
>>> -- Lars
>>>
>>> ________________________________
>>> From: Bryan Keller <brya...@gmail.com>
>>> To: user@hbase.apache.org
>>> Sent: Tuesday, April 30, 2013 11:02 PM
>>> Subject: Re: Poor HBase map-reduce scan performance
>>>
>>> The table has hashed keys so rows are evenly distributed amongst the regionservers, and load on each regionserver is pretty much the same. I also have per-table balancing turned on. I get mostly data-local mappers with only a few rack-local (maybe 10 of the 250 mappers).
>>>
>>> Currently the table uses a wide table schema, with lists of data structures stored as columns, with column prefixes grouping the data structures (e.g. 1_name, 1_address, 1_city, 2_name, 2_address, 2_city). I was thinking of moving those data structures to protobuf, which would cut down on the number of columns. The downside is that I can't filter on one value with that, but it is a tradeoff I would make for performance. I was also considering restructuring the table into a tall table.
>>>
>>> Something interesting is that my old regionserver machines had five 15k SCSI drives instead of 2 SSDs, and performance was about the same. Also, my old network was 1gbit, now it is 10gbit. So neither network nor disk I/O appears to be the bottleneck. The CPU is rather high for the regionserver, so it seems like the best candidate to investigate. I will try profiling it tomorrow and will report back. I may revisit compression on vs. off since that is adding load to the CPU.
>>>
>>> I'll also come up with a sample program that generates data similar to my table.
>>>
>>> On Apr 30, 2013, at 10:01 PM, lars hofhansl <la...@apache.org> wrote:
>>>
>>>> Your average row is 35k, so scanner caching would not make a huge difference, although I would have expected some improvement by setting it to 10 or 50 since you have a wide 10ge pipe.
>>>>
>>>> I assume your table is split sufficiently to touch all RegionServers... Do you see the same load/IO on all region servers?
>>>>
>>>> A bunch of scan improvements have gone into HBase since 0.94.2. I blogged about some of these changes here: http://hadoop-hbase.blogspot.com/2012/12/hbase-profiling.html
>>>>
>>>> In your case - since you have many columns, each of which carries the rowkey - you might benefit a lot from HBASE-7279.
>>>>
>>>> In the end HBase *is* slower than straight HDFS for full scans. How could it not be? So I would start by looking at HDFS first. Make sure Nagle's is disabled in both HBase and HDFS.
>>>>
>>>> And lastly, SSDs are somewhat new territory for HBase. Maybe Andy Purtell is listening; I think he did some tests with HBase on SSDs. With rotating media you typically see an improvement with compression. With SSDs the added CPU needed for decompression might outweigh the benefits.
>>>>
>>>> At the risk of starting a larger discussion here, I would posit that HBase's LSM-based design, which trades random IO for sequential IO, might be a bit more questionable on SSDs.
>>>>
>>>> If you can, it would be nice to run a profiler against one of the RegionServers (or maybe do it with the single-RS setup) and see where it is bottlenecked.
>>>>
>>>> (And if you send me a sample program to generate some data - not 700g, though :) - I'll try to do a bit of profiling during the next days as my day job permits, but I do not have any machines with SSDs.)
>>>>
>>>> -- Lars
>>>>
>>>> ________________________________
>>>> From: Bryan Keller <brya...@gmail.com>
>>>> To: user@hbase.apache.org
>>>> Sent: Tuesday, April 30, 2013 9:31 PM
>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>>
>>>> Yes, I have tried various settings for setCaching() and I have setCacheBlocks(false).
>>>>
>>>> On Apr 30, 2013, at 9:17 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> From http://hbase.apache.org/book.html#mapreduce.example :
>>>>>
>>>>> scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
>>>>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>>>>>
>>>>> I guess you have used the above settings.
>>>>>
>>>>> 0.94.x releases are compatible. Have you considered upgrading to, say, 0.94.7, which was recently released?
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Tue, Apr 30, 2013 at 9:01 PM, Bryan Keller <brya...@gmail.com> wrote:
>>>>>
>>>>>> I have been attempting to speed up my HBase map-reduce scans for a while now. I have tried just about everything without much luck. I'm running out of ideas and was hoping for some suggestions. This is HBase 0.94.2 and Hadoop 2.0.0 (CDH4.2.1).
>>>>>>
>>>>>> The table I'm scanning:
>>>>>> 20 mil rows
>>>>>> Hundreds of columns/row
>>>>>> Column keys can be 30-40 bytes
>>>>>> Column values are generally not large, 1k would be on the large side
>>>>>> 250 regions
>>>>>> Snappy compression
>>>>>> 8gb region size
>>>>>> 512mb memstore flush
>>>>>> 128k block size
>>>>>> 700gb of data on HDFS
>>>>>>
>>>>>> My cluster has 8 datanodes which are also regionservers. Each has 8 cores (16 HT), 64gb RAM, and 2 SSDs. The network is 10gbit. I have a separate machine acting as namenode, HMaster, and zookeeper (single instance). I have disk local reads turned on.
>>>>>>
>>>>>> I'm seeing around 5 gbit/sec on average network IO. Each disk is getting 400mb/sec read IO. Theoretically I could get 400mb/sec * 16 = 6.4gb/sec.
>>>>>>
>>>>>> Using Hadoop's TestDFSIO tool, I'm seeing around 1.4gb/sec read speed. Not really that great compared to the theoretical I/O. However, this is far better than I am seeing with HBase map-reduce scans of my table.
>>>>>>
>>>>>> I have a simple no-op map-only job (using TableInputFormat) that scans the table and does nothing with the data. This takes 45 minutes. That's about 260mb/sec read speed. This is over 5x slower than straight HDFS. Basically, with HBase the read performance of my 16-SSD cluster is nearly 35% slower than a single SSD.
>>>>>>
>>>>>> Here are some things I have changed, to no avail:
>>>>>> Scan caching values
>>>>>> HDFS block sizes
>>>>>> HBase block sizes
>>>>>> Region file sizes
>>>>>> Memory settings
>>>>>> GC settings
>>>>>> Number of mappers/node
>>>>>> Compressed vs. not compressed
>>>>>>
>>>>>> One thing I notice is that the regionserver is using quite a bit of CPU during the map-reduce job. When dumping the jstack of the process, it seems like it is usually in some type of memory allocation or decompression routine, which didn't seem abnormal.
>>>>>>
>>>>>> I can't seem to pinpoint the bottleneck. CPU use by the regionserver is high but not maxed out. Disk I/O and network I/O are low, IO wait is low. I'm on the verge of just writing the dataset out to sequence files once a day for scan purposes. Is that what others are doing?
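For readers following the thread, here is a minimal sketch of the kind of no-op map-only scan job described above, wired up against the 0.94-era TableMapReduceUtil API with the Scan settings quoted from the HBase book. The class and table names are placeholders, not the actual benchmark code from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class NoOpScanJob {

    // Mapper that touches each row but emits nothing, so the job measures raw scan throughput.
    static class NoOpMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row, Context context) {
            // intentionally empty
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "no-op full scan");
        job.setJarByClass(NoOpScanJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // per the hbase book example quoted above
        scan.setCacheBlocks(false);  // don't pollute the block cache from MR scans

        TableMapReduceUtil.initTableMapperJob(
            "mytable",               // hypothetical table name
            scan,
            NoOpMapper.class,
            NullWritable.class,
            NullWritable.class,
            job);

        job.setNumReduceTasks(0);                          // map-only
        job.setOutputFormatClass(NullOutputFormat.class);  // discard output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

At 700gb scanned in 45 minutes, a job like this works out to roughly 260mb/sec, which is the figure being compared against the ~1.4gb/sec TestDFSIO baseline above.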