@Lars, how have you calculated the 35K/row size? I'm not able to arrive at the same number.
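[Presumably from the table stats quoted further down in this thread: 700 GB of data across 20 million rows works out to roughly 700e9 bytes / 20e6 rows ≈ 35 KB per row.]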
@Bryan, Matt's idea below is good. With the Hadoop test you always had data locality; with your HBase test, maybe not. Can you take a look at the JMX console and tell us your locality percentage? Also, over those 45 minutes, did you monitor CPU, I/O wait, GC activity, etc. to see whether any of those might have impacted performance?

JM

2013/5/1 Matt Corgan <mcor...@hotpads.com>

> Not that it's a long-term solution, but try major-compacting before running the benchmark. If the LSM tree is CPU-bound merging HFiles/KeyValues through the PriorityQueue, then reducing to a single file per region should help. The merging of HFiles during a scan is not heavily optimized yet.
>
> On Tue, Apr 30, 2013 at 11:21 PM, lars hofhansl <la...@apache.org> wrote:
>
> > If you can, try 0.94.4+; it should significantly reduce the number of bytes copied around in RAM during scanning, especially if you have wide rows and/or large key portions. That in turn makes scans scale better across cores, since RAM is a shared resource between cores (much like disk).
> >
> > It's not hard to build the latest HBase against Cloudera's version of Hadoop. I can send along a simple patch to pom.xml to do that.
> >
> > -- Lars
> >
> > ________________________________
> > From: Bryan Keller <brya...@gmail.com>
> > To: user@hbase.apache.org
> > Sent: Tuesday, April 30, 2013 11:02 PM
> > Subject: Re: Poor HBase map-reduce scan performance
> >
> > The table has hashed keys so rows are evenly distributed among the regionservers, and load on each regionserver is pretty much the same. I also have per-table balancing turned on. I get mostly data-local mappers, with only a few rack-local (maybe 10 of the 250 mappers).
> >
> > Currently the table uses a wide schema, with lists of data structures stored as columns and column prefixes grouping the structures (e.g. 1_name, 1_address, 1_city, 2_name, 2_address, 2_city). I was thinking of moving those data structures to protobuf, which would cut down on the number of columns. The downside is that I can't filter on a single value that way, but it is a tradeoff I would make for performance. I was also considering restructuring the table into a tall table.
> >
> > Something interesting is that my old regionserver machines had five 15k SCSI drives instead of 2 SSDs, and performance was about the same. Also, my old network was 1gbit; now it is 10gbit. So neither network nor disk I/O appears to be the bottleneck. CPU use on the regionserver is rather high, so it seems like the best candidate to investigate. I will try profiling it tomorrow and will report back. I may also revisit compression on vs. off, since that adds load to the CPU.
> >
> > I'll also come up with a sample program that generates data similar to my table.
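[As a rough illustration of the wide-vs-packed tradeoff Bryan describes above, here is a sketch against the 0.94-era client API. The column family "d", the sample field values, and the toProtoBytes() helper are hypothetical; a real implementation would serialize an actual protobuf message.]

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PackedRowSketch {

      // Hypothetical stand-in for real protobuf serialization of one structure.
      static byte[] toProtoBytes(String name, String address, String city) {
        return Bytes.toBytes(name + "|" + address + "|" + city);
      }

      public static void main(String[] args) {
        byte[] family = Bytes.toBytes("d"); // hypothetical column family

        // Wide layout: one KeyValue per field, each carrying a full copy of the row key.
        Put wide = new Put(Bytes.toBytes("rowkey1"));
        wide.add(family, Bytes.toBytes("1_name"), Bytes.toBytes("Acme"));
        wide.add(family, Bytes.toBytes("1_address"), Bytes.toBytes("1 Main St"));
        wide.add(family, Bytes.toBytes("1_city"), Bytes.toBytes("Springfield"));

        // Packed layout: one KeyValue per structure. Far fewer KeyValues to merge
        // and copy during scans, but individual fields can no longer be filtered
        // server-side.
        Put packed = new Put(Bytes.toBytes("rowkey1"));
        packed.add(family, Bytes.toBytes("1"),
            toProtoBytes("Acme", "1 Main St", "Springfield"));
      }
    }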
> > On Apr 30, 2013, at 10:01 PM, lars hofhansl <la...@apache.org> wrote:
> >
> > > Your average row is 35k, so scanner caching would not make a huge difference, although I would have expected some improvement from setting it to 10 or 50, since you have a wide 10GbE pipe.
> > >
> > > I assume your table is split sufficiently to touch all RegionServers... Do you see the same load/IO on all region servers?
> > >
> > > A bunch of scan improvements went into HBase since 0.94.2. I blogged about some of these changes here: http://hadoop-hbase.blogspot.com/2012/12/hbase-profiling.html
> > >
> > > In your case - since you have many columns, each of which carries the rowkey - you might benefit a lot from HBASE-7279.
> > >
> > > In the end HBase *is* slower than straight HDFS for full scans. How could it not be? So I would start by looking at HDFS first. Make sure Nagle's is disabled in both HBase and HDFS.
> > >
> > > And lastly, SSDs are somewhat new territory for HBase. Maybe Andy Purtell is listening; I think he did some tests with HBase on SSDs. With rotating media you typically see an improvement with compression; with SSDs the added CPU needed for decompression might outweigh the benefits.
> > >
> > > At the risk of starting a larger discussion here, I would posit that HBase's LSM-based design, which trades random IO for sequential IO, might be a bit more questionable on SSDs.
> > >
> > > If you can, it would be nice to run a profiler against one of the RegionServers (or maybe do it with the single-RS setup) and see where it is bottlenecked. (And if you send me a sample program to generate some data - not 700g, though :) - I'll try to do a bit of profiling during the next few days as my day job permits, but I do not have any machines with SSDs.)
> > >
> > > -- Lars
> > >
> > > ________________________________
> > > From: Bryan Keller <brya...@gmail.com>
> > > To: user@hbase.apache.org
> > > Sent: Tuesday, April 30, 2013 9:31 PM
> > > Subject: Re: Poor HBase map-reduce scan performance
> > >
> > > Yes, I have tried various settings for setCaching(), and I have setCacheBlocks(false).
> > >
> > > On Apr 30, 2013, at 9:17 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > >> From http://hbase.apache.org/book.html#mapreduce.example :
> > >>
> > >> scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
> > >> scan.setCacheBlocks(false); // don't set to true for MR jobs
> > >>
> > >> I guess you have used the above settings.
> > >>
> > >> 0.94.x releases are compatible. Have you considered upgrading to, say, 0.94.7, which was recently released?
> > >>
> > >> Cheers
> > >>
> > >> On Tue, Apr 30, 2013 at 9:01 PM, Bryan Keller <brya...@gmail.com> wrote:
> > >>
> > >>> I have been attempting to speed up my HBase map-reduce scans for a while now. I have tried just about everything without much luck, and I'm running out of ideas; I was hoping for some suggestions. This is HBase 0.94.2 and Hadoop 2.0.0 (CDH4.2.1).
> > >>>
> > >>> The table I'm scanning:
> > >>> 20 million rows
> > >>> Hundreds of columns per row
> > >>> Column keys can be 30-40 bytes
> > >>> Column values are generally not large; 1k would be on the large side
> > >>> 250 regions
> > >>> Snappy compression
> > >>> 8gb region size
> > >>> 512mb memstore flush
> > >>> 128k block size
> > >>> 700gb of data on HDFS
> > >>>
> > >>> My cluster has 8 datanodes, which are also regionservers. Each has 8 cores (16 HT), 64gb RAM, and 2 SSDs. The network is 10gbit. I have a separate machine acting as namenode, HMaster, and zookeeper (single instance). I have local disk reads turned on.
> > >>>
> > >>> I'm seeing around 5 gbit/sec network IO on average. Each disk is getting 400mb/sec read IO. Theoretically I could get 400mb/sec * 16 = 6.4gb/sec.
> > >>> Using Hadoop's TestDFSIO tool, I'm seeing around 1.4gb/sec read speed. Not really that great compared to the theoretical I/O, but far better than what I am seeing with HBase map-reduce scans of my table.
> > >>>
> > >>> I have a simple no-op map-only job (using TableInputFormat) that scans the table and does nothing with the data. This takes 45 minutes, which is about 260mb/sec read speed - over 5x slower than straight HDFS. Basically, with HBase my 16-SSD cluster is reading nearly 35% slower than a single SSD.
> > >>>
> > >>> Here are some things I have changed, to no avail:
> > >>> Scan caching values
> > >>> HDFS block sizes
> > >>> HBase block sizes
> > >>> Region file sizes
> > >>> Memory settings
> > >>> GC settings
> > >>> Number of mappers/node
> > >>> Compressed vs. not compressed
> > >>>
> > >>> One thing I notice is that the regionserver is using quite a bit of CPU during the map-reduce job. When dumping the jstack of the process, it usually seems to be in some type of memory allocation or decompression routine, which didn't seem abnormal.
> > >>>
> > >>> I can't seem to pinpoint the bottleneck. CPU use by the regionserver is high but not maxed out. Disk I/O and network I/O are low, and IO wait is low. I'm on the verge of just writing the dataset out to sequence files once a day for scan purposes. Is that what others are doing?
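[For reference, a minimal sketch of the kind of no-op scan job described above, using the 0.94-era TableMapReduceUtil wiring and the Scan settings quoted from the book. The table name and class names are assumptions for illustration, not the actual benchmark code.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class NoOpScan {

      // The mapper deliberately does nothing, so the job measures pure scan throughput.
      public static class NoOpMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context) {
          // no-op
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "no-op scan benchmark"); // hypothetical job name
        job.setJarByClass(NoOpScan.class);

        Scan scan = new Scan();
        scan.setCaching(500);       // rows per RPC; the default of 1 is bad for MR jobs
        scan.setCacheBlocks(false); // don't pollute the block cache with a full scan

        // "mytable" is a placeholder for the actual table name.
        TableMapReduceUtil.initTableMapperJob("mytable", scan, NoOpMapper.class,
            NullWritable.class, NullWritable.class, job);
        job.setNumReduceTasks(0);   // map-only
        job.setOutputFormatClass(NullOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }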