Yes, I have monitored GC, CPU, disk and network IO, and anything else I could think of. Only the CPU usage by the regionserver is on the high side.

I mentioned that data-local tasks generally make up 240 of the 250 mappers (96%) - I get this information from the jobtracker. Does the JMX console give more accurate information?
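For what it's worth, the locality figure can also be pulled programmatically from a regionserver's JMX endpoint rather than read off the web console. A minimal sketch - the class name, port, MBean name and attribute name below are assumptions based on 0.94-era regionserver metrics (RegionServerStatistics / hdfsBlocksLocalityIndex) and may need adjusting for your setup:

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LocalityCheck {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";

        // Connect to the regionserver's remote JMX port (10102 is just an example;
        // use whatever -Dcom.sun.management.jmxremote.port the RS was started with).
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":10102/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbean = connector.getMBeanServerConnection();
            // MBean and attribute names assumed from 0.94-era regionserver metrics.
            ObjectName rsStats = new ObjectName(
                "hadoop:service=RegionServer,name=RegionServerStatistics");
            Object locality = mbean.getAttribute(rsStats, "hdfsBlocksLocalityIndex");
            System.out.println(host + " hdfsBlocksLocalityIndex = " + locality);
        } finally {
            connector.close();
        }
    }
}

Whatever that attribute reports per regionserver should line up with the ~96% data-local figure the jobtracker shows.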
On May 1, 2013, at 3:56 AM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:

> @Lars, how have you calculated the 35K/row size? I'm not able to find the same number.
>
> @Bryan, Matt's idea below is good. With the hadoop test you always had data locality. With your HBase test, maybe not. Can you take a look at the JMX console and tell us your locality %? Also, over those 45 minutes, have you monitored the CPU, I/O wait, GC activity, etc. to see if any of those might have impacted the performance?
>
> JM
>
> 2013/5/1 Matt Corgan <mcor...@hotpads.com>
>
>> Not that it's a long-term solution, but try major-compacting before running the benchmark. If the LSM tree is CPU-bound in merging HFiles/KeyValues through the PriorityQueue, then reducing to a single file per region should help. The merging of HFiles during a scan is not heavily optimized yet.
>>
>> On Tue, Apr 30, 2013 at 11:21 PM, lars hofhansl <la...@apache.org> wrote:
>>
>>> If you can, try 0.94.4+; it should significantly reduce the amount of bytes copied around in RAM during scanning, especially if you have wide rows and/or large key portions. That in turn makes scans scale better across cores, since RAM is a shared resource between cores (much like disk).
>>>
>>> It's not hard to build the latest HBase against Cloudera's version of Hadoop. I can send along a simple patch to pom.xml to do that.
>>>
>>> -- Lars
>>>
>>> ________________________________
>>> From: Bryan Keller <brya...@gmail.com>
>>> To: user@hbase.apache.org
>>> Sent: Tuesday, April 30, 2013 11:02 PM
>>> Subject: Re: Poor HBase map-reduce scan performance
>>>
>>> The table has hashed keys so rows are evenly distributed amongst the regionservers, and load on each regionserver is pretty much the same. I also have per-table balancing turned on. I get mostly data-local mappers with only a few rack-local (maybe 10 of the 250 mappers).
>>>
>>> Currently the table uses a wide table schema, with lists of data structures stored as columns, with column prefixes grouping the data structures (e.g. 1_name, 1_address, 1_city, 2_name, 2_address, 2_city). I was thinking of moving those data structures to protobuf, which would cut down on the number of columns. The downside is that I can't filter on one value with that, but it is a tradeoff I would make for performance. I was also considering restructuring the table into a tall table.
>>>
>>> Something interesting is that my old regionserver machines had five 15k SCSI drives instead of 2 SSDs, and performance was about the same. Also, my old network was 1gbit, now it is 10gbit. So neither network nor disk I/O appears to be the bottleneck. The CPU is rather high for the regionserver, so it seems like the best candidate to investigate. I will try profiling it tomorrow and will report back. I may revisit compression on vs. off since that is adding load to the CPU.
>>>
>>> I'll also come up with a sample program that generates data similar to my table.
>>>
>>> On Apr 30, 2013, at 10:01 PM, lars hofhansl <la...@apache.org> wrote:
>>>
>>>> Your average row is 35k, so scanner caching would not make a huge difference, although I would have expected some improvement by setting it to 10 or 50 since you have a wide 10ge pipe.
>>>>
>>>> I assume your table is split sufficiently to touch all RegionServers... Do you see the same load/IO on all region servers?
>>>>
>>>> A bunch of scan improvements have gone into HBase since 0.94.2. I blogged about some of these changes here: http://hadoop-hbase.blogspot.com/2012/12/hbase-profiling.html
>>>>
>>>> In your case - since you have many columns, each of which carries the rowkey - you might benefit a lot from HBASE-7279.
>>>>
>>>> In the end HBase *is* slower than straight HDFS for full scans. How could it not be? So I would start by looking at HDFS first. Make sure Nagle's is disabled in both HBase and HDFS.
>>>>
>>>> And lastly, SSDs are somewhat new territory for HBase. Maybe Andy Purtell is listening; I think he did some tests with HBase on SSDs. With rotating media you typically see an improvement with compression. With SSDs the added CPU needed for decompression might outweigh the benefits.
>>>>
>>>> At the risk of starting a larger discussion here, I would posit that HBase's LSM-based design, which trades random IO for sequential IO, might be a bit more questionable on SSDs.
>>>>
>>>> If you can, it would be nice to run a profiler against one of the RegionServers (or maybe do it with the single-RS setup) and see where it is bottlenecked.
>>>>
>>>> (And if you send me a sample program to generate some data - not 700g, though :) - I'll try to do a bit of profiling during the next days as my day job permits, but I do not have any machines with SSDs.)
>>>>
>>>> -- Lars
>>>>
>>>> ________________________________
>>>> From: Bryan Keller <brya...@gmail.com>
>>>> To: user@hbase.apache.org
>>>> Sent: Tuesday, April 30, 2013 9:31 PM
>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>>
>>>> Yes, I have tried various settings for setCaching() and I have setCacheBlocks(false).
>>>>
>>>> On Apr 30, 2013, at 9:17 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> From http://hbase.apache.org/book.html#mapreduce.example :
>>>>>
>>>>> scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
>>>>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>>>>>
>>>>> I guess you have used the above settings.
>>>>>
>>>>> 0.94.x releases are compatible. Have you considered upgrading to, say, 0.94.7, which was recently released?
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Tue, Apr 30, 2013 at 9:01 PM, Bryan Keller <brya...@gmail.com> wrote:
>>>>>
>>>>>> I have been attempting to speed up my HBase map-reduce scans for a while now. I have tried just about everything without much luck. I'm running out of ideas and was hoping for some suggestions. This is HBase 0.94.2 and Hadoop 2.0.0 (CDH4.2.1).
>>>>>>
>>>>>> The table I'm scanning:
>>>>>> 20 mil rows
>>>>>> Hundreds of columns/row
>>>>>> Column keys can be 30-40 bytes
>>>>>> Column values are generally not large, 1k would be on the large side
>>>>>> 250 regions
>>>>>> Snappy compression
>>>>>> 8gb region size
>>>>>> 512mb memstore flush
>>>>>> 128k block size
>>>>>> 700gb of data on HDFS
>>>>>>
>>>>>> My cluster has 8 datanodes which are also regionservers. Each has 8 cores (16 HT), 64gb RAM, and 2 SSDs. The network is 10gbit. I have a separate machine acting as namenode, HMaster, and zookeeper (single instance). I have disk local reads turned on.
>>>>>>
>>>>>> I'm seeing around 5 gbit/sec on average network IO. Each disk is getting 400mb/sec read IO. Theoretically I could get 400mb/sec * 16 = 6.4gb/sec.
>>>>>>
>>>>>> Using Hadoop's TestDFSIO tool, I'm seeing around 1.4gb/sec read speed. Not really that great compared to the theoretical I/O. However, this is far better than I am seeing with HBase map-reduce scans of my table.
>>>>>>
>>>>>> I have a simple no-op map-only job (using TableInputFormat) that scans the table and does nothing with the data. This takes 45 minutes. That's about 260mb/sec read speed. This is over 5x slower than straight HDFS. Basically, with HBase the read performance of my 16-SSD cluster is nearly 35% slower than a single SSD.
>>>>>>
>>>>>> Here are some things I have changed, to no avail:
>>>>>> Scan caching values
>>>>>> HDFS block sizes
>>>>>> HBase block sizes
>>>>>> Region file sizes
>>>>>> Memory settings
>>>>>> GC settings
>>>>>> Number of mappers/node
>>>>>> Compressed vs. not compressed
>>>>>>
>>>>>> One thing I notice is that the regionserver is using quite a bit of CPU during the map-reduce job. When dumping the jstack of the process, it seems like it is usually in some type of memory allocation or decompression routine, which didn't seem abnormal.
>>>>>>
>>>>>> I can't seem to pinpoint the bottleneck. CPU use by the regionserver is high but not maxed out. Disk I/O and network I/O are low, IO wait is low. I'm on the verge of just writing the dataset out to sequence files once a day for scan purposes. Is that what others are doing?
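For readers following the thread, here is a minimal sketch of the kind of no-op map-only scan job described above, wired up against the 0.94-era TableMapReduceUtil API with the Scan settings quoted from the HBase book. The class and table names are placeholders, not the actual benchmark code from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class NoOpScanJob {

    // Mapper that touches each row but emits nothing, so the job measures raw scan throughput.
    static class NoOpMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row, Context context) {
            // intentionally empty
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "no-op full scan");
        job.setJarByClass(NoOpScanJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // per the hbase book example quoted above
        scan.setCacheBlocks(false);  // don't pollute the block cache from MR scans

        TableMapReduceUtil.initTableMapperJob(
            "mytable",               // hypothetical table name
            scan,
            NoOpMapper.class,
            NullWritable.class,
            NullWritable.class,
            job);

        job.setNumReduceTasks(0);                          // map-only
        job.setOutputFormatClass(NullOutputFormat.class);  // discard output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

At 700gb scanned in 45 minutes, a job like this works out to roughly 260mb/sec, which is the figure being compared against the ~1.4gb/sec TestDFSIO baseline above.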