You really don't want to mess around with the block size. Sure, larger blocks are better for sequential scans, but the minute you do a lot of random ad hoc fetches... you're kinda screwed, because every uncached get has to pull in a whole block just to return one row.
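
To put rough numbers on that trade-off, here is a back-of-the-envelope sketch using figures from the thread below (35k average rows, 128k vs 16mb blocks). It is only a simplified model, not HBase code: the class and method names are made up for illustration, blocks are treated as fixed-size and always read in full, nothing is cached, and the 700g table size is an assumption based on the data size mentioned further down.

```java
public class BlockSizeTradeoff {

    // Bytes read from disk for one uncached random get.
    // Simplified model (an assumption, not HBase internals): blocks are
    // fixed-size, the row starts on a block boundary, and every block the
    // row touches is read in full.
    public static long bytesReadPerGet(long blockSize, long rowSize) {
        long blocksTouched = (rowSize + blockSize - 1) / blockSize; // ceiling division
        return blocksTouched * blockSize;
    }

    // Bytes read per byte actually wanted by the get.
    public static double readAmplification(long blockSize, long rowSize) {
        return (double) bytesReadPerGet(blockSize, rowSize) / rowSize;
    }

    // Number of block reads a full table scan goes through.
    public static long blocksPerFullScan(long tableSize, long blockSize) {
        return (tableSize + blockSize - 1) / blockSize; // ceiling division
    }

    public static void main(String[] args) {
        long KB = 1024L, MB = 1024L * KB, GB = 1024L * MB;
        long row = 35 * KB;     // average row size quoted in the thread
        long table = 700 * GB;  // assumed table size

        // Compare the two block sizes that were tried.
        for (long block : new long[] {128 * KB, 16 * MB}) {
            System.out.printf(
                "block=%d KB: %.1fx read amplification per get, %d blocks per full scan%n",
                block / KB, readAmplification(block, row), blocksPerFullScan(table, block));
        }
    }
}
```

Under this model, going from 128k to 16mb blocks means a full scan handles 128 times fewer blocks, but each uncached random get reads 128 times more data from disk.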

On May 3, 2013, at 2:17 AM, Bryan Keller <brya...@gmail.com> wrote:
> I finally made some progress. I tried a very large HBase block size (16mb),
> and it significantly improved scan performance. I went from 45-50 min to 24
> min. Not great but much better. Before I had it set to 128k. Scanning an
> equivalent sequence file takes 10 min. My random read performance will
> probably suffer with such a large block size (theoretically), so I probably
> can't keep it this big. I care about random read performance too. I've read
> having a block size this big is not recommended, is that correct?
>
> I haven't dug too deeply into the code, are the block buffers reused or is
> each new block read a new allocation? Perhaps a buffer pool could help here
> if there isn't one already. When doing a scan, HBase could reuse previously
> allocated block buffers instead of allocating a new one for each block. Then
> block size shouldn't affect scan performance much.
>
> I'm not using a block encoder. Also, I'm still sifting through the profiler
> results, I'll see if I can make more sense of it and run some more
> experiments.
>
> On May 2, 2013, at 5:46 PM, lars hofhansl <la...@apache.org> wrote:
>
>> Interesting. If you can try 0.94.7 (but it'll probably not have changed that
>> much from 0.94.4)
>>
>> Have you enabled one of the block encoders (FAST_DIFF, etc)? If so, try
>> without. They currently need to reallocate a ByteBuffer for each single KV.
>> (Since you see ScannerV2 rather than EncodedScannerV2 you probably have not
>> enabled encoding, just checking).
>>
>> And do you have a stack trace for the ByteBuffer.allocate()? That is a
>> strange one since it never came up in my profiling (unless you enabled block
>> encoding).
>> (You can get traces from VisualVM by creating a snapshot, but you'd have to
>> drill in to find the allocate()).
>>
>> During normal scanning (again, without encoding) there should be no
>> allocation happening except for blocks read from disk (and they should all
>> be the same size, thus allocation should be cheap).
>>
>> -- Lars
>>
>> ________________________________
>> From: Bryan Keller <brya...@gmail.com>
>> To: user@hbase.apache.org
>> Sent: Thursday, May 2, 2013 10:54 AM
>> Subject: Re: Poor HBase map-reduce scan performance
>>
>> I ran one of my regionservers through VisualVM. It looks like the top hot
>> spots are HFileReaderV2$ScannerV2.getKeyValue() and ByteBuffer.allocate().
>> It appears at first glance that memory allocations may be an issue.
>> Decompression was next below that but less of an issue it seems.
>>
>> Would changing the block size, either HDFS or HBase, help here?
>>
>> Also, if anyone has tips on how else to profile, that would be appreciated.
>> VisualVM can produce a lot of noise that is hard to sift through.
>>
>> On May 1, 2013, at 9:49 PM, Bryan Keller <brya...@gmail.com> wrote:
>>
>>> I used exactly 0.94.4, pulled from the tag in subversion.
>>>
>>> On May 1, 2013, at 9:41 PM, lars hofhansl <la...@apache.org> wrote:
>>>
>>>> Hmm... Did you actually use exactly version 0.94.4, or the latest 0.94.7?
>>>> I would be very curious to see profiling data.
>>>>
>>>> -- Lars
>>>>
>>>> ----- Original Message -----
>>>> From: Bryan Keller <brya...@gmail.com>
>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>>> Cc:
>>>> Sent: Wednesday, May 1, 2013 6:01 PM
>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>>
>>>> I tried running my test with 0.94.4, unfortunately performance was about
>>>> the same. I'm planning on profiling the regionserver and trying some other
>>>> things tonight and tomorrow and will report back.
>>>>
>>>> On May 1, 2013, at 8:00 AM, Bryan Keller <brya...@gmail.com> wrote:
>>>>
>>>>> Yes I would like to try this, if you can point me to the pom.xml patch
>>>>> that would save me some time.
>>>>>
>>>>> On Tuesday, April 30, 2013, lars hofhansl wrote:
>>>>> If you can, try 0.94.4+; it should significantly reduce the amount of
>>>>> bytes copied around in RAM during scanning, especially if you have wide
>>>>> rows and/or large key portions. That in turn makes scans scale better
>>>>> across cores, since RAM is a shared resource between cores (much like disk).
>>>>>
>>>>> It's not hard to build the latest HBase against Cloudera's version of
>>>>> Hadoop. I can send along a simple patch to pom.xml to do that.
>>>>>
>>>>> -- Lars
>>>>>
>>>>> ________________________________
>>>>> From: Bryan Keller <brya...@gmail.com>
>>>>> To: user@hbase.apache.org
>>>>> Sent: Tuesday, April 30, 2013 11:02 PM
>>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>>>
>>>>> The table has hashed keys so rows are evenly distributed amongst the
>>>>> regionservers, and load on each regionserver is pretty much the same. I
>>>>> also have per-table balancing turned on. I get mostly data local mappers
>>>>> with only a few rack local (maybe 10 of the 250 mappers).
>>>>>
>>>>> Currently the table is a wide table schema, with lists of data structures
>>>>> stored as columns with column prefixes grouping the data structures (e.g.
>>>>> 1_name, 1_address, 1_city, 2_name, 2_address, 2_city). I was thinking of
>>>>> moving those data structures to protobuf which would cut down on the
>>>>> number of columns. The downside is I can't filter on one value with that,
>>>>> but it is a tradeoff I would make for performance. I was also considering
>>>>> restructuring the table into a tall table.
>>>>>
>>>>> Something interesting is that my old regionserver machines had five 15k
>>>>> SCSI drives instead of 2 SSDs, and performance was about the same. Also,
>>>>> my old network was 1gbit, now it is 10gbit. So neither network nor disk
>>>>> I/O appear to be the bottleneck.
>>>>> The CPU is rather high for the
>>>>> regionserver so it seems like the best candidate to investigate. I will
>>>>> try profiling it tomorrow and will report back. I may revisit compression
>>>>> on vs off since that is adding load to the CPU.
>>>>>
>>>>> I'll also come up with a sample program that generates data similar to my
>>>>> table.
>>>>>
>>>>> On Apr 30, 2013, at 10:01 PM, lars hofhansl <la...@apache.org> wrote:
>>>>>
>>>>>> Your average row is 35k so scanner caching would not make a huge
>>>>>> difference, although I would have expected some improvements by setting
>>>>>> it to 10 or 50 since you have a wide 10ge pipe.
>>>>>>
>>>>>> I assume your table is split sufficiently to touch all RegionServer...
>>>>>> Do you see the same load/IO on all region servers?
>>>>>>
>>>>>> A bunch of scan improvements went into HBase since 0.94.2.
>>>>>> I blogged about some of these changes here:
>>>>>> http://hadoop-hbase.blogspot.com/2012/12/hbase-profiling.html
>>>>>>
>>>>>> In your case - since you have many columns, each of which carries the
>>>>>> rowkey - you might benefit a lot from HBASE-7279.
>>>>>>
>>>>>> In the end HBase *is* slower than straight HDFS for full scans. How
>>>>>> could it not be?
>>>>>> So I would start by looking at HDFS first. Make sure Nagle's is disabled
>>>>>> in both HBase and HDFS.
>>>>>>
>>>>>> And lastly SSDs are somewhat new territory for HBase. Maybe Andy Purtell
>>>>>> is listening, I think he did some tests with HBase on SSDs.
>>>>>> With rotating media you typically see an improvement with compression.
>>>>>> With SSDs the added CPU needed for decompression might outweigh the
>>>>>> benefits.
>>>>>>
>>>>>> At the risk of starting a larger discussion here, I would posit that
>>>>>> HBase's LSM based design, which trades random IO for sequential IO,
>>>>>> might be a bit more questionable on SSDs.
>>>>>>
>>>>>> If you can, it would be nice to run a profiler against one of the
>>>>>> RegionServers (or maybe do it with the single RS setup) and see where it
>>>>>> is bottlenecked.
>>>>>> (And if you send me a sample program to generate some data - not 700g,
>>>>>> though :) - I'll try to do a bit of profiling during the next days as my
>>>>>> day job permits, but I do not have any machines with SSDs).
>>>>>>
>>>>>> -- Lars
>>>>>>
>>>>>> ________________________________
>>>>>> From: Bryan Keller <brya...@gmail.com>
>>>>>> To: user@hbase.apache.org
>>>>>> Sent: Tuesday, April 30, 2013 9:31 PM
>>>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>>>>
>>>>>> Yes, I have tried various settings for setCaching() and I have
>>>>>> setCacheBlocks(false)
>>>>>>
>>>>>> On Apr 30, 2013, at 9:17 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>
>>>>>>> From http://hbase.apache.org/book.html#mapreduce.example :
>>>>>>>
>>>>>>> scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
>>>>>>> scan.setCacheBlocks(false);  // don't set to true for MR jobs
>>>>>>>
>>>>>>> I guess you have used the above setting.
>>>>>>>
>>>>>>> 0.94.x releases are compatible. Have you considered upgrading to, say
>>>>>>> 0.94.7 which was recently released?
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On Tue, Apr 30, 2013 at 9:01 PM, Bryan Keller <bryanck@gm

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental.
Use at your own risk.
Michael Segel
michael_segel (AT) hotmail.com