I attached my patch to the JIRA issue (HBASE-8369), in case anyone is interested. It can be used on its own fairly easily, without patching HBase, which is what I am currently doing.

On Jul 1, 2013, at 2:23 PM, Enis Söztutar <enis....@gmail.com> wrote:

> Bryan,
>
> 3.6x improvement seems exciting. The ballpark difference between HBase scan
> and hdfs scan is in that order, so it is expected I guess.
>
> I plan to get back to the trunk patch, add more tests etc next week. In the
> mean time, if you have any changes to the patch, pls attach the patch.
>
> Enis
>
>
> On Mon, Jul 1, 2013 at 3:59 AM, lars hofhansl <la...@apache.org> wrote:
>
>> Absolutely.
>>
>>
>>
>> ----- Original Message -----
>> From: Ted Yu <yuzhih...@gmail.com>
>> To: user@hbase.apache.org
>> Cc:
>> Sent: Sunday, June 30, 2013 9:32 PM
>> Subject: Re: Poor HBase map-reduce scan performance
>>
>> Looking at the tail of HBASE-8369, there were some comments which are yet
>> to be addressed.
>>
>> I think trunk patch should be finalized before backporting.
>>
>> Cheers
>>
>> On Mon, Jul 1, 2013 at 12:23 PM, Bryan Keller <brya...@gmail.com> wrote:
>>
>>> I'll attach my patch to HBASE-8369 tomorrow.
>>>
>>> On Jun 28, 2013, at 10:56 AM, lars hofhansl <la...@apache.org> wrote:
>>>
>>>> If we can make a clean patch with minimal impact to existing code I
>>>> would be supportive of a backport to 0.94.
>>>>
>>>> -- Lars
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>> From: Bryan Keller <brya...@gmail.com>
>>>> To: user@hbase.apache.org; lars hofhansl <la...@apache.org>
>>>> Cc:
>>>> Sent: Tuesday, June 25, 2013 1:56 AM
>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>>
>>>> I tweaked Enis's snapshot input format and backported it to 0.94.6 and
>>>> have snapshot scanning functional on my system. Performance is
>>>> dramatically better, as expected i suppose. I'm seeing about 3.6x faster
>>>> performance vs TableInputFormat. Also, HBase doesn't get bogged down
>>>> during a scan as the regionserver is being bypassed. I'm very excited by
>>>> this. There are some issues with file permissions and library
>>>> dependencies but nothing that can't be worked out.
>>>>
>>>> On Jun 5, 2013, at 6:03 PM, lars hofhansl <la...@apache.org> wrote:
>>>>
>>>>> That's exactly the kind of pre-fetching I was investigating a bit ago
>>>>> (made a patch, but ran out of time).
>>>>> This pre-fetching is strictly client only, where the client keeps the
>>>>> server busy while it is processing the previous batch, but filling up a
>>>>> 2nd buffer.
>>>>>
>>>>>
>>>>> -- Lars
>>>>>
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Sandy Pratt <prat...@adobe.com>
>>>>> To: "user@hbase.apache.org" <user@hbase.apache.org>
>>>>> Sent: Wednesday, June 5, 2013 10:58 AM
>>>>> Subject: Re: Poor HBase map-reduce scan performance
>>>>>
>>>>>
>>>>> Yong,
>>>>>
>>>>> As a thought experiment, imagine how it impacts the throughput of TCP to
>>>>> keep the window size at 1. That means there's only one packet in flight
>>>>> at a time, and total throughput is a fraction of what it could be.
>>>>>
>>>>> That's effectively what happens with RPC. The server sends a batch, then
>>>>> does nothing while it waits for the client to ask for more. During that
>>>>> time, the pipe between them is empty. Increasing the batch size can help
>>>>> a bit, in essence creating a really huge packet, but the problem remains.
>>>>> There will always be stalls in the pipe.
>>>>>
>>>>> What you want is for the window size to be large enough that the pipe is
>>>>> saturated. A streaming API accomplishes that by stuffing data down the
>>>>> network pipe as quickly as possible.
>>>>>
>>>>> Sandy
>>>>>
>>>>> On 6/5/13 7:55 AM, "yonghu" <yongyong...@gmail.com> wrote:
>>>>>
>>>>>> Can anyone explain why client + rpc + server will decrease the
>>>>>> performance of scanning? I mean the Regionserver and Tasktracker are
>>>>>> the same node when you use MapReduce to scan the HBase table. So, in my
>>>>>> understanding, there will be no rpc cost.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Yong
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 5, 2013 at 10:09 AM, Sandy Pratt <prat...@adobe.com> wrote:
>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/HBASE-8691
>>>>>>>
>>>>>>>
>>>>>>> On 6/4/13 6:11 PM, "Sandy Pratt" <prat...@adobe.com> wrote:
>>>>>>>
>>>>>>>> Haven't had a chance to write a JIRA yet, but I thought I'd pop in
>>>>>>>> here with an update in the meantime.
>>>>>>>>
>>>>>>>> I tried a number of different approaches to eliminate latency and
>>>>>>>> "bubbles" in the scan pipeline, and eventually arrived at adding a
>>>>>>>> streaming scan API to the region server, along with refactoring the
>>>>>>>> scan interface into an event-drive message receiver interface. In so
>>>>>>>> doing, I was able to take scan speed on my cluster from 59,537
>>>>>>>> records/sec with the classic scanner to 222,703 records per second
>>>>>>>> with my new scan API. Needless to say, I'm pleased ;)
>>>>>>>>
>>>>>>>> More details forthcoming when I get a chance.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sandy
>>>>>>>>
>>>>>>>> On 5/23/13 3:47 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the update, Sandy.
>>>>>>>>>
>>>>>>>>> If you can open a JIRA and attach your producer / consumer scanner
>>>>>>>>> there, that would be great.
>>>>>>>>>
>>>>>>>>> On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt <prat...@adobe.com> wrote:
>>>>>>>>>
>>>>>>>>>> I wrote myself a Scanner wrapper that uses a producer/consumer
>>>>>>>>>> queue to keep the client fed with a full buffer as much as
>>>>>>>>>> possible. When scanning my table with scanner caching at 100
>>>>>>>>>> records, I see about a 24% uplift in performance (~35k records/sec
>>>>>>>>>> with the ClientScanner and ~44k records/sec with my P/C scanner).
>>>>>>>>>> However, when I set scanner caching to 5000, it's more of a wash
>>>>>>>>>> compared to the standard ClientScanner: ~53k records/sec with the
>>>>>>>>>> ClientScanner and ~60k records/sec with the P/C scanner.
>>>>>>>>>>
>>>>>>>>>> I'm not sure what to make of those results. I think next I'll shut
>>>>>>>>>> down HBase and read the HFiles directly, to see if there's a drop
>>>>>>>>>> off in performance between reading them directly vs. via the
>>>>>>>>>> RegionServer.
>>>>>>>>>>
>>>>>>>>>> I still think that to really solve this there needs to be sliding
>>>>>>>>>> window of records in flight between disk and RS, and between RS and
>>>>>>>>>> client. I'm thinking there's probably a single batch of records in
>>>>>>>>>> flight between RS and client at the moment.
>>>>>>>>>>
>>>>>>>>>> Sandy
>>>>>>>>>>
>>>>>>>>>> On 5/23/13 8:45 AM, "Bryan Keller" <brya...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I am considering scanning a snapshot instead of the table. I
>>>>>>>>>>> believe this is what the ExportSnapshot class does. If I could use
>>>>>>>>>>> the scanning code from ExportSnapshot then I will be able to scan
>>>>>>>>>>> the HDFS files directly and bypass the regionservers. This could
>>>>>>>>>>> potentially give me a huge boost in performance for full table
>>>>>>>>>>> scans. However, it doesn't really address the poor scan
>>>>>>>>>>> performance against a table.
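
For anyone skimming the quoted thread: the producer/consumer wrapper and client-side pre-fetching discussed above can be sketched roughly as follows. This is only an illustration of the general idea (a background thread keeps a small bounded queue of batches filled while the consumer is busy with the previous one). It is not Sandy's wrapper, not Lars's patch, and not an HBase API; `PrefetchingIterator`, `queue_size`, and `fake_batches` are made-up names, and Python is used only to keep the sketch short.

```python
import queue
import threading


class PrefetchingIterator:
    """Wrap any iterator and fetch its items ahead of time on a worker thread.

    Hypothetical illustration only; not part of HBase or its client library.
    """

    _DONE = object()  # sentinel the producer enqueues when the source ends

    def __init__(self, source, queue_size=2):
        # Bounded queue: at most `queue_size` batches wait in memory.
        self._queue = queue.Queue(maxsize=queue_size)
        self._error = None
        self._finished = False
        # Daemon thread so an abandoned iterator cannot block process exit.
        worker = threading.Thread(target=self._produce, args=(source,), daemon=True)
        worker.start()

    def _produce(self, source):
        try:
            for item in source:
                self._queue.put(item)  # blocks while the queue is full
        except Exception as exc:
            self._error = exc  # re-raised on the consumer side
        finally:
            self._queue.put(self._DONE)

    def __iter__(self):
        return self

    def __next__(self):
        if self._finished:
            raise StopIteration
        item = self._queue.get()  # blocks only if the producer has fallen behind
        if item is self._DONE:
            self._finished = True
            if self._error is not None:
                raise self._error
            raise StopIteration
        return item


def fake_batches(n_batches, batch_size):
    """Stand-in for a scanner that returns records in batches (hypothetical)."""
    for b in range(n_batches):
        yield [b * batch_size + i for i in range(batch_size)]


# The consumer sees the same records in the same order; only the timing changes.
records = [r for batch in PrefetchingIterator(fake_batches(3, 4)) for r in batch]
```

Note that this only overlaps fetching with client-side processing. There is still one request in flight at a time toward the source, so it does not give the larger "window" or streaming behaviour Sandy argues for.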