Re: Poor HBase map-reduce scan performance

2013-07-01 Thread lars hofhansl
Absolutely. - Original Message - From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org Cc: Sent: Sunday, June 30, 2013 9:32 PM Subject: Re: Poor HBase map-reduce scan performance Looking at the tail of HBASE-8369, there were some comments which are yet to be addressed. I think

Re: Poor HBase map-reduce scan performance

2013-07-01 Thread Enis Söztutar
On Mon, Jul 1, 2013 at 3:59 AM, lars hofhansl la...@apache.org wrote: Absolutely. - Original Message - From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org Cc: Sent: Sunday, June 30, 2013 9:32 PM Subject: Re: Poor HBase map-reduce scan performance Looking at the tail

Re: Poor HBase map-reduce scan performance

2013-07-01 Thread Bryan Keller
...@apache.org wrote: Absolutely. - Original Message - From: Ted Yu yuzhih...@gmail.com To: user@hbase.apache.org Cc: Sent: Sunday, June 30, 2013 9:32 PM Subject: Re: Poor HBase map-reduce scan performance Looking at the tail of HBASE-8369, there were some comments which are yet

Re: Poor HBase map-reduce scan performance

2013-06-30 Thread Bryan Keller
brya...@gmail.com To: user@hbase.apache.org; lars hofhansl la...@apache.org Cc: Sent: Tuesday, June 25, 2013 1:56 AM Subject: Re: Poor HBase map-reduce scan performance I tweaked Enis's snapshot input format and backported it to 0.94.6 and have snapshot scanning functional on my system

Re: Poor HBase map-reduce scan performance

2013-06-30 Thread Ted Yu
...@apache.org Cc: Sent: Tuesday, June 25, 2013 1:56 AM Subject: Re: Poor HBase map-reduce scan performance I tweaked Enis's snapshot input format and backported it to 0.94.6 and have snapshot scanning functional on my system. Performance is dramatically better, as expected, I suppose. I'm

Re: Poor HBase map-reduce scan performance

2013-06-28 Thread lars hofhansl
: Re: Poor HBase map-reduce scan performance I tweaked Enis's snapshot input format and backported it to 0.94.6 and have snapshot scanning functional on my system. Performance is dramatically better, as expected, I suppose. I'm seeing about 3.6x faster performance vs TableInputFormat. Also, HBase

Re: Poor HBase map-reduce scan performance

2013-06-25 Thread Bryan Keller
To: user@hbase.apache.org user@hbase.apache.org Sent: Wednesday, June 5, 2013 10:58 AM Subject: Re: Poor HBase map-reduce scan performance Yong, As a thought experiment, imagine how it impacts the throughput of TCP to keep the window size at 1. That means there's only one packet in flight

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread Sandy Pratt
https://issues.apache.org/jira/browse/HBASE-8691 On 6/4/13 6:11 PM, Sandy Pratt prat...@adobe.com wrote: Haven't had a chance to write a JIRA yet, but I thought I'd pop in here with an update in the meantime. I tried a number of different approaches to eliminate latency and bubbles in the scan

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread yonghu
Can anyone explain why client + rpc + server will decrease the performance of scanning? I mean the Regionserver and Tasktracker are the same node when you use MapReduce to scan the HBase table. So, in my understanding, there will be no rpc cost. Thanks! Yong On Wed, Jun 5, 2013 at 10:09 AM,

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread Ted Yu
bq. the Regionserver and Tasktracker are the same node when you use MapReduce to scan the HBase table. The scan performed by the Tasktracker on that node would very likely access data hosted by region server on other node(s). So there would be RPC involved. There is some discussion on providing

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread Sandy Pratt
Yong, As a thought experiment, imagine how it impacts the throughput of TCP to keep the window size at 1. That means there's only one packet in flight at a time, and total throughput is a fraction of what it could be. That's effectively what happens with RPC. The server sends a batch, then

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread yonghu
Dear Sandy, Thanks for your explanation. However, what I don't get is your term "client": does this client mean the MapReduce job? If I understand you correctly, the Map function processes the tuples, and during this processing time the regionserver does nothing? Regards! Yong On Wed, Jun 5,

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread Sandy Pratt
That's my understanding of how the current scan API works, yes. The client calls next() to fetch a batch. While it's waiting for the response from the server, it blocks. After the server responds to the next() call, it does nothing for that scanner until the following next() call. That makes
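A back-of-the-envelope sketch of the blocking pattern described above, assuming every batch takes the same time to fetch and to process. The class name, method names, and timings are all made up for illustration; nothing here is measured or taken from HBase itself.

```java
// Illustrative model only: the timings are hypothetical inputs, not measurements.
final class ScanPipelineModel {

    /**
     * Blocking next(): the client waits for each batch, then the server sits
     * idle while the client processes it, so the two costs add up per batch.
     */
    static long serialMillis(long batches, long fetchMillis, long processMillis) {
        return batches * (fetchMillis + processMillis);
    }

    /**
     * Ideal overlap: the first fetch and the last processing step cannot be
     * hidden; in between, the slower of the two sides sets the pace.
     */
    static long pipelinedMillis(long batches, long fetchMillis, long processMillis) {
        if (batches == 0) {
            return 0;
        }
        return fetchMillis
                + (batches - 1) * Math.max(fetchMillis, processMillis)
                + processMillis;
    }
}
```

With 100 batches at a hypothetical 10 ms to fetch and 10 ms to process, the serial model gives 2000 ms and the overlapped model 1010 ms, which is the gap a prefetching or streaming scanner tries to close.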

Re: Poor HBase map-reduce scan performance

2013-06-05 Thread lars hofhansl
From: Sandy Pratt prat...@adobe.com To: user@hbase.apache.org user@hbase.apache.org Sent: Wednesday, June 5, 2013 10:58 AM Subject: Re: Poor HBase map-reduce scan performance Yong, As a thought experiment, imagine how it impacts the throughput of TCP to keep the window size at 1

Re: Poor HBase map-reduce scan performance

2013-06-04 Thread Bryan Keller
you to 25% of the theoretical cluster wide maximum disk throughput? -- Lars - Original Message - From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Cc: Sent: Friday, May 10, 2013 8:46 AM Subject: Re: Poor HBase map-reduce scan performance FYI, I ran tests

Re: Poor HBase map-reduce scan performance

2013-06-04 Thread Sandy Pratt
Haven't had a chance to write a JIRA yet, but I thought I'd pop in here with an update in the meantime. I tried a number of different approaches to eliminate latency and bubbles in the scan pipeline, and eventually arrived at adding a streaming scan API to the region server, along with

Re: Poor HBase map-reduce scan performance

2013-05-29 Thread Enis Söztutar
From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Friday, May 3, 2013 3:44 AM Subject: Re: Poor HBase map-reduce scan performance Actually I'm not too confident in my results re block size, they may have been related to major compaction. I'm

Re: Poor HBase map-reduce scan performance

2013-05-24 Thread lars hofhansl
:46 AM Subject: Re: Poor HBase map-reduce scan performance FYI, I ran tests with compression on and off. With a plain HDFS sequence file and compression off, I am getting very good I/O numbers, roughly 75% of theoretical max for reads. With snappy compression on with a sequence file, I/O speed

Re: Poor HBase map-reduce scan performance

2013-05-23 Thread Bryan Keller
3:44 AM Subject: Re: Poor HBase map-reduce scan performance Actually I'm not too confident in my results re block size, they may have been related to major compaction. I'm going to rerun before drawing any conclusions. On May 3, 2013, at 12:17 AM, Bryan Keller brya...@gmail.com wrote

Re: Poor HBase map-reduce scan performance

2013-05-23 Thread Sandy Pratt
I wrote myself a Scanner wrapper that uses a producer/consumer queue to keep the client fed with a full buffer as much as possible. When scanning my table with scanner caching at 100 records, I see about a 24% uplift in performance (~35k records/sec with the ClientScanner and ~44k records/sec
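The producer/consumer idea can be sketched with the JDK alone. This is not Sandy's wrapper (which is not shown in the thread); it is a minimal illustration under assumptions: `PrefetchingIterator` is a hypothetical name, the source is any `Iterator` of non-null elements (for HBase, that could be the iterator of a result scanner), and errors from the source are not propagated to the consumer.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a background thread keeps a bounded buffer filled so
// the consumer rarely has to wait on the underlying (slow) source.
final class PrefetchingIterator<T> implements Iterator<T> {
    private static final Object END = new Object(); // sentinel: source exhausted

    private final BlockingQueue<Object> queue;
    private Object lookahead; // next buffered element, or null if none taken yet

    PrefetchingIterator(Iterator<T> source, int capacity) {
        BlockingQueue<Object> q = new ArrayBlockingQueue<>(capacity);
        this.queue = q;
        Thread producer = new Thread(() -> {
            try {
                try {
                    while (source.hasNext()) {
                        q.put(source.next()); // blocks while the buffer is full
                    }
                } finally {
                    // Always signal the end. If the source threw, the consumer
                    // just sees an early end; a real wrapper should rethrow.
                    q.put(END);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "prefetch-producer");
        producer.setDaemon(true);
        producer.start();
    }

    @Override
    public boolean hasNext() {
        if (lookahead == null) {
            try {
                lookahead = queue.take(); // blocks only while the buffer is empty
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return lookahead != END;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T value = (T) lookahead;
        lookahead = null;
        return value;
    }
}
```

The bounded queue is the design choice that matters: it lets the producer run ahead of the consumer by at most `capacity` elements, so memory stays limited while the fetch latency overlaps with client-side processing.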

Re: Poor HBase map-reduce scan performance

2013-05-23 Thread Ted Yu
Thanks for the update, Sandy. If you can open a JIRA and attach your producer / consumer scanner there, that would be great. On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt prat...@adobe.com wrote: I wrote myself a Scanner wrapper that uses a producer/consumer queue to keep the client fed with a

Re: Poor HBase map-reduce scan performance

2013-05-22 Thread Sandy Pratt
disk (and they should all be the same size, thus allocation should be cheap). -- Lars From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Thursday, May 2, 2013 10:54 AM Subject: Re: Poor HBase map-reduce scan performance I ran one of my

Re: Poor HBase map-reduce scan performance

2013-05-22 Thread Ted Yu
3:44 AM Subject: Re: Poor HBase map-reduce scan performance Actually I'm not too confident in my results re block size, they may have been related to major compaction. I'm going to rerun before drawing any conclusions. On May 3, 2013, at 12:17 AM, Bryan Keller brya...@gmail.com wrote

Re: Poor HBase map-reduce scan performance

2013-05-22 Thread Sandy Pratt
From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Friday, May 3, 2013 3:44 AM Subject: Re: Poor HBase map-reduce scan performance Actually I'm not too confident in my results re block size, they may have been related to major compaction. I'm going

Re: Poor HBase map-reduce scan performance

2013-05-22 Thread Ted Yu
profiling myself if there is an easy way to generate data of similar shape. -- Lars From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Friday, May 3, 2013 3:44 AM Subject: Re: Poor HBase map-reduce scan performance

Re: Poor HBase map-reduce scan performance

2013-05-10 Thread Bryan Keller
From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Friday, May 3, 2013 3:44 AM Subject: Re: Poor HBase map-reduce scan performance Actually I'm not too confident in my results re block size, they may have been related to major compaction. I'm going to rerun

Re: Poor HBase map-reduce scan performance

2013-05-08 Thread Bryan Keller
. -- Lars From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Friday, May 3, 2013 3:44 AM Subject: Re: Poor HBase map-reduce scan performance Actually I'm not too confident in my results re block size, they may have been related

Re: Poor HBase map-reduce scan performance

2013-05-05 Thread Michael Segel
Subject: Re: Poor HBase map-reduce scan performance I ran one of my regionservers through VisualVM. It looks like the top hot spots are HFileReaderV2$ScannerV2.getKeyValue() and ByteBuffer.allocate(). It appears at first glance that memory allocations may be an issue. Decompression was next

Re: Poor HBase map-reduce scan performance

2013-05-04 Thread lars hofhansl
. -- Lars From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Friday, May 3, 2013 3:44 AM Subject: Re: Poor HBase map-reduce scan performance Actually I'm not too confident in my results re block size, they may have been related to major compaction

Re: Poor HBase map-reduce scan performance

2013-05-03 Thread Bryan Keller
, 2013 10:54 AM Subject: Re: Poor HBase map-reduce scan performance I ran one of my regionservers through VisualVM. It looks like the top hot spots are HFileReaderV2$ScannerV2.getKeyValue() and ByteBuffer.allocate(). It appears at first glance that memory allocations may be an issue

Re: Poor HBase map-reduce scan performance

2013-05-02 Thread Bryan Keller
...@gmail.com To: user@hbase.apache.org user@hbase.apache.org Cc: Sent: Wednesday, May 1, 2013 6:01 PM Subject: Re: Poor HBase map-reduce scan performance I tried running my test with 0.94.4, unfortunately performance was about the same. I'm planning on profiling the regionserver and trying some

Re: Poor HBase map-reduce scan performance

2013-05-02 Thread Nicolas Liochon
To: user@hbase.apache.org user@hbase.apache.org Cc: Sent: Wednesday, May 1, 2013 6:01 PM Subject: Re: Poor HBase map-reduce scan performance I tried running my test with 0.94.4, unfortunately performance was about the same. I'm planning on profiling the regionserver and trying some

Re: Poor HBase map-reduce scan performance

2013-05-02 Thread lars hofhansl
brya...@gmail.com To: user@hbase.apache.org Sent: Thursday, May 2, 2013 10:54 AM Subject: Re: Poor HBase map-reduce scan performance I ran one of my regionservers through VisualVM. It looks like the top hot spots are HFileReaderV2$ScannerV2.getKeyValue() and ByteBuffer.allocate(). It appears

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
permits, but I do not have any machines with SSDs). -- Lars From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Tuesday, April 30, 2013 9:31 PM Subject: Re: Poor HBase map-reduce scan performance Yes, I have tried various settings

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread lars hofhansl
the next days as my day job permits, but I do not have any machines with SSDs). -- Lars From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Tuesday, April 30, 2013 9:31 PM Subject: Re: Poor HBase map-reduce scan performance Yes

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Matt Corgan
@hbase.apache.org Sent: Tuesday, April 30, 2013 11:02 PM Subject: Re: Poor HBase map-reduce scan performance The table has hashed keys so rows are evenly distributed amongst the regionservers, and load on each regionserver is pretty much the same. I also have per-table balancing turned on. I

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Naidu MS
to pom.xml to do that. -- Lars From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Tuesday, April 30, 2013 11:02 PM Subject: Re: Poor HBase map-reduce scan performance The table has hashed keys so rows are evenly

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread ramkrishna vasudevan
From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Tuesday, April 30, 2013 11:02 PM Subject: Re: Poor HBase map-reduce scan performance The table has hashed keys so rows are evenly distributed amongst the regionservers

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread ramkrishna vasudevan
From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Tuesday, April 30, 2013 11:02 PM Subject: Re: Poor HBase map-reduce scan performance The table has hashed keys so rows are evenly distributed amongst the regionservers, and load on each

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Jean-Marc Spaggiari
Cloudera's version of Hadoop. I can send along a simple patch to pom.xml to do that. -- Lars From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Tuesday, April 30, 2013 11:02 PM Subject: Re: Poor HBase map-reduce scan performance

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Michael Segel
...@gmail.com To: user@hbase.apache.org Sent: Tuesday, April 30, 2013 9:31 PM Subject: Re: Poor HBase map-reduce scan performance Yes, I have tried various settings for setCaching() and I have setCacheBlocks(false) On Apr 30, 2013, at 9:17 PM, Ted Yu yuzhih...@gmail.com wrote: From http

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Tuesday, April 30, 2013 11:02 PM Subject: Re: Poor HBase map-reduce scan performance The table has hashed keys so rows are evenly distributed amongst

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
@hbase.apache.org Sent: Tuesday, April 30, 2013 11:02 PM Subject: Re: Poor HBase map-reduce scan performance The table has hashed keys so rows are evenly distributed amongst the regionservers, and load on each regionserver is pretty much the same. I also have per-table balancing turned on. I get

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Tuesday, April 30, 2013 11:02 PM Subject: Re: Poor HBase map-reduce scan performance The table has hashed keys so rows are evenly distributed amongst the regionservers, and load on each regionserver

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread lars hofhansl
: Re: Poor HBase map-reduce scan performance I tried running my test with 0.94.4, unfortunately performance was about the same. I'm planning on profiling the regionserver and trying some other things tonight and tomorrow and will report back. On May 1, 2013, at 8:00 AM, Bryan Keller brya

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
- From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org user@hbase.apache.org Cc: Sent: Wednesday, May 1, 2013 6:01 PM Subject: Re: Poor HBase map-reduce scan performance I tried running my test with 0.94.4, unfortunately performance was about the same. I'm planning on profiling

Poor HBase map-reduce scan performance

2013-04-30 Thread Bryan Keller
I have been attempting to speed up my HBase map-reduce scans for a while now. I have tried just about everything without much luck. I'm running out of ideas and was hoping for some suggestions. This is HBase 0.94.2 and Hadoop 2.0.0 (CDH4.2.1). The table I'm scanning: 20 mil rows Hundreds of

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread Ted Yu
From http://hbase.apache.org/book.html#mapreduce.example : scan.setCaching(500);// 1 is the default in Scan, which will be bad for MapReduce jobs scan.setCacheBlocks(false); // don't set to true for MR jobs I guess you have used the above setting. 0.94.x releases are compatible. Have

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread Bryan Keller
Yes, I have tried various settings for setCaching() and I have setCacheBlocks(false) On Apr 30, 2013, at 9:17 PM, Ted Yu yuzhih...@gmail.com wrote: From http://hbase.apache.org/book.html#mapreduce.example : scan.setCaching(500);// 1 is the default in Scan, which will be bad for

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread Ted Yu
Have you tried enabling short-circuit reads? Thanks On Apr 30, 2013, at 9:31 PM, Bryan Keller brya...@gmail.com wrote: Yes, I have tried various settings for setCaching() and I have setCacheBlocks(false) On Apr 30, 2013, at 9:17 PM, Ted Yu yuzhih...@gmail.com wrote: From

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread Bryan Keller
Yes, I have it enabled (forgot to mention that). On Apr 30, 2013, at 9:56 PM, Ted Yu yuzhih...@gmail.com wrote: Have you tried enabling short circuit read ? Thanks On Apr 30, 2013, at 9:31 PM, Bryan Keller brya...@gmail.com wrote: Yes, I have tried various settings for setCaching() and

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread lars hofhansl
try to do a bit of profiling during the next days as my day job permits, but I do not have any machines with SSDs). -- Lars From: Bryan Keller brya...@gmail.com To: user@hbase.apache.org Sent: Tuesday, April 30, 2013 9:31 PM Subject: Re: Poor HBase map-reduce