Absolutely.
- Original Message -
From: Ted Yu yuzhih...@gmail.com
To: user@hbase.apache.org
Cc:
Sent: Sunday, June 30, 2013 9:32 PM
Subject: Re: Poor HBase map-reduce scan performance
Looking at the tail of HBASE-8369, there were some comments which are yet
to be addressed.
I think
On Mon, Jul 1, 2013 at 3:59 AM, lars hofhansl la...@apache.org wrote:
Absolutely.
brya...@gmail.com
To: user@hbase.apache.org; lars hofhansl la...@apache.org
Cc:
Sent: Tuesday, June 25, 2013 1:56 AM
Subject: Re: Poor HBase map-reduce scan performance
I tweaked Enis's snapshot input format and backported it to 0.94.6 and have
snapshot scanning functional on my system. Performance is dramatically better,
as expected, I suppose. I'm seeing about 3.6x faster performance vs
TableInputFormat. Also, HBase
https://issues.apache.org/jira/browse/HBASE-8691
On 6/4/13 6:11 PM, Sandy Pratt prat...@adobe.com wrote:
Haven't had a chance to write a JIRA yet, but I thought I'd pop in here
with an update in the meantime.
I tried a number of different approaches to eliminate latency and
bubbles in the scan
Can anyone explain why client + RPC + server will decrease the performance
of scanning? I mean, the Regionserver and Tasktracker are on the same node when
you use MapReduce to scan the HBase table. So, in my understanding, there
will be no RPC cost.
Thanks!
Yong
On Wed, Jun 5, 2013 at 10:09 AM,
bq. the Regionserver and Tasktracker are the same node when you use
MapReduce to scan the HBase table.
The scan performed by the Tasktracker on that node would very likely access
data hosted by region server on other node(s). So there would be RPC
involved.
There is some discussion on providing
Yong,
As a thought experiment, imagine how it impacts the throughput of TCP to
keep the window size at 1. That means there's only one packet in flight
at a time, and total throughput is a fraction of what it could be.
That's effectively what happens with RPC. The server sends a batch, then
Dear Sandy,
Thanks for your explanation.
However, what I don't get is your term "client": does this client mean the
MapReduce job? If I understand you correctly, this means the map function will
process the tuples and, during this processing time, the regionserver does
nothing?
regards!
Yong
On Wed, Jun 5,
That's my understanding of how the current scan API works, yes. The
client calls next() to fetch a batch. While it's waiting for the response
from the server, it blocks. After the server responds to the next() call,
it does nothing for that scanner until the following next() call. That
makes
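As a rough illustration of the blocking next() pattern described above, here is a back-of-the-envelope sketch in Java. The class name, method names, and all timings are hypothetical and chosen only for illustration; they are not measurements from this thread. It compares a scan where fetching and processing strictly alternate with an idealized one where the next batch is fetched while the current one is processed.

```java
/** Toy throughput model for a batched scan; every number here is an assumption. */
final class ScanPipelineModel {

    /**
     * Stop-and-wait: the client blocks while a batch is fetched, then the
     * server idles while the client processes it, so the two times add up.
     */
    public static double blockingRowsPerSec(int batchRows, double fetchMs, double processMs) {
        return batchRows * 1000.0 / (fetchMs + processMs);
    }

    /**
     * Idealized overlap: the next batch is fetched while the current one is
     * processed, so the slower of the two stages sets the pace.
     */
    public static double overlappedRowsPerSec(int batchRows, double fetchMs, double processMs) {
        return batchRows * 1000.0 / Math.max(fetchMs, processMs);
    }

    public static void main(String[] args) {
        int batchRows = 100;     // rows per next() call (the scanner caching value)
        double fetchMs = 3.0;    // assumed server work plus round trip per batch
        double processMs = 3.0;  // assumed client-side processing time per batch
        System.out.printf("blocking:   %.0f rows/sec%n",
                blockingRowsPerSec(batchRows, fetchMs, processMs));
        System.out.printf("overlapped: %.0f rows/sec%n",
                overlappedRowsPerSec(batchRows, fetchMs, processMs));
    }
}
```

In this model the gain from overlapping is largest when fetch time and processing time are similar, and it shrinks toward nothing as one stage dominates the other.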
From: Sandy Pratt prat...@adobe.com
To: user@hbase.apache.org user@hbase.apache.org
Sent: Wednesday, June 5, 2013 10:58 AM
Subject: Re: Poor HBase map-reduce scan performance
Yong,
As a thought experiment, imagine how it impacts the throughput of TCP to
keep the window size at 1
you to 25% of the theoretical cluster wide maximum disk
throughput?
-- Lars
- Original Message -
From: Bryan Keller brya...@gmail.com
To: user@hbase.apache.org
Cc:
Sent: Friday, May 10, 2013 8:46 AM
Subject: Re: Poor HBase map-reduce scan performance
FYI, I ran tests
Haven't had a chance to write a JIRA yet, but I thought I'd pop in here
with an update in the meantime.
I tried a number of different approaches to eliminate latency and
bubbles in the scan pipeline, and eventually arrived at adding a
streaming scan API to the region server, along with
From: Bryan Keller brya...@gmail.com
To: user@hbase.apache.org
Sent: Friday, May 3, 2013 3:44 AM
Subject: Re: Poor HBase map-reduce scan performance
Actually I'm not too confident in my results re block size, they may
have been related to major compaction. I'm
:46 AM
Subject: Re: Poor HBase map-reduce scan performance
FYI, I ran tests with compression on and off.
With a plain HDFS sequence file and compression off, I am getting very good I/O
numbers, roughly 75% of theoretical max for reads. With snappy compression on
with a sequence file, I/O speed
3:44 AM
Subject: Re: Poor HBase map-reduce scan performance
Actually I'm not too confident in my results re block size, they may
have been related to major compaction. I'm going to rerun before
drawing any conclusions.
On May 3, 2013, at 12:17 AM, Bryan Keller brya...@gmail.com
wrote
I wrote myself a Scanner wrapper that uses a producer/consumer queue to
keep the client fed with a full buffer as much as possible. When scanning
my table with scanner caching at 100 records, I see about a 24% uplift in
performance (~35k records/sec with the ClientScanner and ~44k records/sec
Thanks for the update, Sandy.
If you can open a JIRA and attach your producer / consumer scanner there,
that would be great.
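For readers who want to see the shape of a producer/consumer scanner wrapper like the one discussed here, below is a minimal sketch in plain Java. It is not Sandy's actual code, which is not shown in this thread; the class name PrefetchingIterator and its parameters are hypothetical. It wraps any Iterator and uses a background thread to keep a bounded queue filled, so the consumer rarely waits on the source.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical wrapper: a background thread prefetches items into a bounded queue. */
final class PrefetchingIterator<T> implements Iterator<T> {
    private static final Object END = new Object(); // sentinel: source is exhausted
    private final BlockingQueue<Object> queue;
    private volatile Throwable failure;             // set if the source throws
    private Object next;                            // null means "nothing taken yet"

    /** The source must not yield null elements (ArrayBlockingQueue rejects them). */
    public PrefetchingIterator(Iterator<T> source, int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            try {
                try {
                    while (source.hasNext()) {
                        queue.put(source.next()); // blocks while the buffer is full
                    }
                } catch (RuntimeException e) {
                    failure = e;                  // reported to the consumer at END
                }
                queue.put(END);
            } catch (InterruptedException e) {
                // Nothing in this sketch interrupts the producer; a real wrapper
                // would need a close() that stops the thread and unblocks callers.
                Thread.currentThread().interrupt();
            }
        }, "scan-prefetch");
        producer.setDaemon(true);
        producer.start();
    }

    @Override
    public boolean hasNext() {
        if (next == null) {
            try {
                next = queue.take(); // waits only when the buffer is empty
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for rows", e);
            }
        }
        if (next == END) {
            if (failure != null) {
                throw new IllegalStateException("source iterator failed", failure);
            }
            return false;
        }
        return true;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T value = (T) next;
        next = null;
        return value;
    }

    public static void main(String[] args) {
        Iterator<Integer> rows =
                new PrefetchingIterator<>(Arrays.asList(1, 2, 3).iterator(), 2);
        while (rows.hasNext()) {
            System.out.println(rows.next());
        }
    }
}
```

With HBase, the source could presumably be the iterator of a ResultScanner, with the queue capacity sized relative to the scanner caching value; that pairing is an assumption and is not tested here.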
On Thu, May 23, 2013 at 3:42 PM, Sandy Pratt prat...@adobe.com wrote:
I wrote myself a Scanner wrapper that uses a producer/consumer queue to
keep the client fed with a
disk (and they
should all be the same size, thus allocation should be cheap).
-- Lars
From: Bryan Keller brya...@gmail.com
To: user@hbase.apache.org
Sent: Thursday, May 2, 2013 10:54 AM
Subject: Re: Poor HBase map-reduce scan performance
I ran one of my
profiling myself if there is an easy
way to generate data of similar shape.
-- Lars
Subject: Re: Poor HBase map-reduce scan performance
I ran one of my regionservers through VisualVM. It looks like the top hot
spots are HFileReaderV2$ScannerV2.getKeyValue() and ByteBuffer.allocate().
It appears at first glance that memory allocations may be an issue.
Decompression was next
Cloudera's version of
Hadoop. I can send along a simple patch to pom.xml to do that.
-- Lars
From: Bryan Keller brya...@gmail.com
To: user@hbase.apache.org
Sent: Tuesday, April 30, 2013 11:02 PM
Subject: Re: Poor HBase map-reduce scan performance
The table has hashed keys so rows are evenly distributed amongst the
regionservers, and load on each regionserver is pretty much the same. I
also have per-table balancing turned on. I get
: Re: Poor HBase map-reduce scan performance
I tried running my test with 0.94.4, unfortunately performance was about the
same. I'm planning on profiling the regionserver and trying some other things
tonight and tomorrow and will report back.
On May 1, 2013, at 8:00 AM, Bryan Keller brya
From: Bryan Keller brya...@gmail.com
To: user@hbase.apache.org user@hbase.apache.org
Cc:
Sent: Wednesday, May 1, 2013 6:01 PM
Subject: Re: Poor HBase map-reduce scan performance
I tried running my test with 0.94.4, unfortunately performance was about the
same. I'm planning on profiling
I have been attempting to speed up my HBase map-reduce scans for a while now. I
have tried just about everything without much luck. I'm running out of ideas
and was hoping for some suggestions. This is HBase 0.94.2 and Hadoop 2.0.0
(CDH4.2.1).
The table I'm scanning:
20 mil rows
Hundreds of
From http://hbase.apache.org/book.html#mapreduce.example :
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
I guess you have used the above setting.
0.94.x releases are compatible. Have
Yes, I have tried various settings for setCaching() and I have
setCacheBlocks(false)
On Apr 30, 2013, at 9:17 PM, Ted Yu yuzhih...@gmail.com wrote:
From http://hbase.apache.org/book.html#mapreduce.example :
scan.setCaching(500);// 1 is the default in Scan, which will
be bad for
Have you tried enabling short-circuit reads?
Thanks
On Apr 30, 2013, at 9:31 PM, Bryan Keller brya...@gmail.com wrote:
Yes, I have tried various settings for setCaching() and I have
setCacheBlocks(false)
On Apr 30, 2013, at 9:17 PM, Ted Yu yuzhih...@gmail.com wrote:
From
Yes, I have it enabled (forgot to mention that).
On Apr 30, 2013, at 9:56 PM, Ted Yu yuzhih...@gmail.com wrote:
Have you tried enabling short-circuit reads?
Thanks
On Apr 30, 2013, at 9:31 PM, Bryan Keller brya...@gmail.com wrote:
Yes, I have tried various settings for setCaching() and
try to do a bit of profiling during the next days as my day job
permits, but I do not have any machines with SSDs).
-- Lars
From: Bryan Keller brya...@gmail.com
To: user@hbase.apache.org
Sent: Tuesday, April 30, 2013 9:31 PM
Subject: Re: Poor HBase map-reduce