I have a job that is emitting over 3 billion rows from the map to the reduce.
The job is configured with 43 reduce tasks. A perfectly even distribution
would amount to about 70 million rows per reduce task. However, I actually got
around 60 million for most of the tasks, but one task got over 100 million.
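This is what Hadoop's default HashPartitioner produces when one key dominates: every record with that key hashes to the same reduce task, regardless of how many reducers the job has. A minimal plain-Java sketch (the partition formula matches Hadoop's default HashPartitioner; the key names and counts are made-up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: why one hot key overloads a single reduce task under hash partitioning.
public class SkewDemo {
    // Same formula Hadoop's default HashPartitioner uses.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 43;
        // Hypothetical map-output histogram: key -> row count.
        Map<String, Long> keyCounts = new HashMap<>();
        keyCounts.put("hotKey", 40_000_000L);   // one dominant key
        keyCounts.put("k1", 10_000_000L);
        keyCounts.put("k2", 10_000_000L);

        long[] perReducer = new long[reducers];
        for (Map.Entry<String, Long> e : keyCounts.entrySet()) {
            // All rows for a given key land on exactly one reducer.
            perReducer[partition(e.getKey(), reducers)] += e.getValue();
        }
        long max = 0;
        for (long n : perReducer) max = Math.max(max, n);
        // The reducer that owns "hotKey" gets at least its 40M rows,
        // no matter how many reduce tasks the job has.
        System.out.println(max >= 40_000_000L); // prints true
    }
}
```

Adding more reduce tasks does not help here: the hot key's rows can never be split across reducers by the partitioner alone.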
Dave Shine,
Can you share how much data is being taken by each map task? If the map input
is uneven, then it might be a hot-spotting problem.
Have a look at
http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
I had also faced the same problem.
On 07/20/2012 09:20 AM, Dave Shine wrote:
I believe this is referred to as a “key skew problem”, which I know is
heavily dependent on the actual data being processed. Can anyone point
me to any blog posts, white papers, etc. that might give me some options
on how to deal with this issue?
Hi Dave,
I haven't actually done this in practice, so take this with a grain of
salt ;-)
One way to circumvent your problem might be to add entropy to the keys,
i.e., if your keys are "a", "b" etc. and you got too many "a"s and too
many "b"s, you could inflate your keys randomly to be (a, 1), (a, 2),
etc., and then merge the partial results in a follow-up step.
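The "inflate the keys" idea above is usually called key salting. A minimal plain-Java sketch of the mechanics (the names saltKey/unsaltKey, the "#" separator, and the 8-way fan-out are assumptions; a real job would need a second pass to merge the per-salt partial aggregates):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Sketch of key "salting" to spread a hot key across multiple reducers.
public class KeySalting {
    static final int SALT_BUCKETS = 8; // assumed fan-out per hot key

    // Append a random bucket id so identical keys land on different reducers.
    static String saltKey(String key, Random rnd) {
        return key + "#" + rnd.nextInt(SALT_BUCKETS);
    }

    // Strip the salt in the second-pass job that merges partial aggregates.
    static String unsaltKey(String salted) {
        int i = salted.lastIndexOf('#');
        return i >= 0 ? salted.substring(0, i) : salted;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        Map<String, Integer> perSaltedKey = new HashMap<>();
        for (int row = 0; row < 100_000; row++) {
            String salted = saltKey("a", rnd); // "a" is the hot key
            perSaltedKey.merge(salted, 1, Integer::sum);
        }
        // The hot key is now spread over SALT_BUCKETS smaller groups.
        System.out.println(perSaltedKey.size()); // prints 8
        System.out.println(unsaltKey("a#3"));    // prints a
    }
}
```

The trade-off is the extra aggregation pass: each salted group produces a partial result, and those partials must be re-grouped on the original key to get the final answer.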
On 07/20/2012 09:20 AM, Dave Shine wrote:
> I have a job that is emitting over 3 billion rows from the map to the reduce. [...]
.ab...@gmail.com]
Sent: Friday, July 20, 2012 9:58 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Distributing Keys across Reducers
Dave Shine
Sr. Software Engineer
321.939.5093 direct | 407.314.0122 mobile
CI Boost(tm) Clients Outperform Online(tm) www.ciboost.com
-Original Message-
From: John Armstrong [mailto:j...@ccri.com]
Sent: Friday, July 20, 2012 10:20 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Distributing Keys across Reducers
rmstr...@ccri.com
Subject: Re: Distributing Keys across Reducers
Does applying a combiner make any difference? Or are these numbers with the
combiner included?
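For context: a combiner pre-aggregates each map task's output locally before the shuffle, so it shrinks the number of rows each reducer receives, but it does not change which reducer a key goes to. A minimal plain-Java simulation of the idea (not the Hadoop API, and it assumes a sum-style job where partial aggregation is valid):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a combiner pre-aggregates one map task's output locally, so the
// reducer receives one partial sum per key instead of one row per record.
public class CombinerDemo {
    static Map<String, Integer> combine(String[] mapOutputKeys) {
        Map<String, Integer> partial = new HashMap<>();
        for (String key : mapOutputKeys) {
            partial.merge(key, 1, Integer::sum); // local count, word-count style
        }
        return partial;
    }

    public static void main(String[] args) {
        // One map task emitted 6 rows, but only 2 distinct keys.
        String[] emitted = {"a", "a", "a", "a", "b", "b"};
        Map<String, Integer> combined = combine(emitted);
        // 6 rows shrink to 2 partial sums crossing the network: {a=4, b=2}.
        System.out.println(combined.size()); // prints 2
    }
}
```

So a combiner helps with shuffle volume, but if one key still dominates after combining, that key's partial sums all go to the same reducer and the skew remains.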
On Fri, Jul 20, 2012 at 8:46 PM, Dave Shine wrote:
> Thanks John.
>
> The key is my own WritableComparable object, and I have
From: David Rosenstrauch [dar...@darose.net]
Sent: Friday, July 20, 2012 7:45 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Distributing Keys across Reducers
On 07/20/2012 09:20 AM, Dave Shine wrote:
> I have a job that is emitting over 3 billion rows from the map to the reduce. [...]
...@exar.com]
Sent: Friday, July 20, 2012 1:03 PM
To: mapreduce-user@hadoop.apache.org
Subject: RE: Distributing Keys across Reducers
Just a thought, but can you deal with the problem with increased granularity by
simply making the jobs smaller?
If you have enough jobs, when one takes twice as long as the others, it hurts
the overall runtime less.
From: Dave Shine [mailto:dave.sh...@channelintelligence.com]
Sent: Friday, July 20, 2012 1:13 PM
To: mapreduce-user@hadoop.apache.org
Subject: RE: Distributing Keys across Reducers
Yes, that is a possibility, but it will take some significant rearchitecture.
I was assuming that was what I was going to have to do until I saw the key
distribution problem and thought I might find a simpler fix.