Hi - the combiner operates on a chunk of mapper output data, but where
exactly is the chunk cut off, and when exactly is the chunk fed to
the combiner?
1. Will it be after the mapper finishes processing an input record?
2. Will it be after the mapper outputs a key value pair that hits the
memor
on disk will be
merged, combiners will also be invoked at this time.
-Gang
- Original Message
From: Le Zhao
To: common-user@hadoop.apache.org
Sent: 2010/1/27 (Wed) 11:57:08 AM
Subject: When exactly is combiner invoked?
Hi - combiner performs on a chunk of mapper output data, but what exactly is
Hi,
To elaborate a little on Gang's point, the buffer's spill threshold is set by
io.sort.spill.percent; once the buffer fills to that fraction, a spill is
created. If the number of spills is at least min.num.spills.for.combine, the
combiner is invoked on those spills before the merged output is written to disk.
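
A rough way to picture how the two settings interact is the toy model below. It is only a sketch under stated assumptions, not Hadoop's actual code: the buffer is counted in records rather than bytes, the combiner is a simple sum, and it assumes the combiner also runs each time a spill is written, in addition to the merge step. The names run_map_side, buffer_capacity, spill_percent and min_spills_for_combine are made up for illustration; the last two only stand in for io.sort.spill.percent and min.num.spills.for.combine, and the values used are toy values.

```python
from collections import defaultdict


def combine(pairs):
    """Toy sum-combiner: collapse (key, value) pairs to one pair per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def run_map_side(records, buffer_capacity=4, spill_percent=0.5,
                 min_spills_for_combine=3):
    """Simulate buffering, spilling and the final merge for one map task.

    Returns (final_output, number_of_spills, combiner_ran_at_merge).
    """
    # Simplification: the threshold is counted in records, not bytes.
    threshold = int(buffer_capacity * spill_percent)
    buffer, spills = [], []
    for record in records:
        buffer.append(record)
        if len(buffer) >= threshold:
            # Assumption: the combiner runs as each spill is written.
            spills.append(combine(buffer))
            buffer = []
    if buffer:
        spills.append(combine(buffer))  # flush whatever is left at the end

    # Merge all spills; combine again only if there are enough of them.
    merged = sorted(pair for spill in spills for pair in spill)
    combine_at_merge = len(spills) >= min_spills_for_combine
    if combine_at_merge:
        merged = combine(merged)
    return merged, len(spills), combine_at_merge


records = [("a", 1), ("b", 1), ("a", 1), ("a", 1), ("b", 1), ("a", 1)]
print(run_map_side(records))
print(run_map_side(records, min_spills_for_combine=4))
```

In this model, the second call leaves several pairs for the same key in the merged output, because the number of spills stays below the merge-time threshold and each spill was only combined on its own.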
I'm not sure what exactly you
Gang, Jeff and Amogh,
Thanks for all the replies.
It seems that no matter how many times the combiner is invoked internally, the
output of a given map task will be *totally* partitioned and
combined. Only then is the data shuffled/sent to the reducers.
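
A small sketch of why the number of combiner invocations should not matter for the final answer, assuming a sum-style combiner (associative and commutative) that is the same function as the reducer. The function sum_by_key and the sample data are hypothetical, and this is plain Python rather than Hadoop code.

```python
from collections import defaultdict


def sum_by_key(pairs):
    """Sum values per key; used here as both combiner and reducer."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


map_output = [("x", 1), ("y", 2), ("x", 3), ("y", 4), ("x", 5)]

# Combiner never runs: the reducer sees the raw map output.
no_combine = sum_by_key(map_output)

# Combiner runs once over the whole output, then the reducer runs.
combined_once = sum_by_key(sum_by_key(map_output))

# Combiner runs on two separate chunks, again on their merge,
# and then the reducer runs.
chunk_a, chunk_b = map_output[:2], map_output[2:]
combined_twice = sum_by_key(
    sum_by_key(sum_by_key(chunk_a) + sum_by_key(chunk_b))
)

print(no_combine == combined_once == combined_twice)
```

This only holds for operations like sum, min or max. Something like an average would need to carry (sum, count) pairs through the combiner to stay correct.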
That's good to know, because if combining isn't
k, there is
no chance for it to be combined with the previous part.
-Gang
- Original Message
From: Le Zhao
To: common-user@hadoop.apache.org
Sent: 2010/1/27 (Wed) 5:23:51 PM
Subject: Re: When exactly is combiner invoked?
Gang, Jeff and Amogh,
Thanks for all the replies.
It seems no matter