But be careful: combiners may execute "zero or more times", depending
on internal framework logic that your job does not control. Relying on
combiners to do significant work, as some of the Mahout clustering
algorithms used to do, will bite you.
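
To make the point concrete, here is a minimal sketch in plain Python (not
Hadoop code). The helpers group, combine and run_job are hypothetical names
made up for this illustration, and the fixed chunk_size only stands in for
wherever the framework happens to cut a spill. It runs the same function as
combiner and reducer with zero, one, or two combiner passes:

```python
from collections import defaultdict


def group(pairs):
    """Group (key, value) pairs by key, like the shuffle/sort step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def combine(pairs, fn, chunk_size):
    """Hypothetical stand-in for a spill: combine each chunk on its own."""
    out = []
    for start in range(0, len(pairs), chunk_size):
        chunk = pairs[start:start + chunk_size]
        for key, values in group(chunk).items():
            out.append((key, fn(values)))
    return out


def run_job(pairs, fn, combiner_passes, chunk_size=2):
    """Apply fn as combiner `combiner_passes` times, then once as reducer."""
    for _ in range(combiner_passes):
        pairs = combine(pairs, fn, chunk_size)
    return {key: fn(values) for key, values in group(pairs).items()}


def mean(values):
    return sum(values) / len(values)


# Made-up map output; one key appears in several chunks.
map_output = [("a", 1), ("a", 2), ("b", 10), ("a", 3), ("b", 20), ("a", 6)]

# sum is associative and commutative: the result does not depend on how
# many times the combiner ran.
sum_results = [run_job(map_output, sum, n) for n in range(3)]

# mean is not: a mean of partial means differs from the true mean, so the
# result changes as soon as the combiner runs at all.
mean_results = [run_job(map_output, mean, n) for n in range(3)]

print(sum_results)
print(mean_results)
```

With sum the three runs agree. With mean the answer for key "a" changes as
soon as the combiner runs, which is the kind of bug you get when the job's
correctness depends on the combiner.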

Jeff


Gang Luo wrote:
> When the map function generates intermediate results and first sends them to 
> the buffer, partitioning and sorting start, and, if you specify a combiner, 
> it is invoked at this point. This process runs in parallel with the map 
> function. When the map function finishes, all the spills on disk are merged, 
> and the combiner is also invoked at this time. 
>
> -Gang
>
>
>
> ----- Original Message ----
> From: Le Zhao <lez...@cs.cmu.edu>
> To: common-user@hadoop.apache.org
> Sent: 2010/1/27 (Wed) 11:57:08 AM
> Subject: When exactly is combiner invoked?
>
> Hi - the combiner operates on a chunk of mapper output data, but where exactly 
> is the chunk cut off, or when exactly will the chunk be fed to the combiner?
>
> 1. Will it be after the mapper finishes processing an input record?
> 2. Will it be after the mapper outputs a key value pair that hits the memory 
> limit?
>
> This is important to know, because strategy 1 gives a stronger guarantee about 
> duplicate output records than strategy 2, say when one input record for the 
> mapper can correspond to multiple output records with the same key.
>
> Thanks,
> Le
>
>
>
