When exactly is combiner invoked?

2010-01-27 Thread Le Zhao
Hi - the combiner operates on a chunk of mapper output data, but where exactly is the chunk cut off, and when exactly will the chunk be fed to the combiner? 1. Will it be after the mapper finishes processing an input record? 2. Will it be after the mapper outputs a key-value pair that hits the memor

Re: When exactly is combiner invoked?

2010-01-27 Thread Gang Luo
on disk will be merged, combiners will also be invoked at this time. -Gang - Original Message From: Le Zhao To: common-user@hadoop.apache.org Sent: 2010/1/27 (Wed) 11:57:08 AM Subject: When exactly is combiner invoked? Hi - combiner performs on a chunk of mapper output data, but what exactly is

Re: When exactly is combiner invoked?

2010-01-27 Thread Jeff Eastman
> merged, combiners will also be invoked at this time. > > -Gang > > > > - Original Message > From: Le Zhao > To: common-user@hadoop.apache.org > Sent: 2010/1/27 (Wed) 11:57:08 AM > Subject: When exactly is combiner invoked? > > Hi - combiner performs on a chunk of map

Re: When exactly is combiner invoked?

2010-01-27 Thread Amogh Vasekar
Hi, To elaborate a little on Gang's point, the buffer threshold is set by io.sort.spill.percent; once it is reached, spills are created. If the number of spills is more than min.num.spills.for.combine, the combiner gets invoked on the spills created before writing to disk. I'm not sure what exactly you
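
The spill-and-combine behaviour described in this thread can be sketched as a small stand-alone simulation. This is not Hadoop code: the class SpillCombineSketch, its fields, and the record-count buffer are hypothetical stand-ins for the settings named above (io.sort.spill.percent, min.num.spills.for.combine), the combiner is assumed to be a simple sum, and the exact comparison used at merge time (>= here) is an assumption.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of map-side spilling; not the Hadoop implementation.
class SpillCombineSketch {
    private final int capacity;            // assumed: buffer size counted in records
    private final double spillPercent;     // stand-in for io.sort.spill.percent
    private final int minSpillsForCombine; // stand-in for min.num.spills.for.combine

    private final List<Map.Entry<String, Integer>> buffer = new ArrayList<>();
    private final List<TreeMap<String, Integer>> spills = new ArrayList<>();

    SpillCombineSketch(int capacity, double spillPercent, int minSpillsForCombine) {
        this.capacity = capacity;
        this.spillPercent = spillPercent;
        this.minSpillsForCombine = minSpillsForCombine;
    }

    // Called once per mapper output pair; spills when the threshold is reached.
    void collect(String key, int value) {
        buffer.add(new SimpleEntry<String, Integer>(key, value));
        if (buffer.size() >= capacity * spillPercent) {
            spill();
        }
    }

    // Sorts the buffered pairs by key and applies the (sum) combiner to this spill only.
    private void spill() {
        if (buffer.isEmpty()) {
            return;
        }
        TreeMap<String, Integer> run = new TreeMap<>();
        for (Map.Entry<String, Integer> e : buffer) {
            run.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        spills.add(run);
        buffer.clear();
    }

    // End of the map task: flush the buffer, then merge all spills into one output.
    List<Map.Entry<String, Integer>> close() {
        spill();
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        if (spills.size() >= minSpillsForCombine) {
            // Enough spills: run the combiner again across spills while merging.
            TreeMap<String, Integer> merged = new TreeMap<>();
            for (TreeMap<String, Integer> run : spills) {
                run.forEach((k, v) -> merged.merge(k, v, Integer::sum));
            }
            merged.forEach((k, v) -> out.add(new SimpleEntry<String, Integer>(k, v)));
        } else {
            // Too few spills: merge in key order without combining across spills.
            for (TreeMap<String, Integer> run : spills) {
                run.forEach((k, v) -> out.add(new SimpleEntry<String, Integer>(k, v)));
            }
            out.sort(Map.Entry.comparingByKey());
        }
        return out;
    }

    int spillCount() {
        return spills.size();
    }
}
```

In this sketch a record is combined with others in the same spill as soon as the spill is written, but it meets records from other spills only if the final merge also runs the combiner. With a high merge threshold the same key can therefore appear several times in one map task's output.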

Re: When exactly is combiner invoked?

2010-01-27 Thread Le Zhao
Gang, Jeff and Amogh, Thanks for all the replies. It seems that no matter how many times combiners are invoked internally, the output for one specific map task will be *totally* partitioned and combined. Then, the data is shuffled/sent to reducers. That's good to know, because if combining isn't

Re: When exactly is combiner invoked?

2010-01-28 Thread Gang Luo
k, there is no chance for it to be combined with the previous part. -Gang - Original Message From: Le Zhao To: common-user@hadoop.apache.org Sent: 2010/1/27 (Wed) 5:23:51 PM Subject: Re: When exactly is combiner invoked? Gang, Jeff and Amogh, Thanks for all the replies. It seems no matter