Thanks Amogh. I have solved it now. The reason is exactly what you said: I didn't consume all the records in the reducer. I break out of the loop when I meet a certain record, so the remaining records that I ignore are not counted. So there is no problem at all!
-Gang

----- Original Message ----
From: Amogh Vasekar <am...@yahoo-inc.com>
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Sent: 2009/12/15 (Tue) 1:59:14 AM
Subject: Re: Re: Re: Re: Re: map output not equal to reduce input

>> how do you define 'consumed by reducer'
Trivially, as long as you have your values iterator go to the end, you should be just fine. Sorry, I haven't worked with decision support per se; probably someone else can shed some light on its quirks :)
Amogh

On 12/11/09 7:38 PM, "Gang Luo" <lgpub...@yahoo.com.cn> wrote:
Thanks, Amogh. I am not sure whether all the records the mapper generates are consumed by the reducer. But how do you define 'consumed by reducer'? I can set a counter to see how many lines go to my map function, but this is likely the same as the reduce input count, which is less than the map output count. I didn't use the SkipBadRecords class; I think that feature is disabled by default, so it should have nothing to do with this. I do my tests using tables from TPC-DS. If I run my job on some 'toy tables' I made, the statistics are correct.
-Gang

----- Original Message ----
From: Amogh Vasekar <am...@yahoo-inc.com>
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Sent: 2009/12/11 (Fri) 2:55:12 AM
Subject: Re: Re: Re: Re: map output not equal to reduce input

Hi,
The counters are updated as the records are *consumed*, for both mapper and reducer. Can you confirm whether all the values returned by your iterators are consumed on the reduce side? Also, do you have the feature of skipping bad records switched on?
Amogh

On 12/11/09 4:32 AM, "Gang Luo" <lgpub...@yahoo.com.cn> wrote:
In the mapper of this job, I get something I am interested in from each line and then output all of them, so the number of map input records equals the number of map output records. Actually, I am doing a semi join in this job. There was no failure during execution.
-Gang

----- Original Message ----
From: Todd Lipcon <t...@cloudera.com>
To: common-user@hadoop.apache.org
Sent: 2009/12/10 (Thu) 4:43:52 PM
Subject: Re: Re: Re: map output not equal to reduce input

On Thu, Dec 10, 2009 at 1:15 PM, Gang Luo <lgpub...@yahoo.com.cn> wrote:
> Hi Todd,
> I didn't change the partitioner, just used the default one. Will the default
> partitioner cause the loss of records?
>
> -Gang
>
Do the maps output data nondeterministically? Did you experience any task failures in the run of the job?
-Todd
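To illustrate the counter behavior discussed in this thread: Hadoop bumps REDUCE_INPUT_RECORDS only as values are actually pulled from the reducer's iterator, so breaking out of the loop early leaves the skipped values uncounted. The following is a minimal plain-Python sketch of that effect (not real Hadoop API; the function and counter names are illustrative only).

```python
# Plain-Python analogy (not Hadoop code) of why an early break in the
# reducer makes the reduce-input counter lag behind the map-output count:
# the counter is incremented per value consumed, not per value delivered.

def counted(values, counters):
    """Wrap a value iterator so each consumed value bumps the counter,
    mimicking how the reduce-input counter is updated on consumption."""
    for v in values:
        counters["REDUCE_INPUT_RECORDS"] += 1
        yield v

def early_break_reduce(key, values, counters):
    # Buggy pattern from the thread: stop at a sentinel record and
    # ignore the rest, so the remaining values are never counted.
    for v in counted(values, counters):
        if v == "STOP":
            break

def consume_all_reduce(key, values, counters):
    # Correct pattern per Amogh's advice: let the values iterator
    # go to the end, even after the record of interest is seen.
    for v in counted(values, counters):
        pass

map_output = ["a", "STOP", "b", "c"]  # 4 map output records for one key

c1 = {"REDUCE_INPUT_RECORDS": 0}
early_break_reduce("k", iter(map_output), c1)
print(c1["REDUCE_INPUT_RECORDS"])  # 2: only "a" and "STOP" were consumed

c2 = {"REDUCE_INPUT_RECORDS": 0}
consume_all_reduce("k", iter(map_output), c2)
print(c2["REDUCE_INPUT_RECORDS"])  # 4: matches the map output count
```

Once the loop drains the iterator to the end, the two counters agree, which is exactly what Gang observed after removing the early break.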