Thanks Amogh. I solved it now. The reason is exactly what you said: I didn't 
consume all the records in the reducer. I break out of the loop when I meet a 
certain record, so the remaining records that I ignore are never counted. 
There is no problem at all!
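[For readers hitting the same counter mismatch: a minimal plain-Java sketch, with no Hadoop dependency, of why breaking out of the values loop makes the reduce-input-records counter fall short. The class name, the "STOP" sentinel, and the sample values are made up for illustration; in Hadoop, the framework increments the counter only when the reducer actually pulls a value from the iterator.]

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CounterSketch {
    // Mimics Hadoop's accounting: a record counts as "reduce input"
    // only when the reducer actually reads it from the values iterator.
    static int consume(List<String> values, boolean breakEarly) {
        int consumed = 0;
        Iterator<String> it = values.iterator();
        while (it.hasNext()) {
            String v = it.next();
            consumed++;                   // counter bumps per record read
            if (breakEarly && v.equals("STOP")) {
                break;                    // records after this are never counted
            }
        }
        return consumed;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "STOP", "b", "c");
        // Breaking early: only 2 of 4 records counted, so the
        // reduce-input counter trails the map-output counter.
        System.out.println(consume(values, true));
        // Draining the iterator: all 4 counted, counters match.
        System.out.println(consume(values, false));
    }
}
```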

 
-Gang



----- Original Message ----
From: Amogh Vasekar <am...@yahoo-inc.com>
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Sent: 2009/12/15 (Tue) 1:59:14 AM
Subject: Re: Re: Re: Re: Re: map output not equal to reduce input


>>how do you define 'consumed by reducer'
Trivially, as long as you have your values iterator go to the end, you should 
be just fine.
Sorry, haven’t worked with decision support per se, probably someone else can 
shed some light on its quirks :)

Amogh

On 12/11/09 7:38 PM, "Gang Luo" <lgpub...@yahoo.com.cn> wrote:

Thanks, Amogh.
I am not sure whether all the records the mapper generates are consumed by the 
reducer. But how do you define 'consumed by the reducer'? I can set a counter to 
see how many records go into my reduce function, but that is likely to equal the 
reduce input count, which is less than the map output count.

I didn't use the SkipBadRecords class. I think that feature is disabled by 
default, so it should have nothing to do with this.

I run my tests on the TPC-DS tables. If I run the job on some 'toy tables' I 
made myself, the statistics are correct.



-Gang



----- Original Message ----
From: Amogh Vasekar <am...@yahoo-inc.com>
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Sent: 2009/12/11 (Fri) 2:55:12 AM
Subject: Re: Re: Re: Re: map output not equal to reduce input

Hi,
The counters are updated as the records are *consumed*, for both mapper and 
reducer. Can you confirm whether all the values returned by your iterators are 
consumed on the reduce side? Also, do you have the skip-bad-records feature 
switched on?

Amogh


On 12/11/09 4:32 AM, "Gang Luo" <lgpub...@yahoo.com.cn> wrote:

In the mapper of this job, I get something I am interested in for each
line and then output all of them. So the number of map input records is
equal to the map output records. Actually, I am doing semi join in this
job. There is no failure during execution.

-Gang



----- Original Message ----
From: Todd Lipcon <t...@cloudera.com>
To: common-user@hadoop.apache.org
Sent: 2009/12/10 (Thu) 4:43:52 PM
Subject: Re: Re: Re: map output not equal to reduce input

On Thu, Dec 10, 2009 at 1:15 PM, Gang Luo <lgpub...@yahoo.com.cn> wrote:
> Hi Todd,
> I didn't change the partitioner; I just use the default one. Will the default 
> partitioner cause the loss of records?
>
> -Gang
>

Do the maps output data nondeterministically? Did you experience any
task failures in the run of the job?

-Todd



