One more update: running the job with the -XX:-UseLoopPredicate JVM option gave the same result. The mismatch between the number of map output records and the number of reducer input records persists.
Thanks!
Vyacheslav

On Aug 17, 2011, at 3:56 AM, Scott Carey wrote:

> On 8/16/11 3:56 PM, "Vyacheslav Zholudev" <vyacheslav.zholu...@gmail.com>
> wrote:
>
>> Hi, Scott,
>>
>> thanks for your reply.
>>
>>> What Avro version is this happening with? What JVM version?
>>
>> We are using Avro 1.5.1 and Sun JDK 6, but I will have to look up the
>> exact version.
>>
>>> On a hunch, have you tried adding -XX:-UseLoopPredicate to the JVM args
>>> if it is Sun and JRE 6u21 or later? (Some issues in loop predicates
>>> affect Java 6 too, just not as many as the recent news on Java 7.)
>>>
>>> Otherwise, it may well be the same thing as AVRO-782. Any extra
>>> information related to that issue would be welcome.
>>
>> I will have to collect it. In the meantime, do you have any reasonable
>> explanations of the issue besides it being something like AVRO-782?
>
> What is your key type (map output schema, first type argument of Pair)?
> Is your key a Utf8 or a String? I don't have a reasonable explanation at
> this point; I haven't looked into it in depth with a good reproducible
> case. I have my suspicions about how recycling of the key works, since
> Utf8 is mutable and its backing byte[] can end up shared.
>
>> Thanks a lot,
>> Vyacheslav
>>
>>> Thanks!
>>>
>>> -Scott
>>>
>>> On 8/16/11 8:39 AM, "Vyacheslav Zholudev"
>>> <vyacheslav.zholu...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have multiple Hadoop jobs that use the Avro mapred API.
>>>> In only one of the jobs is there a visible mismatch between the number
>>>> of map output records and the number of reducer input records.
>>>>
>>>> Has anybody encountered such behavior? Can anybody think of possible
>>>> explanations for this phenomenon?
>>>>
>>>> Any pointers/thoughts are highly appreciated!
>>>>
>>>> Best,
>>>> Vyacheslav
>>
>> Best,
>> Vyacheslav
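To illustrate the kind of hazard behind the suspicion about key recycling, here is a minimal, self-contained Java sketch. It does not use Avro at all: MutableKey is a hypothetical stand-in for a mutable string type like Utf8 (a byte[] overwritten in place), and collect() is a hypothetical helper that mimics a reader recycling one key instance for every record. Holding on to references to the recycled object, rather than copies, makes every stored key alias whatever was written last. This only shows the aliasing mechanism; it is not a reproduction of the record-count mismatch discussed here or of AVRO-782.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class KeyReuseSketch {

    // Hypothetical stand-in for a mutable string type such as Avro's Utf8.
    // Not the real Utf8 API: it simply wraps a byte[] overwritten in place.
    static final class MutableKey {
        private byte[] bytes = new byte[0];
        private int length;

        // Overwrite the contents, reusing the backing array when it is large enough.
        MutableKey set(String s) {
            byte[] src = s.getBytes(StandardCharsets.UTF_8);
            if (bytes.length < src.length) {
                bytes = new byte[src.length];
            }
            System.arraycopy(src, 0, bytes, 0, src.length);
            length = src.length;
            return this;
        }

        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    // Hypothetical helper: simulates a reader that recycles a single key
    // instance, optionally copying each key before it is retained.
    static List<String> collect(List<String> input, boolean copy) {
        MutableKey reused = new MutableKey();
        List<MutableKey> kept = new ArrayList<>();
        for (String s : input) {
            reused.set(s);
            // Without a copy, every list entry points at the same object.
            kept.add(copy ? new MutableKey().set(reused.toString()) : reused);
        }
        List<String> out = new ArrayList<>();
        for (MutableKey k : kept) {
            out.add(k.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("apple", "banana", "cherry");
        System.out.println("aliased: " + collect(input, false));
        System.out.println("copied:  " + collect(input, true));
    }
}
```

Running main prints "aliased: [cherry, cherry, cherry]" followed by "copied:  [apple, banana, cherry]". Copying a recycled key before retaining it is the usual defensive fix when code downstream keeps references across records.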