Sorry, I found that my previous message appears all in black in the archive, so the
red highlighting was lost. Let me re-explain the problem. The following piece of
code in my AvroReducer causes a problem:
    public void reduce(Utf8 key, Iterable<GenericRecord> values,
                       AvroCollector<GenericRecord> collector, Reporter reporter)
            throws IOException {
        GenericRecord record = null;
        for (GenericRecord value : values) {
            // -- code omitted here --
            record = value;
            record.put("rowkey", key);   // <=== this statement causes the problem
            collector.collect(record);
        }
    }
As explained in my previous message, if I remove the statement
record.put("rowkey", key), the code works fine: the key/values pairs passed to
reduce() are correct. But if I add this statement, the key/values pairs passed
to reduce() are mismatched, something like (key1, values1), (key2, values3)
rather than (key2, values2). More details are in my previous message. Is this
problem related to Hadoop's binary iterators or to the Avro deserialization
code? Thanks.
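One plausible explanation is object reuse. This is offered as an assumption and is not confirmed anywhere in this thread. Hadoop and Avro both commonly reuse key and value objects between iterations, so record.put("rowkey", key) stores a reference to the framework's Utf8 key object, not a copy. The value record may then be reused as the target when the next value is deserialized. A reader that decodes strings in place into the existing field object would then overwrite the key object itself.

The self-contained sketch below models only that aliasing. It does not use the Avro or Hadoop APIs. MutableText is a hypothetical stand-in for Utf8, and readInto is a hypothetical stand-in for a deserializer that reuses objects.

```java
import java.util.HashMap;
import java.util.Map;

public class AliasingDemo {
    // Hypothetical mutable string type, standing in for a reusable type like Avro's Utf8.
    static final class MutableText {
        private String value;
        MutableText(String v) { value = v; }
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    // Hypothetical reader: decodes into a reused record and, if the "rowkey"
    // field already holds a MutableText, overwrites that object in place.
    static Map<String, Object> readInto(Map<String, Object> reuse, String rowkey) {
        Map<String, Object> rec = (reuse != null) ? reuse : new HashMap<String, Object>();
        Object old = rec.get("rowkey");
        if (old instanceof MutableText) {
            ((MutableText) old).set(rowkey); // in-place overwrite of the existing object
        } else {
            rec.put("rowkey", new MutableText(rowkey));
        }
        return rec;
    }

    // Returns the key's contents after the next value has been read into the same record.
    public static String simulate(boolean copyKey) {
        MutableText key = new MutableText("key2");    // key handed to reduce()
        Map<String, Object> record = readInto(null, "");
        // Aliasing: storing the key object itself vs. storing a copy of it.
        record.put("rowkey", copyKey ? new MutableText(key.toString()) : key);
        readInto(record, "from-next-value");          // next value reuses the record
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(simulate(false)); // prints "from-next-value": key was overwritten
        System.out.println(simulate(true));  // prints "key2": the copy absorbed the overwrite
    }
}
```

If this is the actual cause, storing a copy should avoid the aliasing, for example record.put("rowkey", new Utf8(key.toString())) instead of passing key itself. A corrupted key object could in turn confuse any later comparison between consecutive keys. That would fit the (key2, values3) symptom, but it remains a hypothesis.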
Ey-Chih Chow
From: eyc...@hotmail.com
To: user@avro.apache.org
Subject: is this a bug?
Date: Wed, 2 Mar 2011 13:05:55 -0800

Hi,
I am working on an Avro MR job and encountering an issue with AvroReducer<Utf8, 
GenericRecord, GenericRecord>. The corresponding reduce() routine is 
implemented in the following way:
public void reduce(Utf8 key, Iterable<GenericRecord> values,
                   AvroCollector<GenericRecord> collector, Reporter reporter)
        throws IOException {
    // ...
    GenericRecord record = null;
    for (GenericRecord value : values) {
        // ...
        record = value;
        record.put("rowkey", key);   // the statement originally shown in red
        // ...
        collector.collect(record);
    }
}
If I comment out the statement record.put("rowkey", key) (shown in red in the
original message), reduce() gets called with the CORRECT key/values pairs.
However, if I add that statement, reduce() is called with WRONG key/values
pairs, in the sense that key2 is paired with values3 instead of values2. I
traced this problem by stepping through the Hadoop source code (ReduceTask.java,
Task.java) and the Avro source code (HadoopReducer.java, HadoopReducerBase.java,
and all the serialization code). The problem shows up on the second call to
reduce(), but I cannot locate the exact place that causes it. My intuition is
that it arises either in the Hadoop iterators after the merge sort or in Avro
deserialization. Can anybody help me with this? Thanks.
Ey-Chih Chow                                                              
