[ https://issues.apache.org/jira/browse/AVRO-764?page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel ]

Harsh J Chouraria logged work on AVRO-764:
------------------------------------------

                Author: Harsh J Chouraria
            Created on: 15/Feb/11 3:50 AM
            Start Date: 15/Feb/11 3:50 AM
    Worklog Time Spent: 5m 

Issue Time Tracking
-------------------

            Worklog Id:     (was: 11263)
            Time Spent: 5m
    Remaining Estimate: 0h

> Possible issue with BinaryData.compare(...) used in Map/Reduce
> --------------------------------------------------------------
>
>                 Key: AVRO-764
>                 URL: https://issues.apache.org/jira/browse/AVRO-764
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.1
>         Environment: Linux, CDH3
>            Reporter: Harsh J Chouraria
>            Assignee: Harsh J Chouraria
>            Priority: Critical
>              Labels: hadoop
>             Fix For: 1.5.0
>
>         Attachments: avro.mapred.binarydata.compare.r1.diff, 
> avro.mapred.binarydata.compare.r2.diff
>
>          Time Spent: 5m
>  Remaining Estimate: 0h
>
> MapReduce's RawComparator interface has a call {{compare(b1, start1, length1, 
> b2, start2, length2)}}, which is handled for Avro using 
> {{BinaryData.compare(b1, start1, b2, start2, schema)}}.
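> For context, here is a minimal sketch of how such a delegating comparator
> looks (hypothetical class name and shape, not Avro's actual mapred
> comparator); note that the framework-supplied lengths are dropped at this
> boundary:
> {code:title=DelegationSketch.java (hypothetical)|borderStyle=solid}
> import org.apache.avro.Schema;
> import org.apache.avro.io.BinaryData;
> import org.apache.hadoop.io.RawComparator;
>
> public class DelegationSketch implements RawComparator<Object> {
>   private final Schema schema;
>
>   public DelegationSketch(Schema schema) { this.schema = schema; }
>
>   public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
>     // l1 and l2 are discarded here, since BinaryData.compare() takes no
>     // length parameters; the decoder falls back to the arrays' own lengths.
>     return BinaryData.compare(b1, s1, b2, s2, schema);
>   }
>
>   public int compare(Object o1, Object o2) {
>     throw new UnsupportedOperationException();
>   }
> }
> {code}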
> {{BinaryDecoder}}, used by the {{BinaryData.compare(b1, start1, b2, start2, 
> schema)}} utility, has a sub-clause that handles byte array inputs of less 
> than 16 bytes in length. This is the exact code I am talking about:
> {code:title=BinaryDecoder.java Lines 879-893|borderStyle=solid}
>     private ByteArrayByteSource(byte[] data, int start, int len) {
>       super();
>       // make sure data is not too small, otherwise getLong may try and
>       // read 10 bytes and get index out of bounds.
>       if (data.length < 16 || len < 16) {
>         this.data = new byte[16];
>         System.arraycopy(data, start, this.data, 0, len);
>         this.position = 0;
>         this.max = len;
>       } else {
>         // use the array passed in
>         this.data = data;
>         this.position = start;
>         this.max = start + len;
>       }
>     }
> {code}
> This clause fails during {{System.arraycopy}}: {{len}} is sent into the 
> constructor as the {{length}} of the byte array input itself, not the length 
> given by the framework, which would be the right one (the 
> {{BinaryData.compare()}} call does not take a length parameter). So 
> {{arraycopy}} tries to copy {{len}} bytes starting at index {{start}}, where 
> {{len}} is the array's own {{.length}}, and the copy runs past the end of the 
> source array, throwing an {{ArrayIndexOutOfBoundsException}}.
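> To make the arithmetic concrete (values chosen purely for illustration), 
> suppose a 5-byte record sits at the end of a 10-byte buffer:
> {code:title=Sketch (illustrative values)|borderStyle=solid}
> byte[] data = new byte[10];  // whole input buffer
> int start = 5;               // the record begins at index 5
> int len = data.length;       // the bug: 10, instead of the record's 5 bytes
> byte[] copy = new byte[16];
> // Tries to read data[5..14], but valid indices are only 0..9:
> System.arraycopy(data, start, copy, 0, len);  // ArrayIndexOutOfBoundsException
> {code}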
> Here is a slightly low-level test case that should fail with an 
> {{ArrayIndexOutOfBoundsException}} due to the bad {{System.arraycopy}} call:
> {code:title=TestBinaryData.java|borderStyle=solid}
> package org.apache.avro.io;
>
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
> import org.apache.avro.Schema;
> import org.apache.avro.Schema.Field;
> import org.apache.avro.Schema.Type;
> import org.apache.avro.generic.GenericData.Record;
> import org.apache.avro.generic.GenericDatumWriter;
> import org.junit.Assert;
> import org.junit.Test;
>
> public class TestBinaryData {
>   @Test
>   public void testCompare() {
>     // Prepare a schema for testing.
>     Field integerField = new Field("test", Schema.create(Type.INT), null, null);
>     List<Field> fields = new ArrayList<Field>();
>     fields.add(integerField);
>     Schema record = Schema.createRecord("test", null, null, false);
>     record.setFields(fields);
>     
>     ByteArrayOutputStream b1 = new ByteArrayOutputStream(5);
>     ByteArrayOutputStream b2 = new ByteArrayOutputStream(5);
>     BinaryEncoder b1Enc = new BinaryEncoder(b1);
>     BinaryEncoder b2Enc = new BinaryEncoder(b2);
>     Record testDatum1 = new Record(record);
>     testDatum1.put(0, 1);
>     Record testDatum2 = new Record(record);
>     testDatum2.put(0, 2);
>     GenericDatumWriter<Record> gWriter = new GenericDatumWriter<Record>(record);
>     int start1 = 0, start2 = 0;
>     try {
>       // Write each datum twice so that start1/start2 point at the second
>       // record, giving compare() a non-zero start offset to exercise.
>       gWriter.write(testDatum1, b1Enc);
>       b1Enc.flush();
>       start1 = b1.size();
>       gWriter.write(testDatum1, b1Enc);
>       b1Enc.flush();
>       b1.close();
>       gWriter.write(testDatum2, b2Enc);
>       b2Enc.flush();
>       start2 = b2.size();
>       gWriter.write(testDatum2, b2Enc);
>       b2Enc.flush();
>       b2.close();
>       BinaryData.compare(b1.toByteArray(), start1, b2.toByteArray(), start2, record);
>     } catch (IOException e) {
>       Assert.fail("IOException while writing records to output stream.");
>     }
>   }
> }
> {code}
> A solution would be to use the length as given by the MapReduce framework 
> itself, instead of substituting {{bytearray.length}} for it. I'll attach a 
> patch for this soon.
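> If the patch takes that shape (an assumption here; the attached diffs are 
> authoritative), a length-aware overload would let the delegating comparator 
> sketched above forward the framework lengths untouched:
> {code:title=Sketch (assumed overload)|borderStyle=solid}
>   public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
>     // Assumed length-aware signature: exactly l1/l2 bytes are considered.
>     return BinaryData.compare(b1, s1, l1, b2, s2, l2, schema);
>   }
> {code}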

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
