[ https://issues.apache.org/jira/browse/AVRO-764?page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel ]
Harsh J Chouraria logged work on AVRO-764:
------------------------------------------

                Author: Harsh J Chouraria
            Created on: 15/Feb/11 3:50 AM
            Start Date: 15/Feb/11 3:50 AM
    Worklog Time Spent: 5m

Issue Time Tracking
-------------------

    Worklog Id: (was: 11263)
    Time Spent: 5m
    Remaining Estimate: 0h

> Possible issue with BinaryData.compare(...) used in Map/Reduce
> --------------------------------------------------------------
>
>                 Key: AVRO-764
>                 URL: https://issues.apache.org/jira/browse/AVRO-764
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.4.1
>         Environment: Linux, CDH3
>            Reporter: Harsh J Chouraria
>            Assignee: Harsh J Chouraria
>            Priority: Critical
>              Labels: hadoop
>             Fix For: 1.5.0
>
>         Attachments: avro.mapred.binarydata.compare.r1.diff, avro.mapred.binarydata.compare.r2.diff
>
>          Time Spent: 5m
>  Remaining Estimate: 0h
>
> MapReduce's RawComparator feature has a call {{compare(b1, start1, length1, b2, start2, length2)}}, which is handled for Avro using {{BinaryData.compare(b1, start1, b2, start2, schema)}}.
> BinaryDecoder, used by the {{BinaryData.compare(b1, start1, b2, start2, schema)}} utility, has a sub-clause that handles byte array inputs of less than 16 bytes in length.
> This is the exact code I am talking about:
> {code:title=BinaryDecoder.java Lines 879-893|borderStyle=solid}
> private ByteArrayByteSource(byte[] data, int start, int len) {
>   super();
>   // make sure data is not too small, otherwise getLong may try and
>   // read 10 bytes and get index out of bounds.
>   if (data.length < 16 || len < 16) {
>     this.data = new byte[16];
>     System.arraycopy(data, start, this.data, 0, len);
>     this.position = 0;
>     this.max = len;
>   } else {
>     // use the array passed in
>     this.data = data;
>     this.position = start;
>     this.max = start + len;
>   }
> }
> {code}
> This clause fails during {{System.arraycopy}}: the {{len}} value passed into the constructor is the {{length}} of the whole byte array input, not the length as given by the framework, which would be the correct one (the {{BinaryData.compare()}} call does not take a length parameter). The copy therefore runs from the {{start}} index for {{length}} bytes, where {{length}} is the byte array's own {{.length}} -- which overruns the source array whenever {{start}} is non-zero and leads to an index-out-of-bounds exception.
> Here is a slightly low-level test case that should fail with an {{ArrayIndexOutOfBoundsException}} due to a bad {{System.arraycopy}} call:
> {code:title=TestBinaryData.java|borderStyle=solid}
> package org.apache.avro.io;
>
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
>
> import org.apache.avro.Schema;
> import org.apache.avro.Schema.Field;
> import org.apache.avro.Schema.Type;
> import org.apache.avro.generic.GenericData.Record;
> import org.apache.avro.generic.GenericDatumWriter;
> import org.junit.Assert;
> import org.junit.Test;
>
> public class TestBinaryData {
>   @Test
>   public void testCompare() {
>     // Prepare a schema for testing.
>     Field integerField = new Field("test", Schema.create(Type.INT), null, null);
>     List<Field> fields = new ArrayList<Field>();
>     fields.add(integerField);
>     Schema record = Schema.createRecord("test", null, null, false);
>     record.setFields(fields);
>
>     ByteArrayOutputStream b1 = new ByteArrayOutputStream(5);
>     ByteArrayOutputStream b2 = new ByteArrayOutputStream(5);
>     BinaryEncoder b1Enc = new BinaryEncoder(b1);
>     BinaryEncoder b2Enc = new BinaryEncoder(b2);
>     Record testDatum1 = new Record(record);
>     testDatum1.put(0, 1);
>     Record testDatum2 = new Record(record);
>     testDatum2.put(0, 2);
>     GenericDatumWriter<Record> gWriter = new GenericDatumWriter<Record>(record);
>     Integer start1 = 0, start2 = 0;
>     try {
>       gWriter.write(testDatum1, b1Enc);
>       b1Enc.flush();
>       start1 = b1.size();
>       gWriter.write(testDatum1, b1Enc);
>       b1Enc.flush();
>       b1.close();
>       gWriter.write(testDatum2, b2Enc);
>       b2Enc.flush();
>       start2 = b2.size();
>       gWriter.write(testDatum2, b2Enc);
>       b2Enc.flush();
>       b2.close();
>       BinaryData.compare(b1.toByteArray(), start1, b2.toByteArray(), start2, record);
>     } catch (IOException e) {
>       Assert.fail("IOException while writing records to output stream.");
>     }
>   }
> }
> {code}
> A solution would be to use the length as given by the MapReduce framework itself, instead of {{bytearray.length}}. I'll attach a patch for this soon.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
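For illustration only, here is a minimal sketch of the direction such a length-aware fix could take on the MapReduce side. The {{AvroBytesComparator}} class name and the seven-argument {{BinaryData.compare(b1, s1, l1, b2, s2, l2, schema)}} overload are assumptions made for this sketch, not the contents of the attached avro.mapred.binarydata.compare diffs:

{code:title=AvroBytesComparator.java (sketch, hypothetical overload)|borderStyle=solid}
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryData;
import org.apache.hadoop.io.RawComparator;

/**
 * Sketch only. Shows how the framework-supplied lengths could be threaded
 * through to Avro instead of being dropped, so the decoder never has to
 * fall back to bytearray.length for buffers shorter than 16 bytes.
 */
public class AvroBytesComparator implements RawComparator<Object> {

  private final Schema schema;

  public AvroBytesComparator(Schema schema) {
    this.schema = schema;
  }

  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    // Hypothetical length-aware overload: passes l1/l2 down so
    // ByteArrayByteSource can copy exactly 'len' valid bytes instead of
    // data.length bytes.
    return BinaryData.compare(b1, s1, l1, b2, s2, l2, schema);
  }

  public int compare(Object a, Object b) {
    // Only the raw (serialized) comparison path is exercised in this sketch.
    throw new UnsupportedOperationException();
  }
}
{code}

With the correct {{len}} reaching the decoder, the {{data.length < 16 || len < 16}} branch in {{ByteArrayByteSource}} never reads past {{start + len}}, and the out-of-bounds exception goes away.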