Avro Java's file writer[1] (for the last several versions) rewinds its buffer if
there is an exception during writing, so if there are writes afterwards the
file will not be corrupt.  However, most tools are not so careful.

[1] DataFileWriter.append()
http://svn.apache.org/repos/asf/avro/trunk/lang/java/avro/src/main/java/org/apache/avro/file/DataFileWriter.java
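
For illustration, here is a rough sketch of leaning on that behavior: catch the
failed append and keep writing.  The schema, field name, and output path below
are made up, and the catch is deliberately broad.

  import java.io.File;
  import java.io.IOException;

  import org.apache.avro.Schema;
  import org.apache.avro.file.DataFileWriter;
  import org.apache.avro.generic.GenericData;
  import org.apache.avro.generic.GenericDatumWriter;
  import org.apache.avro.generic.GenericRecord;

  public class SafeAppend {
    public static void main(String[] args) throws IOException {
      Schema schema = new Schema.Parser().parse(
          "{\"type\": \"record\", \"name\": \"Rec\", \"fields\":"
          + " [{\"name\": \"body\", \"type\": \"string\"}]}");

      DataFileWriter<GenericRecord> writer =
          new DataFileWriter<GenericRecord>(
              new GenericDatumWriter<GenericRecord>(schema));
      writer.create(schema, new File("out.avro"));

      // The Integer does not match the "string" field and will fail to
      // serialize; the surrounding records should still land in the file.
      for (Object body : new Object[] { "good", Integer.valueOf(7), "also good" }) {
        GenericRecord rec = new GenericData.Record(schema);
        rec.put("body", body);
        try {
          writer.append(rec);          // serializes into the block buffer
        } catch (Exception e) {
          // Per the note above, append() rewinds the block buffer on failure,
          // so the partial datum is discarded and later appends stay readable.
          System.err.println("Skipping bad record: " + e);
        }
      }
      writer.close();
    }
  }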


On 3/23/12 8:27 PM, "Russell Jurney" <russell.jur...@gmail.com> wrote:

> OK, now I have a follow-up question...
> 
> How does one recover from an exception while writing an Avro record?  The incomplete
> record is being written, which is crashing the reader.
> 
> On Fri, Mar 23, 2012 at 8:01 PM, Russell Jurney <russell.jur...@gmail.com>
> wrote:
>> Thanks Scott.  Looking at the raw data, it seems to have been a truncated
>> record due to UTF problems.
>> 
>> Russell Jurney http://datasyndrome.com
>> 
>> On Mar 23, 2012, at 7:59 PM, Scott Carey <scottca...@apache.org> wrote:
>> 
>>> 
>>> It appears to be reading a union index and failing in there somehow.  If it
>>> did not have any of the Pig AvroStorage stuff in there, I could tell you
>>> more.
>>> 
>>> What does avro-tools.jar's 'tojson' tool do?  (java -jar
>>> avro-tools-1.6.3.jar tojson <file> | your_favorite_text_reader)
>>> What version of Avro is the Java stack trace below?
>>> 
>>> 
>>> On 3/23/12 7:01 PM, "Russell Jurney" <russell.jur...@gmail.com> wrote:
>>> 
>>>> I have a problem record I've written in Avro that crashes anything which
>>>> tries to read it :(
>>>> 
>>>> Can anyone make sense of these errors?
>>>> 
>>>> The exception in Pig/AvroStorage is this:
>>>> 
>>>>> java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 64
>>>>> at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:275)
>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>>>>> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)
>>>>> at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>>>>> at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>>>>> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>>>>> at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>>>>> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
>>>>> at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readRecord(PigAvroDatumReader.java:67)
>>>>> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>>>>> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>>>>> at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
>>>>> at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
>>>>> at org.apache.pig.piggybank.storage.avro.PigAvroRecordReader.getCurrentValue(PigAvroRecordReader.java:80)
>>>>> at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:273)
>>>>> ... 7 more
>>>> 
>>>> When reading the record in Python:
>>>> 
>>>>> File "/me/Collecting-Data/src/python/cat_avro", line 21, in <module>
>>>>>     for record in df_reader:
>>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/datafile.py", line 354, in next
>>>>>     datum = self.datum_reader.read(self.datum_decoder)
>>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 445, in read
>>>>>     return self.read_data(self.writers_schema, self.readers_schema, decoder)
>>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 490, in read_data
>>>>>     return self.read_record(writers_schema, readers_schema, decoder)
>>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 690, in read_record
>>>>>     field_val = self.read_data(field.type, readers_field.type, decoder)
>>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 488, in read_data
>>>>>     return self.read_union(writers_schema, readers_schema, decoder)
>>>>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/avro-_AVRO_VERSION_-py2.6.egg/avro/io.py", line 650, in read_union
>>>>>     raise SchemaResolutionException(fail_msg, writers_schema, readers_schema)
>>>>> avro.io.SchemaResolutionException: Can't access branch index 64 for union with 2 branches
>>>> 
>>>> When reading the record in Ruby:
>>>> 
>>>>> /Users/peyomp/.rvm/gems/ruby-1.8.7-p352/gems/avro-1.6.1/lib/avro/io.rb:298:in `read_data': Writer's schema  and Reader's schema ["string","null"] do not match. (Avro::IO::SchemaMatchException)
>>>> 
>>>> -- 
>>>> Russell Jurney
>>>> twitter.com/rjurney
>>>> russell.jur...@gmail.com
>>>> datasyndrome.com
> 
> 
> 
> -- 
> Russell Jurney
> twitter.com/rjurney
> russell.jur...@gmail.com
> datasyndrome.com
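
For what it's worth, the "branch index 64" errors above fit the union-index
reading that the stack traces show: a union value is encoded as a zigzag-varint
branch index followed by the value of that branch, so a truncated or misaligned
datum leaves the decoder treating whatever bytes come next as that index.  A
minimal sketch of the encoding with the generic binary encoder/decoder (the
["string","null"] union shape is just taken from the Ruby error above; nothing
else is specific to this file):

  import java.io.ByteArrayOutputStream;

  import org.apache.avro.io.BinaryDecoder;
  import org.apache.avro.io.BinaryEncoder;
  import org.apache.avro.io.DecoderFactory;
  import org.apache.avro.io.EncoderFactory;

  public class UnionIndexDemo {
    public static void main(String[] args) throws Exception {
      // Write one union value: branch 1 of ["string","null"], i.e. null.
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
      enc.writeIndex(1);   // zigzag-varint branch index
      enc.writeNull();     // the null branch contributes no bytes
      enc.flush();

      // A reader positioned at the start of the datum reads the index back.
      // If the data is corrupt or the reader is misaligned, this varint is
      // garbage, which is how "Can't access branch index 64 for union with
      // 2 branches" comes about.
      BinaryDecoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
      System.out.println("branch index = " + dec.readIndex());
    }
  }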

