Andrew,

Something looks odd in this stack trace:

> Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
>     at org.apache.avro.generic.GenericData.getField(GenericData.java:525)
>     at org.apache.avro.generic.GenericData.getField(GenericData.java:540)
>     at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:103)
>     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
>     at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)

PigAvroDatumWriter overrides 'GenericDatumWriter.writeRecord' in order to extract values from a tuple. Thus, I would expect the third frame from the top to be PigAvroDatumWriter.writeRecord rather than GenericDatumWriter.writeRecord. Perhaps someone else has more insight into why the override is not being invoked.

In the meantime, please confirm that both PigAvroDatumWriter and GenericDatumWriter are loaded from the right jar files. (You can do this by temporarily changing the 'pig' launcher script to invoke the JVM with 'java -verbose' and grepping the output for these classes.)
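The expectation that PigAvroDatumWriter.writeRecord should appear in the trace rests on Java's virtual dispatch: GenericDatumWriter.write calls writeRecord, and if a subclass truly overrides that method, the subclass's version runs. The minimal sketch below uses hypothetical stand-in classes (BaseWriter, OverridingWriter and MismatchedWriter are not the real Avro or Pig types) to show one general way an intended override can be silently skipped: a subclass method whose parameter types differ from the superclass's is only an overload, so the superclass's own writeRecord runs instead. Whether this is what happens with piggybank 0.9.1 and Avro 1.6.0 is an assumption; the trace alone does not confirm it.

```java
/**
 * Hypothetical stand-ins (not the real Avro/Pig classes) showing when a
 * subclass method is, and is not, picked up by a call made in the superclass.
 */
public class DispatchDemo {

    /** Plays the role of GenericDatumWriter: write() delegates to writeRecord(). */
    static class BaseWriter {
        String write(Object datum) {
            // Resolved at compile time to writeRecord(Object); at run time the JVM
            // dispatches to whichever subclass overrides that exact signature.
            return writeRecord(datum);
        }

        protected String writeRecord(Object datum) {
            return "BaseWriter.writeRecord";
        }
    }

    /** Same signature as the superclass method: a true override. */
    static class OverridingWriter extends BaseWriter {
        @Override
        protected String writeRecord(Object datum) {
            return "OverridingWriter.writeRecord";
        }
    }

    /** Different parameter type: an unrelated overload, never reached from write(). */
    static class MismatchedWriter extends BaseWriter {
        protected String writeRecord(CharSequence datum) {
            return "MismatchedWriter.writeRecord";
        }
    }

    public static void main(String[] args) {
        System.out.println(new OverridingWriter().write("tuple")); // prints OverridingWriter.writeRecord
        System.out.println(new MismatchedWriter().write("tuple")); // prints BaseWriter.writeRecord
    }
}
```

Annotating MismatchedWriter.writeRecord with @Override would make the compiler reject it. That check only covers the superclass the subclass was compiled against, though; at run time the JVM matches overrides by method name and signature against whatever superclass is actually loaded, which is one reason the jar versions matter.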
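As a complement to 'java -verbose', the JVM can also report directly where a class was loaded from. A minimal sketch, assuming you can run a small standalone class with the same classpath as the job (the class name WhichJar and its output format are hypothetical); it relies only on the standard Class.getProtectionDomain().getCodeSource().getLocation() calls. Note that the classpath inside a Hadoop task JVM may differ from the client's, so the result is only as meaningful as the classpath you run it with.

```java
import java.net.URL;
import java.security.CodeSource;

/**
 * Hypothetical helper: prints the jar (or directory) each named class is loaded from.
 */
public class WhichJar {

    /** Returns the load location of the class, or a short note if it cannot be determined. */
    static String locate(String className) {
        try {
            // initialize = false: load the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // JDK bootstrap classes (e.g. java.lang.String) have no code source.
                return "(bootstrap / unknown)";
            }
            URL location = src.getLocation();
            return location.toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        } catch (LinkageError e) {
            // e.g. the class is present but a superclass it needs is missing.
            return "(failed to load: " + e + ")";
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.avro.generic.GenericDatumWriter",
            "org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

For example, run it with the jars registered in the script on the classpath: java -cp "/tmp/avro-1.6.0.jar:/tmp/piggybank-0.9.1.jar:." WhichJar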
Best,

stan

On Tue, Jan 10, 2012 at 8:03 AM, Andrew Kenworthy <adwkenwor...@yahoo.com> wrote:
> Hi Stan,
>
> here's the full stacktrace:
>
> org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
>     at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:261)
>     at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
>     at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:580)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:530)
>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
>     at org.apache.avro.generic.GenericData.getField(GenericData.java:525)
>     at org.apache.avro.generic.GenericData.getField(GenericData.java:540)
>     at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:103)
>     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
>     at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
>     at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
>     at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:255)
>     ... 18 more
>
>
> Andrew
>
>
>> ________________________________
>> From: Stan Rosenberg <srosenb...@proclivitysystems.com>
>> To: user@pig.apache.org; Andrew Kenworthy <adwkenwor...@yahoo.com>
>> Sent: Monday, January 9, 2012 5:30 PM
>> Subject: Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
>>
>> Andrew,
>>
>> The source of the problem may be AvroStorage in piggybank. Could you
>> please include the entire stack trace?
>>
>> stan
>>
>> On Mon, Jan 9, 2012 at 4:15 AM, Andrew Kenworthy <adwkenwor...@yahoo.com> wrote:
>>> Hello,
>>>
>>> When I run a simple pig script to LOAD and STORE avro data, I get:
>>>
>>> java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be
>>> cast to org.apache.avro.generic.IndexedRecord
>>>
>>>
>>> Script:
>>>
>>> REGISTER /tmp/avro-1.6.0.jar;
>>> --REGISTER /tmp/avro-1.5.4.jar;
>>> --REGISTER /tmp/avro-1.4.1.jar;
>>>
>>> REGISTER /tmp/piggybank-0.9.1.jar;
>>> REGISTER /tmp/json-simple-1.1.jar;
>>> REGISTER /tmp/jackson-core-asl-1.8.4.jar;
>>> REGISTER /tmp/jackson-mapper-asl-1.8.4.jar;
>>>
>>> avroData = LOAD '$DATA_INPUTDIR' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
>>>
>>> dataSubset = FOREACH avroData GENERATE myField1, myField2;
>>> describe dataSubset;
>>> -----------------------------------------------
>>> -- shows:
>>> -- dataSubset : {myField1: int,myField2: int}
>>> -----------------------------------------------
>>> STORE dataSubset INTO '$OUTPUTDIR' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
>>>
>>> If I use the 1.5.4 jar I get the same error, but the script works with the
>>> 1.4.1 version. If I just write one field, then it works with 1.6.0.
>>>
>>> I see there have been related issues fixed here:
>>>
>>> https://issues.apache.org/jira/browse/PIG-2202
>>> https://issues.apache.org/jira/browse/PIG-2195
>>>
>>> Can anyone confirm that this or similar works with avro 1.6.0, and/or point
>>> me in the right direction concerning where the problem may lie?
>>>
>>> Many thanks,
>>>
>>> Andrew