Hi Stan,

Thank you for your feedback. I've run the script passing "-D mapred.child.java.opts=-verbose:class" and have the following in my logs:

[Loaded org.apache.avro.generic.GenericDatumWriter from file:/var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/ankenworthy/jobcache/job_201111230039_0146/jars/job.jar]
[Loaded org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter from file:/var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/ankenworthy/jobcache/job_201111230039_0146/jars/job.jar]

I assume the .../job_201111230039_0146/jars/job.jar is the one prepared by Pig using the jars I have REGISTER-ed, in which case the classes are the ones I expect. Or have I misread that?

Regards,
Andrew

>________________________________
> From: Stan Rosenberg <srosenb...@proclivitysystems.com>
>To: user@pig.apache.org; Andrew Kenworthy <adwkenwor...@yahoo.com>
>Sent: Tuesday, January 10, 2012 5:36 PM
>Subject: Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
>
>Andrew,
>
>Something looks odd in this stack trace:
>
>Caused by: java.lang.ClassCastException:
>org.apache.pig.data.BinSedesTuple cannot be cast to
>org.apache.avro.generic.IndexedRecord
>> at org.apache.avro.generic.GenericData.getField(GenericData.java:525)
>> at org.apache.avro.generic.GenericData.getField(GenericData.java:540)
>> at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:103)
>> at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
>> at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
>
>PigAvroDatumWriter overrides 'GenericDatumWriter.writeRecord' in order
>to extract values from a tuple. Thus, I would expect the third method
>invocation to be PigAvroDatumWriter.writeRecord. Perhaps someone else
>has more insight into why it's not getting invoked. In the meantime,
>please confirm that both PigAvroDatumWriter and GenericDatumWriter are
>loaded from the right jar files. (You can do this by temporarily
>changing the Pig script to invoke the JVM with 'java -verbose' and
>'grep' the output for these classes.)
>
>Best,
>
>stan
>
>On Tue, Jan 10, 2012 at 8:03 AM, Andrew Kenworthy
><adwkenwor...@yahoo.com> wrote:
>> Hi Stan,
>>
>> here's the full stacktrace:
>>
>> org.apache.avro.file.DataFileWriter$AppendWriteException:
>> java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be
>> cast to org.apache.avro.generic.IndexedRecord
>> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:261)
>> at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
>> at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:580)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
>> at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:530)
>> at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:396)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>> at org.apache.hadoop.mapred.Child.main(Child.java:262)
>> Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple
>> cannot be cast to org.apache.avro.generic.IndexedRecord
>> at org.apache.avro.generic.GenericData.getField(GenericData.java:525)
>> at org.apache.avro.generic.GenericData.getField(GenericData.java:540)
>> at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:103)
>> at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
>> at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
>> at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
>> at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:255)
>> ... 18 more
>>
>>
>> Andrew
>>
>>
>>
>>>________________________________
>>> From: Stan Rosenberg <srosenb...@proclivitysystems.com>
>>>To: user@pig.apache.org; Andrew Kenworthy <adwkenwor...@yahoo.com>
>>>Sent: Monday, January 9, 2012 5:30 PM
>>>Subject: Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
>>>
>>>Andrew,
>>>
>>>The source of the problem may be AvroStorage in piggybank. Could you
>>>please include the entire stack trace?
>>>
>>>stan
>>>
>>>On Mon, Jan 9, 2012 at 4:15 AM, Andrew Kenworthy <adwkenwor...@yahoo.com>
>>>wrote:
>>>> Hello,
>>>>
>>>> When I run a simple Pig script to LOAD and STORE Avro data, I get:
>>>>
>>>> java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be
>>>> cast to org.apache.avro.generic.IndexedRecord
>>>>
>>>>
>>>> Script:
>>>>
>>>> REGISTER /tmp/avro-1.6.0.jar;
>>>> --REGISTER /tmp/avro-1.5.4.jar;
>>>> --REGISTER /tmp/avro-1.4.1.jar;
>>>>
>>>> REGISTER /tmp/piggybank-0.9.1.jar;
>>>> REGISTER /tmp/json-simple-1.1.jar;
>>>> REGISTER /tmp/jackson-core-asl-1.8.4.jar;
>>>> REGISTER /tmp/jackson-mapper-asl-1.8.4.jar;
>>>>
>>>> avroData = LOAD '$DATA_INPUTDIR' USING
>>>>     org.apache.pig.piggybank.storage.avro.AvroStorage();
>>>>
>>>> dataSubset = FOREACH avroData GENERATE myField1, myField2;
>>>> describe dataSubset;
>>>> -----------------------------------------------
>>>> -- shows:
>>>> -- dataSubset : {myField1: int,myField2: int}
>>>> -----------------------------------------------
>>>> STORE dataSubset INTO '$OUTPUTDIR' USING
>>>>     org.apache.pig.piggybank.storage.avro.AvroStorage();
>>>>
>>>> If I use the 1.5.4 jar, I get the same error, but the script works with
>>>> the 1.4.1 version. If I write just one field, it works with 1.6.0.
>>>>
>>>> I see that related issues have been fixed here:
>>>>
>>>> https://issues.apache.org/jira/browse/PIG-2202
>>>> https://issues.apache.org/jira/browse/PIG-2195
>>>>
>>>> Can anyone confirm that this or something similar works with Avro
>>>> 1.6.0, and/or point me in the right direction concerning where the
>>>> problem may lie?
>>>>
>>>> Many thanks,
>>>>
>>>> Andrew
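Stan's request to confirm which jar files PigAvroDatumWriter and GenericDatumWriter are loaded from can also be checked from inside a JVM instead of by grepping -verbose:class output. Below is a minimal sketch that assumes only the standard Class.getProtectionDomain() and java.security.CodeSource APIs. The class WhichJar and its locate method are hypothetical and are not part of Pig, Avro or Hadoop. One caveat: it reports the classpath of whichever JVM runs it. Launched on the client, it tells you nothing about the map tasks, which load classes from the job's job.jar. For this problem, it would therefore have to run with the task's classpath.

```java
import java.net.URL;
import java.security.CodeSource;

/**
 * Hypothetical helper (not part of Pig, Avro or Hadoop): reports the jar or
 * directory each named class was loaded from in the current JVM.
 */
public class WhichJar {

    /** Returns the code-source location of cls, or a marker if there is none. */
    static String locate(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            // Classes from the bootstrap loader, such as java.lang.String, have no code source.
            return "<no code source>";
        }
        URL location = src.getLocation();
        return location.toString();
    }

    public static void main(String[] args) {
        // Default to the two classes discussed in the thread; pass other names as arguments.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.avro.generic.GenericDatumWriter",
            "org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter"
        };
        for (String name : names) {
            try {
                // initialize=false: look the class up without running its static initializers.
                Class<?> cls = Class.forName(name, false, WhichJar.class.getClassLoader());
                System.out.println(name + " -> " + locate(cls));
            } catch (ClassNotFoundException e) {
                System.out.println(name + " -> not found on this classpath");
            }
        }
    }
}
```

When the program is run with a given classpath, it prints one "name -> location" line per class. If GenericDatumWriter resolved to an older Avro jar than expected, the location would show it.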
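Stan's expectation rests on Java dynamic dispatch. When GenericDatumWriter.write calls writeRecord on a PigAvroDatumWriter instance, a genuine override should run instead of the base method. Loading an outdated class is one way that expectation can fail, and that is what the jar check addresses. Another general way is a signature mismatch: the subclass method then becomes an overload rather than an override. The thread does not establish that this happened with Avro 1.6.0. The sketch below uses hypothetical classes (BaseWriter, OverridingWriter, OverloadingWriter) to show the mechanism only.

```java
/**
 * Hypothetical classes (not Avro or Pig code) showing that Java only
 * dispatches to a subclass method whose signature matches exactly.
 */
public class DispatchDemo {

    static class BaseWriter {
        // Entry point that delegates to writeRecord, which subclasses may override.
        String write(Object datum) {
            return writeRecord(datum);
        }

        String writeRecord(Object datum) {
            return "BaseWriter.writeRecord";
        }
    }

    /** Same signature: a true override, selected at run time. */
    static class OverridingWriter extends BaseWriter {
        @Override
        String writeRecord(Object datum) {
            return "OverridingWriter.writeRecord";
        }
    }

    /**
     * Different parameter type: an overload, not an override, so
     * BaseWriter.write still calls BaseWriter.writeRecord(Object).
     * Annotating this method with @Override would make javac reject it.
     */
    static class OverloadingWriter extends BaseWriter {
        String writeRecord(String datum) {
            return "OverloadingWriter.writeRecord";
        }
    }

    public static void main(String[] args) {
        System.out.println(new OverridingWriter().write("x"));
        System.out.println(new OverloadingWriter().write("x"));
    }
}
```

Running main prints "OverridingWriter.writeRecord" followed by "BaseWriter.writeRecord". Marking intended overrides with @Override turns this kind of mismatch into a compile-time error, which is why the annotation is the usual safeguard.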