Hi Stan,

Thank you for your feedback. I've run the script with
"-D mapred.child.java.opts=-verbose:class" and have the following in my logs:

[Loaded org.apache.avro.generic.GenericDatumWriter from file:/var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/ankenworthy/jobcache/job_201111230039_0146/jars/job.jar]
[Loaded org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter from file:/var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/ankenworthy/jobcache/job_201111230039_0146/jars/job.jar]

I assume the .../job_201111230039_0146/jars/job.jar is the one prepared by Pig
from the jars I have REGISTERed, in which case the classes are the ones I
expect. Or have I misread that?
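
As a cross-check on the -verbose:class output, the same question can be asked
from inside the JVM. The following is only a minimal sketch: ClassOrigin and
describe are hypothetical names, not part of Pig or Avro, and it would have to
run in the task JVM (for example from a UDF) to say anything about the job's
classpath. It relies only on the standard Class.getProtectionDomain() and
CodeSource.getLocation() calls.

```java
import java.net.URL;
import java.security.CodeSource;
import java.security.ProtectionDomain;

// Hypothetical helper (not part of Pig or Avro): reports where a class was loaded from.
public class ClassOrigin {

    /** Returns "className <- location", or a placeholder when the JVM reports no code source. */
    public static String describe(Class<?> cls) {
        ProtectionDomain domain = cls.getProtectionDomain();
        CodeSource source = (domain == null) ? null : domain.getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no code source.
        String where = (location == null) ? "(no code source reported)" : location.toString();
        return cls.getName() + " <- " + where;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Fully qualified class names are given on the command line, e.g.
        // org.apache.avro.generic.GenericDatumWriter
        for (String name : args) {
            System.out.println(describe(Class.forName(name)));
        }
    }
}
```

If both Avro and piggybank classes report the same job.jar, that would agree
with the log lines above; it would not by itself explain the exception.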

Regards,

Andrew



>________________________________
> From: Stan Rosenberg <srosenb...@proclivitysystems.com>
>To: user@pig.apache.org; Andrew Kenworthy <adwkenwor...@yahoo.com> 
>Sent: Tuesday, January 10, 2012 5:36 PM
>Subject: Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
> 
>Andrew,
>
>Something looks odd in this stack trace:
>
>Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
>>         at org.apache.avro.generic.GenericData.getField(GenericData.java:525)
>>         at org.apache.avro.generic.GenericData.getField(GenericData.java:540)
>>         at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:103)
>>         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
>>         at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
>
>PigAvroDatumWriter overrides 'GenericDatumWriter.writeRecord' in order
>to extract values from a tuple, so I would expect the third frame
>to be PigAvroDatumWriter.writeRecord rather than
>GenericDatumWriter.writeRecord.  Perhaps someone else has more insight
>into why the override is not being invoked.  In the meantime, please
>confirm that both PigAvroDatumWriter and GenericDatumWriter are loaded
>from the right jar files.  (You can do this by temporarily changing the
>pig script to invoke the JVM with 'java -verbose' and then running
>'grep' on the output for these classes.)
>
>Best,
>
>stan
>
>On Tue, Jan 10, 2012 at 8:03 AM, Andrew Kenworthy
><adwkenwor...@yahoo.com> wrote:
>> Hi Stan,
>>
>> here's the full stacktrace:
>>
>> org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
>>         at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:261)
>>         at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
>>         at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:580)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
>>         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:530)
>>         at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:262)
>> Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
>>         at org.apache.avro.generic.GenericData.getField(GenericData.java:525)
>>         at org.apache.avro.generic.GenericData.getField(GenericData.java:540)
>>         at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:103)
>>         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
>>         at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
>>         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
>>         at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:255)
>>         ... 18 more
>>
>>
>> Andrew
>>
>>
>>
>>>________________________________
>>> From: Stan Rosenberg <srosenb...@proclivitysystems.com>
>>>To: user@pig.apache.org; Andrew Kenworthy <adwkenwor...@yahoo.com>
>>>Sent: Monday, January 9, 2012 5:30 PM
>>>Subject: Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0
>>>
>>>Andrew,
>>>
>>>The source of the problem may be AvroStorage in piggybank.  Could you
>>>please include the entire stack trace?
>>>
>>>stan
>>>
>>>On Mon, Jan 9, 2012 at 4:15 AM, Andrew Kenworthy <adwkenwor...@yahoo.com> 
>>>wrote:
>>>> Hello,
>>>>
>>>> When I run a simple Pig script to LOAD and STORE Avro data, I get:
>>>>
>>>> java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be 
>>>> cast to org.apache.avro.generic.IndexedRecord
>>>>
>>>>
>>>> Script:
>>>>
>>>> REGISTER /tmp/avro-1.6.0.jar;
>>>> --REGISTER /tmp/avro-1.5.4.jar
>>>> --REGISTER /tmp/avro-1.4.1.jar;
>>>>
>>>> REGISTER /tmp/piggybank-0.9.1.jar;
>>>> REGISTER /tmp/json-simple-1.1.jar;
>>>> REGISTER /tmp/jackson-core-asl-1.8.4.jar;
>>>> REGISTER /tmp/jackson-mapper-asl-1.8.4.jar;
>>>>
>>>> avroData = LOAD '$DATA_INPUTDIR' USING
>>>> org.apache.pig.piggybank.storage.avro.AvroStorage();
>>>>
>>>> dataSubset = FOREACH avroData GENERATE myField1, myField2;
>>>> describe dataSubset;
>>>> -----------------------------------------------
>>>> -- shows:
>>>> -- dataSubset : {myField1: int,myField2: int}
>>>> -----------------------------------------------
>>>> STORE dataSubset INTO '$OUTPUTDIR' USING 
>>>> org.apache.pig.piggybank.storage.avro.AvroStorage();
>>>>
>>>> If I use the 1.5.4 jar I get the same error, but the script works with the 
>>>> 1.4.1 version. If I just write one field, then it works with 1.6.0.
>>>>
>>>> I see that related issues have been fixed here:
>>>>
>>>> https://issues.apache.org/jira/browse/PIG-2202
>>>> https://issues.apache.org/jira/browse/PIG-2195
>>>>
>>>> Can anyone confirm that this or something similar works with Avro 1.6.0,
>>>> and/or point me in the right direction concerning where the problem may lie?
>>>>
>>>> Many thanks,
>>>>
>>>> Andrew
>>>
>>>
>>>
>
>
>