Including this last message to the Pig user list.

On Sun, Nov 17, 2013 at 7:40 AM, Ruslan Al-Fakikh <metarus...@gmail.com> wrote:

> Russell,
>
> Actually this problem came from the situation when I had the same names in
> pig relation schema and avro schema. And it turned out that AvroStorage
> switches fields if the order is different.
> So, my impression is that it should work this way:
> 1) names correspond - then AvroStorage uses them
> 2) names do not correspond - then AvroStorage fails to store or does some
> schema resolution as shown here:
> http://avro.apache.org/docs/1.7.5/spec.html#Schema+Resolution
>
> Thanks
>
>
> On Sun, Nov 17, 2013 at 7:17 AM, Russell Jurney
> <russell.jur...@gmail.com> wrote:
>
>> How can pig map from a to nonsense_name?
>>
>>
>> On Saturday, November 16, 2013, Ruslan Al-Fakikh wrote:
>>
>>> Thanks, Russell!
>>>
>>> Do you mean that this is the expected behavior? Shouldn't AvroStorage
>>> map the pig fields by their names (not their field order) matching them to
>>> the names in the avro schema?
>>>
>>> Thanks,
>>> Ruslan Al-Fakikh
>>>
>>>
>>> On Sun, Nov 17, 2013 at 6:53 AM, Russell Jurney <
>>> russell.jur...@gmail.com> wrote:
>>>
>>>> Pig tuples have field order. Swap the order of the fields in your avro
>>>> schema and try again.
>>>>
>>>> On Nov 16, 2013, at 6:19 PM, Ruslan Al-Fakikh <metarus...@gmail.com>
>>>> wrote:
>>>>
>>>> Hey guys,
>>>>
>>>> When I store with AvroStorage, the names from Pig tuple fields are
>>>> completely ignored. The field values are put to the result file only by
>>>> their position.
>>>> Here is a simplified test case:
>>>>
>>>> %declare WORKDIR `pwd`
>>>> REGISTER ../../../../lib/external/avro-1.7.4.jar
>>>> REGISTER ../../../../lib/external/json-simple-1.1.jar
>>>> --this is built (manually with Maven) from the latest source:
>>>> -- http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/
>>>> REGISTER ../piggybankBuiltFromSource.jar
>>>> REGISTER ../../../../lib/external/jackson-core-asl-1.8.8.jar
>>>> REGISTER ../../../../lib/external/jackson-mapper-asl-1.8.8.jar
>>>>
>>>> --$ cat input.txt
>>>> --data_a data_b
>>>> --data_a data_b
>>>> inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);
>>>>
>>>> DESCRIBE inputs;
>>>> DUMP inputs;
>>>>
>>>> --output:
>>>> --inputs: {a: chararray,b: chararray}
>>>> --(data_a,data_b)
>>>> --(data_a,data_b)
>>>>
>>>> STORE inputs INTO 'output'
>>>> USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
>>>>     "schema":
>>>>     {
>>>>         "type" : "record",
>>>>         "name" : "my_schema",
>>>>         "namespace" : "com.my_namespace",
>>>>         "fields" : [
>>>>             {
>>>>                 "name" : "b",
>>>>                 "type" : "string"
>>>>             },
>>>>             {
>>>>                 "name" : "nonsense_name",
>>>>                 "type" : "string"
>>>>             }
>>>>         ]
>>>>     }
>>>> }');
>>>>
>>>> --output
>>>> --$ java -jar ../../../../lib/external/avro-tools-1.7.4.jar tojson output/part*
>>>> --{"b":"data_a","nonsense_name":"data_b"}
>>>> --{"b":"data_a","nonsense_name":"data_b"}
>>>>
>>>> AvroStorage is built from the latest piggybank code.
>>>> Using AvroStorage "debug": 5 parameter didn't help.
>>>>
>>>> $ pig -version
>>>> Apache Pig version 0.11.0-cdh4.3.0 (rexported)
>>>> compiled May 27 2013, 20:48:21
>>>>
>>>> Any help would be appreciated.
>>>>
>>>> Thanks,
>>>> Ruslan Al-Fakikh
>>>>
>>>>
>>>
>>
>> --
>> Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
>>
>
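
For anyone finding this thread in the archive: since AvroStorage here writes values by position, one possible workaround is to reorder the Pig fields so that their positions match the field order of the Avro schema before storing. The snippet below is only a sketch based on the test case above and has not been run against this piggybank build. The relation name `reordered` and the Avro field name `a` (in place of `nonsense_name`) are assumptions made for illustration.

```pig
-- Same jars as in the original test case.
REGISTER ../../../../lib/external/avro-1.7.4.jar
REGISTER ../../../../lib/external/json-simple-1.1.jar
REGISTER ../piggybankBuiltFromSource.jar
REGISTER ../../../../lib/external/jackson-core-asl-1.8.8.jar
REGISTER ../../../../lib/external/jackson-mapper-asl-1.8.8.jar

inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);

-- Project the fields in the order the Avro schema declares them (b, then a),
-- so that a positional write puts each value under the matching name.
-- 'reordered' is a hypothetical relation name.
reordered = FOREACH inputs GENERATE b, a;

STORE reordered INTO 'output'
USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
    "schema":
    {
        "type" : "record",
        "name" : "my_schema",
        "namespace" : "com.my_namespace",
        "fields" : [
            { "name" : "b", "type" : "string" },
            { "name" : "a", "type" : "string" }
        ]
    }
}');
```

The alternative Russell suggested, swapping the order of the fields in the Avro schema to match the Pig relation, should have the same effect. The projection is the option to reach for when the schema is fixed by a downstream consumer.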