Unfortunately, AvroStorage is not as flexible as it could be. I keep my Avro schemas separately on HDFS and point AvroStorage at the schema file when storing. I always do an explicit projection of the relation's fields before storing the relation.
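A minimal sketch of what I mean, reusing the relation and schema from the test case quoted below. The schema path /schemas/my_schema.avsc and the alias to_store are hypothetical, the choice to send a into nonsense_name is only for illustration, and I am assuming your piggybank build accepts the 'schema_file' option:

```pig
-- Sketch only. REGISTER the same avro, json-simple, jackson and piggybank
-- jars as in the quoted test case before running this.
inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);

-- Explicit projection: emit the fields in the same order as the fields of
-- the Avro schema (b first, then the value meant for nonsense_name),
-- because AvroStorage fills the record by position, not by name.
to_store = FOREACH inputs GENERATE b AS b, a AS nonsense_name;

-- The schema lives in its own file on HDFS (hypothetical path) instead of
-- being inlined in the script.
STORE to_store INTO 'output'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage(
        'schema_file', '/schemas/my_schema.avsc');
```

Keeping the projection right next to the STORE makes the field order visible in the script, so a reordered schema file is easier to spot.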
The same problem applies when reading data: pushing up fields by name does not work either.

2013/11/17 Russell Jurney <[email protected]>

> I think the expected behavior of AvroStorage is to use the tuple-ordered
> fields in the order they exist in the tuple. So to fix your problem, swap
> the order of b/nonsense_name.
>
> Otherwise I can't see a way to map from b to nonsense_name at all. Pig
> can't know how to do that without referencing tuple field order.
>
> On Sat, Nov 16, 2013 at 7:42 PM, Ruslan Al-Fakikh <[email protected]> wrote:
>
> > including this last message to pig user list
> >
> > On Sun, Nov 17, 2013 at 7:40 AM, Ruslan Al-Fakikh <[email protected]> wrote:
> >
> >> Russell,
> >>
> >> Actually this problem came from the situation when I had the same names
> >> in pig relation schema and avro schema. And it turned out that AvroStorage
> >> switches fields if the order is different.
> >> So, my impression is that it should work this way:
> >> 1) names correspond - then AvroStorage uses them
> >> 2) names do not correspond - then AvroStorage fails to store or does some
> >> schema resolution as shown here:
> >> http://avro.apache.org/docs/1.7.5/spec.html#Schema+Resolution
> >>
> >> Thanks
> >>
> >> On Sun, Nov 17, 2013 at 7:17 AM, Russell Jurney <[email protected]> wrote:
> >>
> >>> How can pig map from a to nonsense_name?
> >>>
> >>> On Saturday, November 16, 2013, Ruslan Al-Fakikh wrote:
> >>>
> >>>> Thanks, Russell!
> >>>>
> >>>> Do you mean that this is the expected behavior? Shouldn't AvroStorage
> >>>> map the pig fields by their names (not their field order) matching them
> >>>> to the names in the avro schema?
> >>>>
> >>>> Thanks,
> >>>> Ruslan Al-Fakikh
> >>>>
> >>>> On Sun, Nov 17, 2013 at 6:53 AM, Russell Jurney <[email protected]> wrote:
> >>>>
> >>>>> Pig tuples have field order. Swap the order of the fields in your avro
> >>>>> schema and try again.
> >>>>>
> >>>>> On Nov 16, 2013, at 6:19 PM, Ruslan Al-Fakikh <[email protected]> wrote:
> >>>>>
> >>>>> Hey guys,
> >>>>>
> >>>>> When I store with AvroStorage, the names from Pig tuple fields are
> >>>>> completely ignored. The field values are put to the result file only by
> >>>>> their position.
> >>>>> Here is a simplified test case:
> >>>>>
> >>>>> %declare WORKDIR `pwd`
> >>>>> REGISTER ../../../../lib/external/avro-1.7.4.jar
> >>>>> REGISTER ../../../../lib/external/json-simple-1.1.jar
> >>>>> --this is built (manually with Maven) from the latest source:
> >>>>> --http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/
> >>>>> REGISTER ../piggybankBuiltFromSource.jar
> >>>>> REGISTER ../../../../lib/external/jackson-core-asl-1.8.8.jar
> >>>>> REGISTER ../../../../lib/external/jackson-mapper-asl-1.8.8.jar
> >>>>>
> >>>>> --$ cat input.txt
> >>>>> --data_a data_b
> >>>>> --data_a data_b
> >>>>> inputs = LOAD 'input.txt' AS (a: chararray, b: chararray);
> >>>>>
> >>>>> DESCRIBE inputs;
> >>>>> DUMP inputs;
> >>>>>
> >>>>> --output:
> >>>>> --inputs: {a: chararray,b: chararray}
> >>>>> --(data_a,data_b)
> >>>>> --(data_a,data_b)
> >>>>>
> >>>>> STORE inputs INTO 'output'
> >>>>> USING org.apache.pig.piggybank.storage.avro.AvroStorage('{
> >>>>>     "schema":
> >>>>>     {
> >>>>>         "type" : "record",
> >>>>>         "name" : "my_schema",
> >>>>>         "namespace" : "com.my_namespace",
> >>>>>         "fields" : [
> >>>>>             {
> >>>>>                 "name" : "b",
> >>>>>                 "type" : "string"
> >>>>>             },
> >>>>>             {
> >>>>>                 "name" : "nonsense_name",
> >>>>>                 "type" : "string"
> >>>>>             }
> >>>>>         ]
> >>>>>     }
> >>>>> }');
> >>>>>
> >>>>> --output
> >>>>> --$ java -jar ../../../../lib/external/avro-tools-1.7.4.jar tojson output/part*
> >>>>> --{"b":"data_a","nonsense_name":"data_b"}
> >>>>> --{"b":"data_a","nonsense_name":"data_b"}
> >>>>>
> >>>>> AvroStorage is built from the latest piggybank code.
> >>>>> Using AvroStorage "debug": 5 parameter didn't help.
> >>>>>
> >>>>> $ pig -version
> >>>>> Apache Pig version 0.11.0-cdh4.3.0 (rexported)
> >>>>> compiled May 27 2013, 20:48:21
> >>>>>
> >>>>> Any help would be appreciated.
> >>>>>
> >>>>> Thanks,
> >>>>> Ruslan Al-Fakikh
> >>>
> >>> --
> >>> Russell Jurney twitter.com/rjurney [email protected] .com
>
> --
> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
