Actually, I found it in the Pig manual:

> If you need to use different constructor parameters for different calls to
> the function you will need to create multiple defines – one for each
> parameter set.

For example, this works:

> DEFINE AvroStorageNoParam org.apache.pig.piggybank.storage.avro.AvroStorage();
> DEFINE AvroStorageWithParam org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '{"type" : "map","values" : "string"}');
> loaded_data = LOAD 'map.avro' USING AvroStorageNoParam;
> describe loaded_data;
> STORE loaded_data INTO 'output' USING AvroStorageWithParam;

Please see the usage section:
http://pig.apache.org/docs/r0.10.0/basic.html#define-udfs

Thanks,
Cheolsoo

On Thu, Aug 23, 2012 at 11:11 AM, Cheolsoo Park <cheol...@cloudera.com> wrote:

> Hi Johannes,
>
> I was able to reproduce your error with the following Avro schema:
>
>> {
>>   "type" : "map",
>>   "values" : "string"
>> }
>
> The issue is not in AvroStorage but in the DEFINE statement.
>
> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>
> AvroStorage has two constructors: one with no parameters and one that takes
> parameters. To define the output Avro schema, the second one must be used.
> But your DEFINE statement causes the first constructor to be used every
> time, so the output Avro schema is never set. If you remove the DEFINE
> statement and use the fully qualified name of AvroStorage, everything works.
> For example:
>
>> loaded_data = LOAD 'map.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
>> describe loaded_data;
>> STORE loaded_data INTO 'output' USING org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '{"type" : "map", "values" : "string"}');
>
> Now the question is why DEFINE does not work here.
>
> Thanks,
> Cheolsoo
>
> On Thu, Aug 23, 2012 at 8:49 AM, Johannes Schwenk <johannes.schw...@adition.com> wrote:
>
>> Hi all,
>>
>> I'm trying to execute the following pig script with pig-0.10.0 and yarn
>> (cdh4.0.0):
>>
>> --
>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>> loaded_data = LOAD '$input' USING AvroStorage();
>> STORE loaded_data INTO '$output' USING AvroStorage('same', '$input');
>> --
>>
>> I call the pig this way:
>>
>> pig -Dpig.additional.jars=lib/piggybank.jar:lib/json-simple-1.1.jar:lib/avro-1.5.3.jar -file script.pig -param input=input.avro -param output=output.avro
>>
>> The input.avro has the following schema:
>>
>> http://pastebin.com/ZWU6qLWx
>>
>> I always get
>>
>> <file script.pig, line 3, column 0> Output Location Validation Failed for: 'xxx/output.avro' More info to follow:
>> Please provide schema for Map field!
>> Details at logfile: xxx/pig_1345735999390.log
>>
>> Log excerpt:
>>
>> Please provide schema for Map field!
>>     at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:75)
>>     at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
>>     at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
>>     at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
>>     at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
>>     at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>>     at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
>>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:293)
>>     at org.apache.pig.PigServer.compilePp(PigServer.java:1316)
>>     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1253)
>>     at org.apache.pig.PigServer.execute(PigServer.java:1245)
>>     at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
>>     at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
>>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>>     at org.apache.pig.Main.run(Main.java:430)
>>     at org.apache.pig.Main.main(Main.java:111)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>> Caused by: java.io.IOException: Please provide schema for Map field!
>>     at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convert(PigSchema2Avro.java:110)
>>     at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convertRecord(PigSchema2Avro.java:151)
>>     at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convert(PigSchema2Avro.java:62)
>>     at org.apache.pig.piggybank.storage.avro.AvroStorage.checkSchema(AvroStorage.java:534)
>>     at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:65)
>>     ... 22 more
>>
>> I also tried to specify
>>
>> AvroStorage('{"debug": 5, "schema_file": "schema.avsc", "field22", "def:pd", "field23", "def:epd"}')
>>
>> - same result.
>>
>> Do you have any hints?
>>
>> Greetings,
>> Johannes Schwenk
>>
>> --
>> Software Developer (Reporting)
>> ________________________________________________________
>>
>> ADITION technologies AG
>> Schwarzwaldstraße 78b
>> 79117 Freiburg
>>
>> http://www.adition.com
>>
>> T +49 / (0)761 / 88147 - 30
>> F +49 / (0)761 / 88147 - 77
>> SUPPORT +49 / (0)1805 - ADITION
>>
>> (Landline rate 14 ct/min; mobile rates at most 42 ct/min)
>>
>> Registered at the Düsseldorf District Court under HRB 54076
>> Executive Board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
>> Chairman of the Supervisory Board: Attorney Daniel Raimer
>> VAT ID No.: DE 218 858 434
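
For reference, here is how the fix from the top of this thread might look when applied to the original script. This is only a sketch and has not been run: the alias names AvroLoad and AvroStoreSame are made up for illustration, and the 'same' option is taken as-is from the original script on the assumption that this piggybank version accepts it. Each DEFINE binds one alias to one constructor parameter set, so LOAD and STORE each get their own alias.

```pig
-- Hypothetical aliases: one DEFINE per constructor parameter set.
-- No-argument constructor, used for loading.
DEFINE AvroLoad org.apache.pig.piggybank.storage.avro.AvroStorage();

-- Parameterized constructor, used for storing.
-- ('same', '$input') is copied from the original script; it is assumed
-- to tell AvroStorage to reuse the schema of the input file.
DEFINE AvroStoreSame org.apache.pig.piggybank.storage.avro.AvroStorage('same', '$input');

loaded_data = LOAD '$input' USING AvroLoad;
STORE loaded_data INTO '$output' USING AvroStoreSame;
```

The invocation would stay the same as in the original message (-param input=... -param output=...), since parameter substitution also applies inside DEFINE statements.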