Actually, I found it in the Pig manual:

> If you need to use different constructor parameters for different calls to
> the function you will need to create multiple defines – one for each
> parameter set.


For example, this works:

> DEFINE AvroStorageNoParam org.apache.pig.piggybank.storage.avro.AvroStorage();
> DEFINE AvroStorageWithParam org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '{"type" : "map","values" : "string"}');
> loaded_data = LOAD 'map.avro' USING AvroStorageNoParam;
> describe loaded_data;
> STORE loaded_data INTO 'output' USING AvroStorageWithParam;


Please see the usage section:
http://pig.apache.org/docs/r0.10.0/basic.html#define-udfs

Thanks,
Cheolsoo

On Thu, Aug 23, 2012 at 11:11 AM, Cheolsoo Park <cheol...@cloudera.com> wrote:

> Hi Johannes,
>
> I was able to reproduce your error with the following Avro schema:
>
>> {
>>   "type" : "map",
>>   "values" : "string"
>> }
>
>
> The issue is not in AvroStorage but in the DEFINE statement.
>
> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>
>
> AvroStorage has two constructors: one that takes no parameters and one that
> takes parameters. To set the output Avro schema, the second one must be used.
> But your DEFINE statement binds the alias AvroStorage to the no-argument
> constructor, so that constructor is always used and the output Avro schema is
> never set. If you remove the DEFINE statement and use the fully qualified
> name of AvroStorage, everything works. For example,
>
>> loaded_data = LOAD 'map.avro' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
>> describe loaded_data;
>> STORE loaded_data INTO 'output' USING org.apache.pig.piggybank.storage.avro.AvroStorage('schema', '{"type" : "map", "values" : "string"}');
>
>
> Now the question is why DEFINE does not work here.
>
> Thanks,
> Cheolsoo
>
>
> On Thu, Aug 23, 2012 at 8:49 AM, Johannes Schwenk <
> johannes.schw...@adition.com> wrote:
>
>> Hi all,
>>
>> I'm trying to execute the following Pig script with pig-0.10.0 and YARN
>> (cdh4.0.0):
>>
>> --
>> DEFINE AvroStorage org.apache.pig.piggybank.storage.avro.AvroStorage();
>> loaded_data = LOAD '$input' USING AvroStorage();
>> STORE loaded_data INTO '$output' USING AvroStorage('same', '$input');
>> --
>>
>> I invoke Pig this way:
>>
>> pig -Dpig.additional.jars=lib/piggybank.jar:lib/json-simple-1.1.jar:lib/avro-1.5.3.jar \
>>     -file script.pig -param input=input.avro -param output=output.avro
>>
>> The input.avro has the following schema:
>>
>> http://pastebin.com/ZWU6qLWx
>>
>> I always get
>>
>> <file script.pig, line 3, column 0> Output Location Validation Failed
>> for: 'xxx/output.avro' More info to follow:
>> Please provide schema for Map field!
>> Details at logfile: xxx/pig_1345735999390.log
>>
>> Log excerpt:
>>
>> Please provide schema for Map field!
>>         at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:75)
>>         at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:77)
>>         at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
>>         at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)
>>         at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
>>         at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
>>         at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)
>>         at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:293)
>>         at org.apache.pig.PigServer.compilePp(PigServer.java:1316)
>>         at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1253)
>>         at org.apache.pig.PigServer.execute(PigServer.java:1245)
>>         at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
>>         at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
>>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>>         at org.apache.pig.Main.run(Main.java:430)
>>         at org.apache.pig.Main.main(Main.java:111)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>> Caused by: java.io.IOException: Please provide schema for Map field!
>>         at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convert(PigSchema2Avro.java:110)
>>         at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convertRecord(PigSchema2Avro.java:151)
>>         at org.apache.pig.piggybank.storage.avro.PigSchema2Avro.convert(PigSchema2Avro.java:62)
>>         at org.apache.pig.piggybank.storage.avro.AvroStorage.checkSchema(AvroStorage.java:534)
>>         at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:65)
>>         ... 22 more
>>
>>
>> I also tried to specify
>>
>> AvroStorage('{"debug": 5, "schema_file": "schema.avsc", "field22",
>> "def:pd", "field23", "def:epd"}')
>>
>> - same result.
>>
>>
>> Do you have any hints?
>>
>> Greetings,
>> Johannes Schwenk
>>
>
