Hi, I'm loading data into Pig using LOAD without specifying data types. In a second step I call a UDF and set the proper data types with AS (...). The typed set looks like this:
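For context, a minimal sketch of those two steps; the input path, delimiter, and UDF name (myudfs.Anonymize) are placeholders here, not my actual script:

```pig
-- Step 1: load without a schema, so every field is a bytearray
raw = LOAD 'input/customers.txt' USING PigStorage('\t');

-- Step 2: apply the UDF and declare the proper types with AS (...)
sensitiveSet = FOREACH raw GENERATE
    FLATTEN(myudfs.Anonymize(*)) AS (rank_ID: long, name: chararray,
                                     customerId: long, VIN: chararray,
                                     birth_date: chararray,
                                     fuel_mileage: int,
                                     fuel_consumption: float);
```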
grunt> describe sensitiveSet;
sensitiveSet: {rank_ID: long, name: chararray, customerId: long, VIN: chararray, birth_date: chararray, fuel_mileage: int, fuel_consumption: float}

When I store the data typed as above using AvroStorage, I get a really strange error: Datum "Name" is not in union ["null","string"]. When I change the type inside the schema to bytes, everything works fine. Note that only the name field is typed as "string" here; the rest are bytes:

STORE sensitiveSet INTO 'OutputFileGen1aa' USING
    org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check', 'schema',
    '{"type":"record","name":"test","namespace":"","fields":[
        {"name":"rank_ID","type":"long"},
        {"name":"name","type":["null","string"],"store":"no","sensitive":"na"},
        {"name":"cid","type":["null","bytes"],"store":"yes","sensitive":"yes"},
        {"name":"VIN","type":["null","bytes"],"store":"yes","sensitive":"yes"},
        {"name":"birth_date","type":["null","bytes"],"store":"yes","sensitive":"no"},
        {"name":"fuel_mileage","type":["null","bytes"],"store":"yes","sensitive":"no"},
        {"name":"fuel_consumption","type":["null","bytes"],"store":"yes","sensitive":"no"}
    ]}');

The error:

2016-01-11 15:16:15,644 [Thread-28] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2016-01-11 15:16:15,647 [Thread-28] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local2100282506_0010
java.lang.Exception: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum "Name" is not in union ["null","string"]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.RuntimeException: Datum "Name" is not in union ["null","string"]
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:263)
    at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
    at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:808)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:281)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:274)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Datum "Name" is not in union ["null","string"]
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.resolveUnionSchema(PigAvroDatumWriter.java:128)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeUnion(PigAvroDatumWriter.java:111)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:82)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.writeRecord(PigAvroDatumWriter.java:365)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
    at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
    at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
    at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:257)
    ... 20 more
2016-01-11 15:16:15,792 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local2100282506_0010

When I change string to bytes, it works properly. What could be the problem?
This variant, with the name field typed as bytes instead of string, stores successfully:

STORE sensitiveSet INTO 'OutputFileGen1aa' USING
    org.apache.pig.piggybank.storage.avro.AvroStorage('no_schema_check', 'schema',
    '{"type":"record","name":"test","namespace":"","fields":[
        {"name":"rank_ID","type":"long"},
        {"name":"name","type":["null","bytes"],"store":"no","sensitive":"na"},
        {"name":"cid","type":["null","bytes"],"store":"yes","sensitive":"yes"},
        {"name":"VIN","type":["null","bytes"],"store":"yes","sensitive":"yes"},
        {"name":"birth_date","type":["null","bytes"],"store":"yes","sensitive":"no"},
        {"name":"fuel_mileage","type":["null","bytes"],"store":"yes","sensitive":"no"},
        {"name":"fuel_consumption","type":["null","bytes"],"store":"yes","sensitive":"no"}
    ]}');

Thanks