From what I see, the data type of the DataBag is not correctly recognized.
I guess the -1 comes from DataType.findType(), which is returning ERROR.
I also assume (though I am not sure) that the concrete type of getValue()
should be AccumulativeBag, but for some reason it is something different.
Maybe the BagFactory is misbehaving.
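To make the hypothesis concrete: writers that dispatch on the concrete class of a datum can reject an object whose contents look perfectly fine when printed. The sketch below is illustrative only and does not use Pig at all; the names (findType, CustomBag, the type code 120) are hypothetical stand-ins for a findType()-style dispatch that only knows the default implementation class and returns -1 for anything else.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class TypeDispatchSketch {
    static final byte BAG = 120;  // hypothetical type code for the known bag class
    static final byte ERROR = -1; // stand-in for DataType.ERROR

    // Hypothetical findType()-style dispatch: it only recognizes the
    // default implementation class and maps everything else to -1.
    static byte findType(Object o) {
        if (o instanceof ArrayList) {
            return BAG;
        }
        return ERROR; // unrecognized concrete class -> -1
    }

    // Stand-in for a nonstandard bag implementation the writer was
    // never taught about.
    static class CustomBag extends LinkedList<String> {}

    public static void main(String[] args) {
        List<String> defaultBag = new ArrayList<>();
        defaultBag.add("a");
        CustomBag customBag = new CustomBag();
        customBag.add("a");
        // Same contents, same toString() output, different concrete class:
        System.out.println(findType(defaultBag)); // 120
        System.out.println(findType(customBag));  // -1
    }
}
```

This would explain why the prints inside exec() look correct while serialization still fails afterwards: toString() never consults the type dispatch, but the writer does.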

Can you post the complete code for your UDF?
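In the meantime, one thing worth trying (assuming the problem really is the concrete class of the returned bag, which is not confirmed yet): copy the accumulated tuples into a fresh default bag before returning, i.e. in real Pig code something like BagFactory.getInstance().newDefaultBag() plus add() per tuple. The sketch below is a Pig-free illustration of that defensive-copy idea; the write() method and its behavior are hypothetical stand-ins for a strict serializer.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class DefensiveCopySketch {
    // Hypothetical stand-in for a strict writer that only accepts the
    // default implementation class and fails on anything else.
    static String write(List<String> l) {
        if (!(l instanceof ArrayList)) {
            throw new RuntimeException("Unexpected data type -1 found in stream.");
        }
        return String.join(",", l);
    }

    public static void main(String[] args) {
        // The accumulator's internal container is some custom class...
        List<String> accumulated = new LinkedList<>();
        accumulated.add("b");
        accumulated.add("c");

        // ...so copy it into the known default class before handing it on.
        List<String> out = new ArrayList<>(accumulated);
        System.out.println(write(out)); // b,c
    }
}
```

If a copy like that makes the error go away, it would confirm that the serializer is choking on the concrete class coming out of getValue() rather than on the data itself.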

Cheers,
--
Gianmarco De Francisci Morales



On Tue, Jan 25, 2011 at 19:24, Jonathan Coveney <jcove...@gmail.com> wrote:
> I've been able to isolate the problem, but have no idea what is causing it.
>
> The input is in this form (this is correct):
>
> {({(a),(b),(c)}),({(a),(b),(c)}),({(a),(b),(c)})}
>
> and the output is in this form:
>
> {(b,c,3),(c,a,3),(b,a,3)}
>
> which is also correct. By placing prints and whatnot, I can see that the
> error is coming once I return the second bag.
>
> public DataBag exec(Tuple input) throws IOException {
>     try {
>         accumulate(input);
>         DataBag bag = getValue();
>         System.out.println(input.get(0).toString());
>         System.out.println(bag.toString());
>         return bag;
>     } catch (Exception e) {
>         int errCode = 31415;
>         String msg = "Error while accumulating graphs (exec) "
>             + this.getClass().getSimpleName();
>         throw new ExecException(msg, errCode, PigException.BUG, e);
>     }
> }
>
> The prints are how I saw that it calculated properly, and I know it's not an
> error within exec because it's not throwing an exception. So something weird
> is going on afterwards.
>
> It'd be great to understand what is going on, because I think this is what
> was plaguing an algebraic version of another script...
>
> Is there something special you have to do if the form of your output is
> significantly different from the form of your input? Here is the script that
> generates this:
>
> register /path/to/myudf.jar;
> A = LOAD 'test.txt' as (a:chararray, b:chararray);
> B = GROUP A BY a;
> C = FOREACH B GENERATE A.b;
> D = GROUP C ALL;
> E = FOREACH D GENERATE myudf.fun.udf(C.b);
>
> So it's weird: I'm getting the output I want, it is a DataBag, I output
> that, but something is exploding.
>
> Any ideas what it could be? As always, thanks.
>
> Here's from grunt:
>
> java.io.IOException: java.lang.RuntimeException: Unexpected data type -1 found in stream.
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:438)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:401)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:381)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:251)
>        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
> Caused by: java.lang.RuntimeException: Unexpected data type -1 found in stream.
>        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:478)
>        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
>        at org.apache.pig.data.BinInterSedes.writeBag(BinInterSedes.java:522)
>        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:361)
>        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
>        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
>        at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:541)
>        at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:357)
>        at org.apache.pig.impl.io.InterRecordWriter.write(InterRecordWriter.java:73)
>        at org.apache.pig.impl.io.InterStorage.putNext(InterStorage.java:87)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
>        at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:508)
>        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:436)
> Here's from the logfile:
>
> Pig Stack Trace
> ---------------
> ERROR 1066: Unable to open iterator for alias E
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias E
>        at org.apache.pig.PigServer.openIterator(PigServer.java:754)
>        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:612)
>        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
>        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
>        at org.apache.pig.Main.run(Main.java:465)
>        at org.apache.pig.Main.main(Main.java:107)
> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
>        at org.apache.pig.PigServer.openIterator(PigServer.java:744)
>        ... 7 more
> ================================================================================
>
