Hello, I'm having an issue with a script that uses an EvalFunc I wrote. The final output contains characters I am not expecting: trailing commas, followed by what I'm guessing are null fields that are not displayed.
Snippet:

C = FOREACH B GENERATE FLATTEN(B) as (f1:int,f2:int);
grunt> DUMP C;
(2,3)
(2,4)
(2,5)
(3,4)
(3,5)
(4,5)
(2,3)
(2,4)
(2,5)
(3,4)
(3,5)
(4,5)

D = GROUP C by (f1,f2);
grunt> describe D;
D: {group: (f1: int,f2: int),C: {f1: int,f2: int}}
grunt> DUMP D;
((2,3,),{(2,3,),(2,3,)})
((2,4,),{(2,4,),(2,4,)})
((2,5,),{(2,5,),(2,5,)})
((3,4,),{(3,4,),(3,4,)})
((3,5,),{(3,5,),(3,5,)})
((4,5,),{(4,5,),(4,5,)})

My question is: what are these extra comma/null fields in each tuple? I expected the first row to read as:

((2,3),{(2,3),(2,3)})

This may be related: when I run 'ILLUSTRATE C', I get an exception:

java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
  at java.util.ArrayList.RangeCheck(ArrayList.java:547)
  at java.util.ArrayList.get(ArrayList.java:322)
  at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
  at org.apache.pig.pen.util.ExampleTuple.get(ExampleTuple.java:80)
  at org.apache.pig.pen.util.DisplayExamples.MakeArray(DisplayExamples.java:190)
  at org.apache.pig.pen.util.DisplayExamples.printTabular(DisplayExamples.java:86)
  at org.apache.pig.pen.util.DisplayExamples.printTabular(DisplayExamples.java:69)
  at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:143)
  at org.apache.pig.PigServer.getExamples(PigServer.java:785)
  at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:555)
  at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:246)
  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
  at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
  at org.apache.pig.Main.main(Main.java:357)

Excruciating detail below:

My script:

REGISTER udf.jar
A = LOAD '/pig_input/co.txt' as (line:chararray);
B = FOREACH A GENERATE com.thumbplay.pig.NormalizeListUDF(line) as B;
C = FOREACH B GENERATE FLATTEN(B) as (f1:int,f2:int);
D = GROUP C by (f1,f2);
E = FOREACH D GENERATE group, COUNT(C);
STORE E INTO 'output' USING PigStorage(',');

Here's what I'm trying to do:

For input:

A,1,2,3
B,1,2,3

Produce combinations for each row (my UDF does this):

(1,2),(1,3),(2,3)
(1,2),(1,3),(2,3)

Flatten them:

(1,2), (1,3), (2,3), (1,2), (1,3), (2,3)

Group and count them:

(1,2),2
(1,3),2
(2,3),2
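
To make the intended result concrete, here is a minimal plain-Java sketch of the three steps above (combine, flatten, group and count). It is not the actual NormalizeListUDF and uses no Pig classes. The class name PairCountSketch and the methods pairs and countPairs are made up for illustration, and it assumes the first field of each row is a label that is skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PairCountSketch {

    // Hypothetical stand-in for the UDF: skip the leading label field and
    // return every 2-element combination of the remaining values, in order.
    static List<String> pairs(String line) {
        String[] fields = line.split(",");
        List<String> out = new ArrayList<>();
        for (int i = 1; i < fields.length; i++) {
            for (int j = i + 1; j < fields.length; j++) {
                // Each pair has exactly two fields, with no trailing field.
                out.add("(" + fields[i].trim() + "," + fields[j].trim() + ")");
            }
        }
        return out;
    }

    // Flatten the pairs from all rows, then group identical pairs and count them.
    static Map<String, Integer> countPairs(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String pair : pairs(line)) {
                counts.merge(pair, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countPairs(Arrays.asList("A,1,2,3", "B,1,2,3"));
        // Print one "pair,count" line per distinct pair.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "," + e.getValue());
        }
    }
}
```

For the two input rows above this produces three distinct pairs, each with a count of 2, which is the shape I expect from the Pig script.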