Hello,

I'm having an issue with a script that uses an EvalFunc I wrote. The final
output contains characters I'm not expecting: extra commas, each followed by
what I'm guessing is a null field that I can't see.

Snippet:
C = FOREACH B GENERATE FLATTEN(B) as (f1:int,f2:int);
grunt> DUMP C;
(2,3)
(2,4)
(2,5)
(3,4)
(3,5)
(4,5)
(2,3)
(2,4)
(2,5)
(3,4)
(3,5)
(4,5)

D = GROUP C by (f1,f2);
grunt> describe D;
D: {group: (f1: int,f2: int),C: {f1: int,f2: int}}

grunt> DUMP D;
((2,3,),{(2,3,),(2,3,)})
((2,4,),{(2,4,),(2,4,)})
((2,5,),{(2,5,),(2,5,)})
((3,4,),{(3,4,),(3,4,)})
((3,5,),{(3,5,),(3,5,)})
((4,5,),{(4,5,),(4,5,)})

My question is: what are these extra comma/null fields in each tuple? I
expected the first row to read as:
((2,3),{(2,3),(2,3)})

Possibly related: when I run 'ILLUSTRATE C', I get an exception:
java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
at org.apache.pig.pen.util.ExampleTuple.get(ExampleTuple.java:80)
at org.apache.pig.pen.util.DisplayExamples.MakeArray(DisplayExamples.java:190)
at org.apache.pig.pen.util.DisplayExamples.printTabular(DisplayExamples.java:86)
at org.apache.pig.pen.util.DisplayExamples.printTabular(DisplayExamples.java:69)
at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:143)
at org.apache.pig.PigServer.getExamples(PigServer.java:785)
at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:555)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:246)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
at org.apache.pig.Main.main(Main.java:357)

Excruciating detail below:

My script:
REGISTER udf.jar
A = LOAD '/pig_input/co.txt' as (line:chararray);
B = FOREACH A GENERATE com.thumbplay.pig.NormalizeListUDF(line) as B;
C = FOREACH B GENERATE FLATTEN(B) as (f1:int,f2:int);
D = GROUP C by (f1,f2);
E = FOREACH D GENERATE group, COUNT(C);
STORE E INTO 'output' USING PigStorage(',');

Here's what I'm trying to do:
For input:
A,1,2,3
B,1,2,3

Produce all two-element combinations for each row, skipping the leading label (my UDF does this):
(1,2),(1,3),(2,3)
(1,2),(1,3),(2,3)
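For reference, the pairing step above can be sketched in plain Java. This is a minimal illustration, not the actual NormalizeListUDF: it uses no Pig classes, the names PairCombinations, pairs and format are hypothetical, and it assumes the first comma-separated field of each line is a label to skip.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the pairing logic inside NormalizeListUDF.
// Plain Java only (no Pig classes), so it can be compiled and run on its own.
final class PairCombinations {

    // Parses a line such as "A,1,2,3", skips the leading label field,
    // and returns every 2-element combination (i < j) of the remaining ints.
    static List<int[]> pairs(String line) {
        String[] fields = line.split(",");
        List<Integer> values = new ArrayList<>();
        for (int k = 1; k < fields.length; k++) {   // k = 1 skips the label ("A", "B", ...)
            values.add(Integer.parseInt(fields[k].trim()));
        }
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            for (int j = i + 1; j < values.size(); j++) {
                // Each pair holds exactly two values.
                result.add(new int[] {values.get(i), values.get(j)});
            }
        }
        return result;
    }

    // Renders pairs the way they are written above: (1,2),(1,3),(2,3)
    static String format(List<int[]> pairs) {
        StringBuilder sb = new StringBuilder();
        for (int[] p : pairs) {
            if (sb.length() > 0) sb.append(',');
            sb.append('(').append(p[0]).append(',').append(p[1]).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(pairs("A,1,2,3")));  // prints (1,2),(1,3),(2,3)
    }
}
```

Running it on "B,1,2,3" gives the same three pairs, matching the second row above.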

Flatten them:
(1,2)
(1,3)
(2,3)
(1,2)
(1,3)
(2,3)

Group and count them:
(1,2),2
(1,3),2
(2,3),2
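The flatten, group and count steps can likewise be checked outside Pig. The sketch below is a plain-Java approximation of what FLATTEN, GROUP C BY (f1,f2) and COUNT(C) should produce for this input; it is not Pig's implementation, and the class GroupAndCount and its method countPairs are hypothetical names. Keys are built as "f1,f2" strings and kept in a TreeMap, so they come out in lexicographic order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java approximation of the FLATTEN + GROUP + COUNT steps (not Pig itself).
final class GroupAndCount {

    // Flattens the per-row pair lists into one list, then counts identical pairs.
    // Keys are "f1,f2" strings; the TreeMap keeps them in lexicographic order.
    static Map<String, Integer> countPairs(List<List<int[]>> rows) {
        List<int[]> flat = new ArrayList<>();
        for (List<int[]> row : rows) {
            flat.addAll(row);                     // like FLATTEN: one record per pair
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (int[] p : flat) {
            String key = p[0] + "," + p[1];       // like GROUP BY (f1, f2)
            counts.merge(key, 1, Integer::sum);   // like COUNT(C)
        }
        return counts;
    }

    public static void main(String[] args) {
        List<int[]> row = Arrays.asList(new int[] {1, 2}, new int[] {1, 3}, new int[] {2, 3});
        List<List<int[]>> rows = Arrays.asList(row, row);  // rows A and B yield the same pairs
        for (Map.Entry<String, Integer> e : countPairs(rows).entrySet()) {
            System.out.println("(" + e.getKey() + ")," + e.getValue());
        }
    }
}
```

For the two sample rows this prints (1,2),2 then (1,3),2 then (2,3),2, i.e. the output I'm expecting from the script.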
