Hi,
I am getting wrong counts from Pig for a certain query. I have simplified
the query to the one below, which fails outright instead of returning a wrong
count. Why does the first line of the nested foreach block (the filter whose
result, asdf, is never used) cause the output schema to revert to the input
schema? That line should have no impact on the output.
(I've removed some of the extra logging output.)
pig -version
Apache Pig version 0.12.0 (rexported)
compiled Oct 26 2014, 23:43:04
Query
grunt> a = load 'test1.txt' using PigStorage(',') as (A:chararray,B:chararray,C:chararray);
grunt> b = group a by (A,B);
grunt> c = foreach b {
>> asdf = filter $1 by (1==1);
>> generate COUNT_STAR($1) as TARGET;
>> };
grunt> d = limit c 10;
Values
grunt> dump a;
(a,b,c)
grunt> dump b;
((a,b),{(a,b,c)})
grunt> dump c;
(1)
grunt> dump d;
(1)
Schema from 'describe' looks correct at each step
grunt> describe a;
a: {A: chararray,B: chararray,C: chararray}
grunt> describe b;
b: {group: (A: chararray,B: chararray),a: {(A: chararray,B: chararray,C: chararray)}}
grunt> describe c;
c: {TARGET: long}
grunt> describe d;
d: {TARGET: long}
Attempted next step fails
grunt> e = foreach d generate TARGET;
<line 8, column 23> Invalid field projection. Projected field [TARGET] does not
exist in schema: A:chararray,B:chararray,C:chararray.
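In case it helps anyone reproduce this, the grunt session above condenses to
the script below. This is only a sketch and not something from my original
session: the file name repro.pig is hypothetical, and test1.txt is assumed to
hold the single line a,b,c (which is what 'dump a' shows).

```pig
-- repro.pig (hypothetical file name)
-- Assumes test1.txt contains exactly one line: a,b,c
a = load 'test1.txt' using PigStorage(',') as (A:chararray, B:chararray, C:chararray);
b = group a by (A, B);
c = foreach b {
    -- Nested alias that is never used; without this line the error did not occur in grunt
    asdf = filter $1 by (1==1);
    generate COUNT_STAR($1) as TARGET;
};
d = limit c 10;
-- This is the statement that failed in grunt with "Invalid field projection"
e = foreach d generate TARGET;
dump e;
```

It can be run with something like `pig -x local repro.pig` (local mode is an
assumption on my part). I would expect the same error at the last foreach, but
I have not confirmed that batch mode behaves the same as grunt.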
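Progress of the real schema through the query (projecting a nonexistent field
FAKE makes the error message print the schema; note that c reports TARGET:long
but d reports the schema of a)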
grunt> z = foreach a generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not
exist in schema: A:chararray,B:chararray,C:chararray.
grunt> z = foreach b generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not
exist in schema:
group:tuple(A:chararray,B:chararray),a:bag{:tuple(A:chararray,B:chararray,C:chararray)}.
grunt> z = foreach c generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not
exist in schema: TARGET:long.
grunt> z = foreach d generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not
exist in schema: A:chararray,B:chararray,C:chararray.
Alternate query (same as above, but without the filter line) shows no error
grunt> c = foreach b {
>> generate COUNT_STAR($1) as TARGET;
>> };
grunt> d = limit c 10;
grunt> e = foreach d generate TARGET;
grunt> dump e;
(1)
Thanks,
Kit Maier