Hi,

I am getting wrong counts from Pig for a certain query. I have simplified 
the query to the one below, which fails outright instead of returning a wrong 
count.

Why does the first line of the nested foreach block (the filter) cause the 
output schema to revert to the input schema? The filter's result is never 
used, so that line should have no effect on the output.

(I've removed some of the extra logging output.)

pig -version
Apache Pig version 0.12.0 (rexported)
compiled Oct 26 2014, 23:43:04

Query
grunt> a = load 'test1.txt' using PigStorage(',') as (A:chararray,B:chararray,C:chararray);
grunt> b = group a by (A,B);
grunt> c = foreach b {
>>     asdf = filter $1 by (1==1);
>>     generate COUNT_STAR($1) as TARGET;
>> };
grunt> d = limit c 10;

Values
grunt> dump a;
(a,b,c)
grunt> dump b;
((a,b),{(a,b,c)})
grunt> dump c;
(1)
grunt> dump d;
(1)

Schema reported by 'describe' looks correct at each step
grunt> describe a;
a: {A: chararray,B: chararray,C: chararray}
grunt> describe b;
b: {group: (A: chararray,B: chararray),a: {(A: chararray,B: chararray,C: chararray)}}
grunt> describe c;
c: {TARGET: long}
grunt> describe d;
d: {TARGET: long}

Attempted next step fails
grunt> e = foreach d generate TARGET;
<line 8, column 23> Invalid field projection. Projected field [TARGET] does not 
exist in schema: A:chararray,B:chararray,C:chararray.

Schema actually seen at each step (probed by projecting a nonexistent field)
grunt> z = foreach a generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not 
exist in schema: A:chararray,B:chararray,C:chararray.
grunt> z = foreach b generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not 
exist in schema: 
group:tuple(A:chararray,B:chararray),a:bag{:tuple(A:chararray,B:chararray,C:chararray)}.
grunt> z = foreach c generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not 
exist in schema: TARGET:long.
grunt> z = foreach d generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not 
exist in schema: A:chararray,B:chararray,C:chararray.

Alternate query without the filter line shows no error
grunt> c = foreach b {
>> generate COUNT_STAR($1) as TARGET;
>> };
grunt> d = limit c 10;
grunt> e = foreach d generate TARGET;
grunt> dump e;
(1)
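
For anyone who wants to reproduce this outside the grunt shell, the failing 
statements from the session above can be collected into a single script. This 
is only a sketch: the file name repro.pig is hypothetical, and it assumes 
test1.txt contains the single line a,b,c (which is what 'dump a' shows).

```pig
-- repro.pig (hypothetical file name)
-- Assumes test1.txt contains the single line: a,b,c
a = load 'test1.txt' using PigStorage(',') as (A:chararray, B:chararray, C:chararray);
b = group a by (A, B);
c = foreach b {
    -- always-true filter whose result (asdf) is never used
    asdf = filter $1 by (1 == 1);
    generate COUNT_STAR($1) as TARGET;
};
d = limit c 10;
-- In the Pig 0.12.0 session above, this is the statement that was rejected
-- with "Invalid field projection"
e = foreach d generate TARGET;
dump e;
```

Deleting the 'asdf = filter ...' line from the script gives the alternate 
query, which ran without error in the session above.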

Thanks,
Kit Maier


