Hi, I am getting wrong counts from Pig for a certain query. I have simplified the query to the one below, which fails outright instead of producing a wrong count.
Why does the first line of the subquery cause the output schema to revert to be the same as the input schema? This line should not have any impact on the output. (I've removed some of the extra logging output.)

pig -version
Apache Pig version 0.12.0 (rexported)
compiled Oct 26 2014, 23:43:04

Query
grunt> a = load 'test1.txt' using PigStorage(',') as (A:chararray,B:chararray,C:chararray);
grunt> b = group a by (A,B);
grunt> c = foreach b {
>> asdf = filter $1 by (1==1);
>> generate COUNT_STAR($1) as TARGET;
>> };
grunt> d = limit c 10;

Values
grunt> dump a;
(a,b,c)
grunt> dump b;
((a,b),{(a,b,c)})
grunt> dump c;
(1)
grunt> dump d;
(1)

Schema 'describe' at each step looks good
grunt> describe a;
a: {A: chararray,B: chararray,C: chararray}
grunt> describe b;
b: {group: (A: chararray,B: chararray),a: {(A: chararray,B: chararray,C: chararray)}}
grunt> describe c;
c: {TARGET: long}
grunt> describe d;
d: {TARGET: long}

Attempted next step fails
grunt> e = foreach d generate TARGET;
<line 8, column 23> Invalid field projection. Projected field [TARGET] does not exist in schema: A:chararray,B:chararray,C:chararray.

Progress of real schema through query
grunt> z = foreach a generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not exist in schema: A:chararray,B:chararray,C:chararray.
grunt> z = foreach b generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not exist in schema: group:tuple(A:chararray,B:chararray),a:bag{:tuple(A:chararray,B:chararray,C:chararray)}.
grunt> z = foreach c generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not exist in schema: TARGET:long.
grunt> z = foreach d generate FAKE;
<line 8, column 23> Invalid field projection. Projected field [FAKE] does not exist in schema: A:chararray,B:chararray,C:chararray.

Alternate query shows no error
grunt> c = foreach b {
>> generate COUNT_STAR($1) as TARGET;
>> };
grunt> d = limit c 10;
grunt> e = foreach d generate TARGET;
grunt> dump e;
(1)

Thanks,
Kit Maier