This seems to be a problem with Apache Pig up to 0.12.1.
I tried the script with Pig 0.13 and it does not throw any errors.
The script evaluated:
a = load 'test1.txt' using PigStorage(',') as
(A:chararray,B:chararray,C:chararray);
b = group a by (A,B);
c = foreach b {
asdf = filter $1 by 1==1;
generate COUNT_STAR($1) as TARGET:int;
};
DESCRIBE c;
store c into 'output' USING PigStorage(',');
d = limit c 10;
describe d;
e = foreach d generate TARGET;
DESCRIBE e;
-- store command necessary for explain to work
store e into 'output-e' USING PigStorage(',');
Instead of running the entire program, I just ran "explain" on the
script.
The explain command:
pig -e 'explain -script testpigschema.pig'
EXPLAIN fails in Pig up to 0.12.1 but goes through in 0.13.0.
Unfortunately, this does not really solve the problem for you, except
to hint that this may be a bug in Apache Pig.
Regards,
Debabrata Pani
On Mon, Nov 16, 2015 at 9:24 AM, Arvind S <[email protected]> wrote:
> Does not seem to be an issue in Pig 0.15 (tested in local mode only as of
> now).
>
> a = load '/tmp/test/test.txt' using PigStorage(',') as
> (A:chararray,B:chararray,C:chararray);
> b = group a by (A,B);
> c = foreach b {
> asdf = filter $1 by (1==1);
> generate COUNT_STAR($1) as TARGET;
> };
> d = limit c 10;
> e = foreach d generate TARGET;
> dump e;
>
> end output ...
> (1)
>
>
> *Cheers !!*
> Arvind
>
> On Sat, Nov 14, 2015 at 12:18 AM, Christopher Maier <
> [email protected]> wrote:
>
> > Hi,
> >
> > I haven't received a response on this, has anyone had a chance to
> > reproduce the error?
> >
> > Thanks,
> > Kit
> >
> > From: Christopher Maier
> > Sent: Tuesday, October 20, 2015 4:02 PM
> > To: '[email protected]' <[email protected]>
> > Subject: Schema changes based on subquery
> >
> > Hi,
> >
> > I am getting the wrong counts from Pig for a certain query. I have
> > simplified the query to what's below, which shows as a failure instead
> > of a wrong count.
> >
> > Why does the first line of the subquery cause the output schema to revert
> > to be the same as the input schema? This line should not have any impact
> > on the output.
> >
> > (I've removed some of the extra logging output.)
> >
> > pig -version
> > Apache Pig version 0.12.0 (rexported)
> > compiled Oct 26 2014, 23:43:04
> >
> > Query
> > grunt> a = load 'test1.txt' using PigStorage(',') as
> > (A:chararray,B:chararray,C:chararray);
> > grunt> b = group a by (A,B);
> > grunt> c = foreach b {
> > >> asdf = filter $1 by (1==1);
> > >> generate COUNT_STAR($1) as TARGET;
> > >> };
> > grunt> d = limit c 10;
> >
> > Values
> > grunt> dump a;
> > (a,b,c)
> > grunt> dump b;
> > ((a,b),{(a,b,c)})
> > grunt> dump c;
> > (1)
> > grunt> dump d;
> > (1)
> >
> > Schema 'describe' at each step looks good
> > grunt> describe a;
> > a: {A: chararray,B: chararray,C: chararray}
> > grunt> describe b;
> > b: {group: (A: chararray,B: chararray),a: {(A: chararray,B: chararray,C:
> > chararray)}}
> > grunt> describe c;
> > c: {TARGET: long}
> > grunt> describe d;
> > d: {TARGET: long}
> >
> > Attempted next step fails
> > grunt> e = foreach d generate TARGET;
> > <line 8, column 23> Invalid field projection. Projected field [TARGET]
> > does not exist in schema: A:chararray,B:chararray,C:chararray.
> >
> > Progress of real schema through query
> > grunt> z = foreach a generate FAKE;
> > <line 8, column 23> Invalid field projection. Projected field [FAKE] does
> > not exist in schema: A:chararray,B:chararray,C:chararray.
> > grunt> z = foreach b generate FAKE;
> > <line 8, column 23> Invalid field projection. Projected field [FAKE] does
> > not exist in schema:
> > group:tuple(A:chararray,B:chararray),a:bag{:tuple(A:chararray,B:chararray,C:chararray)}.
> > grunt> z = foreach c generate FAKE;
> > <line 8, column 23> Invalid field projection. Projected field [FAKE] does
> > not exist in schema: TARGET:long.
> > grunt> z = foreach d generate FAKE;
> > <line 8, column 23> Invalid field projection. Projected field [FAKE] does
> > not exist in schema: A:chararray,B:chararray,C:chararray.
> >
> > Alternate query shows no error
> > grunt> c = foreach b {
> > >> generate COUNT_STAR($1) as TARGET;
> > >> };
> > grunt> d = limit c 10;
> > grunt> e = foreach d generate TARGET;
> > grunt> dump e;
> > (1)
> >
> > Thanks,
> > Kit Maier
> >
>