Dear all,

I am using Pig to calculate the distribution of some data, as follows:

raw_data = load 'data' using PigStorage(',') as (id: int, tag: chararray);

-- get the total number of records
all_records = foreach (group raw_data all) generate COUNT(raw_data) as total;

-- calculate the count and proportion of each tag
tag_group = group raw_data by tag;
tag_histogram = foreach tag_group generate tag, COUNT(raw_data) as freq,
    double(freq) / all_records.total as ratio;

Here I refer to freq, which is computed earlier in the same generate
clause, but the statement fails with this error in the log:

    Invalid field projection. Projected field [freq] does not exist in schema

So, does Pig forbid referring to a field that is defined in the same
statement? Or must I repeat the expression, like this:
tag_histogram = foreach tag_group generate tag, COUNT(raw_data) as freq,
    double(COUNT(raw_data)) / all_records.total as ratio;
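
One workaround I can think of is to split the work into two foreach
statements, so that freq already exists in the schema when the ratio is
computed. This is only an untested sketch: tag_counts is a name I made up,
and I am assuming the grouping key has to be projected as "group" and that
the cast is written as (double):

-- first pass: give the count a name
tag_counts = foreach tag_group generate group as tag, COUNT(raw_data) as freq;
-- second pass: freq is now a field of tag_counts, so it can be reused
tag_histogram = foreach tag_counts generate tag, freq,
    ((double) freq) / all_records.total as ratio;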

If I repeat it, will Apache Pig optimize away the duplicated COUNT call?

Best,
Ryan Xu
