Dear all,

I am using Pig to calculate a data distribution, as follows:

    raw_data = load 'data' using PigStorage(',') as (id: int, tag: chararray);

    -- get the total number of records
    all_records = foreach (group raw_data all) generate COUNT(raw_data) as total;

    -- calculate the count and proportion of each tag
    tag_group = group raw_data by tag;
    tag_histogram = foreach tag_group generate tag, COUNT(raw_data) as freq, double(freq) / all_records.total as ratio;

Here I refer to freq, which is computed earlier in the same statement, but the script fails with this error in the log:

    Invalid field projection. Projected field [freq] does not exist in schema

So, does Pig forbid referring to a field like this? Or must I write it like this instead:

    tag_histogram = foreach tag_group generate tag, COUNT(raw_data) as freq, double(COUNT(raw_data)) / all_records.total as ratio;

If so, will Apache Pig optimize away the duplicated COUNT call?

Best,
Ryan Xu