Hey, have you checked that you are really getting all the columns you specified in x? Can you tell me what "dump x" gives you? When you flatten group in x, try naming the fields explicitly, e.g. group.page_name as page_name, group.web_session_id as web_session_id. Then you can do grouped2 = GROUP x by page_name;
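Something like the following is what I mean. It is only a rough sketch and I have not run it against your data. The short aliases (cookies, customers) are names I made up, and I kept just two of your DISTINCT counts to keep it short. SUM(x.cnt_events) is my assumption about what you want for the total, since you mentioned a sum.

```pig
-- Sketch only: assumes filterdata is already loaded with the fields from your script.
grouped = GROUP filterdata BY (page_name, web_session_id);

x = FOREACH grouped {
    -- hypothetical short aliases, chosen so they do not clash with the output field names
    cookies   = DISTINCT filterdata.web_cookie_id;
    customers = DISTINCT filterdata.customer_id;
    GENERATE
        group.page_name       AS page_name,       -- name the group fields explicitly
        group.web_session_id  AS web_session_id,
        COUNT_STAR(cookies)   AS distinct_web_cookie_id,
        COUNT_STAR(customers) AS distinct_customer_id,
        COUNT_STAR(filterdata) AS cnt_events;
};

-- Check that page_name and cnt_events are in the schema before grouping again.
DESCRIBE x;

grouped2 = GROUP x BY page_name;

-- After the second GROUP, cnt_events lives inside the bag x, so it is referenced as x.cnt_events.
d = FOREACH grouped2 GENERATE group AS page_name, SUM(x.cnt_events) AS tot_events;
```

If DESCRIBE x still shows only event_time after this, my guess is that x is being reassigned somewhere else in the script, so it would be worth checking for another statement that defines x.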
On Friday, November 15, 2013 2:17 AM, Mix Nin <[email protected]> wrote:

Hi

I have a group and foreach statements as below

    grouped = GROUP filterdata BY (page_name,web_session_id);
    x = foreach grouped {
        distinct_web_cookie_id = DISTINCT filterdata.web_cookie_id;
        distinct_encrypted_customer_id = DISTINCT filterdata.encrypted_customer_id;
        distinct_web_session_id = DISTINCT filterdata.web_session_id;
        distinct_event_time = DISTINCT filterdata.event_time;
        distinct_customer_id = DISTINCT filterdata.customer_id;
        generate flatten(group),
            COUNT_STAR(distinct_web_cookie_id) AS distinct_web_cookie_id,
            COUNT_STAR(distinct_encrypted_customer_id) AS distinct_encrypted_customer_id,
            COUNT_STAR(distinct_customer_id) AS distinct_customer_id,
            COUNT_STAR(distinct_web_session_id) AS distinct_web_session_id,
            COUNT_STAR(filterdata) AS cnt_events;
    };

Now I want to group on Session_id in x and get the sum of (cnt_events) and written below commands

    grouped2 = GROUP x BY page_name;
    d = foreach grouped2 generate group, COUNT_STAR(cnt_events) tot_events;

When I run "grouped2 = GROUP x BY page_name;", I get below error:

    [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: <line 31, column 23> Invalid field projection. Projected field [page_name] does not exist in schema: event_time:chararray.

When I use describe x, I get output as

    x: {event_time: chararray}

Not sure whether schema for foreach statement works? How do I solve this problem.

Thanks
